
EM and SAGE algorithms for DOA estimation in the presence of unknown uniform noise

Ming-yan Gong and Bin Lyu M. Gong is with the School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China (e-mail: [email protected]).Bin Lyu is with the Key Laboratory of Ministry of Education in Broadband Wireless Communication and Sensor Network Technology, Nanjing University of Posts and Telecommunications, Nanjing 210003, China (e-mail: [email protected]).
Abstract

The expectation-maximization (EM) and space-alternating generalized EM (SAGE) algorithms have been applied to direction of arrival (DOA) estimation in known noise. In this work, the two algorithms are proposed for DOA estimation in unknown uniform noise. Both the deterministic and stochastic signal models are considered. Moreover, a modified EM (MEM) algorithm applicable to this noise assumption is also proposed. The proposed algorithms are then improved to ensure stability when the powers of the sources are unequal. Numerical results illustrate that, after this improvement, the EM algorithm has convergence behavior similar to that of the MEM algorithm, and the SAGE algorithm outperforms both the EM and MEM algorithms for the deterministic signal model. Furthermore, numerical results show that when processing the same samples from the stochastic signal model, the SAGE algorithm for the deterministic signal model requires the fewest iterations.

Index Terms:
Array signal processing, DOA estimation, EM algorithm, Maximum likelihood, Statistical signal processing.

I Introduction

Direction of arrival (DOA) estimation is an important part of array signal processing, and a number of high-resolution estimation techniques have been proposed. In particular, the maximum-likelihood (ML) technique plays a critical role due to its superior performance. However, ML direction finding problems are non-convex and their closed-form solutions are difficult to obtain, which has led to various iterative solution methods [1], [2].

One computationally efficient method for computing ML estimators is the expectation-maximization (EM) algorithm in [3]. [4] and [5] have employed the EM algorithm to solve ML direction finding problems. The EM algorithm consists of two sequential steps at every iteration [4], [5]: an expectation step (E-step), which estimates the complete-data sufficient statistics by finding their conditional expectations, and a maximization step (M-step), which estimates the signal parameters through parallel maximizations. However, the EM algorithm updates all of the parameter estimates simultaneously, which results in slow convergence. To speed up the convergence of the EM algorithm, [6] proposes the space-alternating generalized EM (SAGE) algorithm. [7] and [8] show that the SAGE algorithm does yield faster convergence in DOA estimation.

The EM and SAGE algorithms in ML direction finding are usually derived under known noise [4], [5], [7], [8]. Known noise follows a known statistical model and contains no unknown parameters, which may be unrealistic in certain applications. In fact, many seminal works in ML direction finding consider the so-called unknown uniform noise model [1], [2], [9]–[11]. The covariance matrix of unknown uniform noise can be expressed as σ𝐈N\sigma\mathbf{I}_{N}, where σ\sigma is an unknown common variance and 𝐈N\mathbf{I}_{N} is the NN-order identity matrix. Under this noise assumption, [9] presents a computationally attractive alternating projection algorithm for computing the deterministic ML estimator, [10] investigates the statistical performance of this ML estimator and derives the Cramér-Rao lower bound, and [11] compares some statistical properties of the deterministic and stochastic ML estimators. In addition to uniform noise, nonuniform noise has also attracted increasing attention. Nonuniform noise has an arbitrary diagonal covariance matrix and thus makes the associated ML direction finding problems more complex. To efficiently compute the deterministic and stochastic ML estimators in unknown nonuniform noise, [12] and [13] have presented two alternating optimization algorithms, respectively.

In this work, we develop the EM and SAGE algorithms in ML direction finding for unknown uniform noise. Theoretical analyses indicate that σ\sigma has little effect on the two algorithms for the deterministic signal model. However, when σ\sigma is unknown, the M-step in the EM algorithm for the stochastic signal model can no longer be easily simplified into parallel subproblems. Hence, we divide the M-step into two conditional maximization steps (CM-steps) based on the expectation-CM (ECM) algorithm [14]. In addition, we propose a modified EM (MEM) algorithm applicable to unknown uniform noise. Note that although the EM algorithm in [15] is similar to the MEM algorithm, it is incorrectly derived.

Existing simulations of the EM and SAGE algorithms typically adopt sources of equal power [4], [5], [7], [8], [15]. However, we find that when the powers of the sources are unequal, the DOA estimates obtained by the EM, MEM, and SAGE algorithms tend to coincide with the true DOA of the source with the largest power. Hence, we improve these algorithms. After this improvement, numerical results illustrate that 1) the EM algorithm has convergence similar to that of the MEM algorithm, 2) the SAGE algorithm outperforms the EM and MEM algorithms for the deterministic signal model, i.e., the SAGE algorithm converges faster and is more efficient at avoiding convergence to an unwanted stationary point of the log-likelihood function, and 3) the SAGE algorithm does not always outperform the EM and MEM algorithms for the stochastic signal model.

The EM, MEM, and SAGE algorithms for the deterministic signal model can process samples from the stochastic signal model. Hence, we compare, via simulation, the convergence of the EM and SAGE algorithms for both models. Numerical results illustrate that under the same samples, initial DOA estimates, and stopping criterion, the SAGE algorithm for the deterministic signal model requires the fewest iterations.

The contributions of this work are summarized as follows:

  • We develop the EM and SAGE algorithms in ML direction finding for unknown uniform noise. In particular, we derive the SAGE algorithm for the stochastic signal model, which is not discussed in [6], [7], and [15].

  • We propose an MEM algorithm applicable to the unknown uniform noise assumption.

  • We improve the EM, MEM, and SAGE algorithms to ensure stability when the powers of the sources are unequal.

  • Via simulation, we show that the EM algorithm has convergence similar to that of the MEM algorithm and that the SAGE algorithm outperforms the EM and MEM algorithms for the deterministic signal model. However, the SAGE algorithm does not always outperform the EM and MEM algorithms for the stochastic signal model.

  • Via simulation, we show that when processing the same samples from the stochastic signal model, the SAGE algorithm for the deterministic signal model requires the fewest iterations.

Notations: 𝐚T\mathbf{a}^{T}, 𝐚H\mathbf{a}^{H}, and 𝐚\|\mathbf{a}\| are the transposition, conjugate transposition, and Euclidean norm of a vector 𝐚\mathbf{a}, respectively. 𝟎=[00]T\mathbf{0}=[0~{}\cdots~{}0]^{T} and 𝟏=[11]T\mathbf{1}=[1~{}\cdots~{}1]^{T}. 𝐀1\mathbf{A}^{-1}, |𝐀||\mathbf{A}|, and Tr{𝐀}\mathrm{Tr}\{\mathbf{A}\} are the inversion, determinant, and trace of a square matrix 𝐀\mathbf{A}, respectively. 𝟎N\mathbf{0}_{N} is the NN-order zero matrix. 𝐀𝟎N\mathbf{A}\geq\mathbf{0}_{N} and 𝐀>𝟎N\mathbf{A}>\mathbf{0}_{N} denote that the NN-order square matrix 𝐀\mathbf{A} is positive semi-definite and definite, respectively. 𝔼{}\mathbb{E}\{\cdot\} and 𝔻{}\mathbb{D}\{\cdot\} denote expectation and variance, respectively. ȷ\jmath is the imaginary unit.

II Data Model and Problem Formulation

We consider an array composed of NN isotropic sensors receiving the signals emitted by MM narrowband far-field sources with the same known center wavelength λ\lambda. The array geometry is arbitrary but known in a Cartesian coordinate system, and the nnth sensor is located at a known position 𝐩n=[xnynzn]T\mathbf{p}_{n}=[x_{n}~{}y_{n}~{}z_{n}]^{T} close to the origin. Spherical coordinates are used to describe the DOAs of the MM sources. The elevation and azimuth angles of the mmth source are denoted by φm[0,π]\varphi_{m}\in[0,\pi] and ϕm[0,2π)\phi_{m}\in[0,2\pi), respectively. Thus, its unit directional vector can be expressed as 𝐪m=[sin(φm)cos(ϕm)sin(φm)sin(ϕm)cos(φm)]T\mathbf{q}_{m}=[\sin(\varphi_{m})\cos(\phi_{m})~{}\sin(\varphi_{m})\sin(\phi_{m})~{}\cos(\varphi_{m})]^{T} in the Cartesian coordinate system.

We use the origin as the reference point of the array. Then, for the mmth source, the phase difference between the signals received at the origin and at the nnth sensor can be approximated as ψn,m=2πλ𝐩nT𝐪m\psi_{n,m}=-\frac{2\pi}{\lambda}\mathbf{p}_{n}^{T}\mathbf{q}_{m}. After being down-converted to baseband, the received signal vector of the array can be characterized by [1], [2]

𝐲(t)=m=1M𝐚(θm)sm(t)+𝐰(t),\displaystyle\mathbf{y}(t)={\sum}_{m=1}^{M}\mathbf{a}(\theta_{m})s_{m}(t)+\mathbf{w}(t), (1)

where θm=[φmϕm]TΩ\theta_{m}=[\varphi_{m}~{}\phi_{m}]^{T}\in\Omega, 𝐚(θm)=[eȷψ1,meȷψN,m]T\mathbf{a}(\theta_{m})=[e^{\jmath\psi_{1,m}}~{}\cdots~{}e^{\jmath\psi_{N,m}}]^{T}, sm(t)s_{m}(t) is the mmth source baseband signal received at the origin and its power is PmP_{m}, 𝐰(t)𝒞𝒩(0,σ𝐈N)\mathbf{w}(t)\sim\mathcal{CN}(\textbf{0},\sigma\mathbf{I}_{N}) represents a white Gaussian noise vector with a common noise variance σ\sigma.
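To make the model concrete, the following numpy sketch generates synthetic snapshots according to (1). The half-wavelength uniform linear array, the two azimuths, and the power and noise values are illustrative choices taken from the simulation section; the helper steering_vector is a hypothetical name, not part of any reference implementation.

```python
import numpy as np

def steering_vector(p, theta, lam):
    """a(theta_m) for sensor positions p (N x 3); theta = [elevation, azimuth] in radians."""
    el, az = theta
    q = np.array([np.sin(el) * np.cos(az),
                  np.sin(el) * np.sin(az),
                  np.cos(el)])                    # unit directional vector q_m
    psi = -2.0 * np.pi / lam * (p @ q)            # phase differences psi_{n,m}
    return np.exp(1j * psi)

# Illustrative setup: half-wavelength ULA, N = 10 sensors, T = 20 snapshots,
# two sources at azimuths 20 and 80 degrees (elevation 90 degrees).
lam, N, T = 1.0, 10, 20
p = np.c_[lam / 2 * np.arange(N), np.zeros(N), np.zeros(N)]
doas = [np.deg2rad([90.0, 20.0]), np.deg2rad([90.0, 80.0])]
P = np.array([10 ** (-2 / 10), 10 ** (4 / 10)])   # source powers (-2 dB and 4 dB)
sigma = 10 ** (4 / 10)                            # common noise variance (4 dB), unknown in practice

A = np.column_stack([steering_vector(p, th, lam) for th in doas])          # A(theta)
S = np.sqrt(P / 2)[:, None] * (np.random.randn(2, T) + 1j * np.random.randn(2, T))
W = np.sqrt(sigma / 2) * (np.random.randn(N, T) + 1j * np.random.randn(N, T))
Y = A @ S + W                                     # snapshots y(1), ..., y(T) as columns
```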

EM-type algorithms require the definition of the underlying complete data and their associated log-likelihood functions. According to the EM paradigm in [4] and [5], after sampling we design the samples (“snapshots”) or incomplete data as

𝐲(t)\displaystyle\mathbf{y}(t) =\displaystyle= m=1M[𝐚(θm)sm(t)+𝐰m(t)]\displaystyle{\sum}_{m=1}^{M}\big{[}\mathbf{a}(\theta_{m})s_{m}(t)+\mathbf{w}_{m}(t)\big{]} (2)
=\displaystyle= m=1M𝐱m(t),t=1,2,,T,\displaystyle{\sum}_{m=1}^{M}\mathbf{x}_{m}(t),t=1,2,\dots,T,

where 𝐰m(t)𝒞𝒩(0,αmσ𝐈N)\mathbf{w}_{m}(t)\sim\mathcal{CN}(\textbf{0},\alpha_{m}\sigma\mathbf{I}_{N}) and the 𝐰m(t)\mathbf{w}_{m}(t)’s are mutually uncorrelated, 𝜶=[α1αM]T𝟎\boldsymbol{\alpha}=[\alpha_{1}~{}\cdots~{}\alpha_{M}]^{T}\geq\mathbf{0} and 𝟏T𝜶=1\mathbf{1}^{T}\boldsymbol{\alpha}=1, the 𝐱m(t)\mathbf{x}_{m}(t)’s are the complete data, TT is the number of samples.

Since the joint probability density functions (PDFs) of the incomplete and complete data depend also on the statistical model of the sm(t)s_{m}(t)’s, we consider the deterministic and stochastic signal models separately.

II-A Deterministic Signal Model

The deterministic signal model requires the deterministic, arbitrary, and unknown sm(t)s_{m}(t)’s [4], [5], [9][11], i.e., 𝐱m(t)𝒞𝒩(𝐚(θm)sm(t),αmσ𝐈N)\mathbf{x}_{m}(t)\sim\mathcal{CN}(\mathbf{a}(\theta_{m})s_{m}(t),\alpha_{m}\sigma\mathbf{I}_{N}), 𝐲(t)𝒞𝒩(𝐀(𝜽)𝐬(t),σ𝐈N)\mathbf{y}(t)\sim\mathcal{CN}(\mathbf{A}(\boldsymbol{\theta})\mathbf{s}(t),\sigma\mathbf{I}_{N}) with 𝐀(𝜽)=[𝐚(θ1)𝐚(θM)]\mathbf{A}(\boldsymbol{\theta})=[\mathbf{a}(\theta_{1})~{}\cdots~{}\mathbf{a}(\theta_{M})], 𝜽=[θ1θM]𝛀(𝛀=ΩM)\boldsymbol{\theta}=[\theta_{1}~{}\cdots~{}\theta_{M}]\in\boldsymbol{\Omega}~{}(\boldsymbol{\Omega}=\Omega^{M}), and 𝐬(t)=[s1(t)sM(t)]T\mathbf{s}(t)=[s_{1}(t)~{}\cdots~{}s_{M}(t)]^{T}. Then, the incomplete- and complete-data log-likelihood functions are, respectively, expressed as

(𝚯,σ)=lnp(𝐘;𝜽,𝐒,σ)\displaystyle\mathcal{L}(\boldsymbol{\Theta},\sigma)=\ln p(\mathbf{Y};\boldsymbol{\theta},\mathbf{S},\sigma)
=\displaystyle= TNln(πσ)1σt=1T𝐲(t)𝐀(𝜽)𝐬(t)2,\displaystyle-TN\ln(\pi\sigma)-\frac{1}{\sigma}\sum_{t=1}^{T}\|\mathbf{y}(t)-\mathbf{A}(\boldsymbol{\theta})\mathbf{s}(t)\|^{2}, (3a)
𝒬(𝐗1,,𝐗M,𝚯,σ)=lnp(𝐗1,,𝐗M;𝜽,𝐒,σ)\displaystyle\mathcal{Q}(\mathbf{X}_{1},\dots,\mathbf{X}_{M},\boldsymbol{\Theta},\sigma)=\ln p(\mathbf{X}_{1},\dots,\mathbf{X}_{M};\boldsymbol{\theta},\mathbf{S},\sigma)
=\displaystyle= TMNln(πσ)TNm=1Mln(αm)\displaystyle-TMN\ln(\pi\sigma)-TN\sum_{m=1}^{M}\ln(\alpha_{m})- (3b)
1σt=1Tm=1M1αm𝐱m(t)𝐚(θm)sm(t)2,\displaystyle\frac{1}{\sigma}\sum_{t=1}^{T}\sum_{m=1}^{M}\frac{1}{\alpha_{m}}\|\mathbf{x}_{m}(t)-\mathbf{a}(\theta_{m})s_{m}(t)\|^{2},

where 𝐒=[𝐬(1)𝐬(T)]\mathbf{S}=[\mathbf{s}(1)~{}\cdots~{}\mathbf{s}(T)], 𝐘=[𝐲(1)𝐲(T)]\mathbf{Y}=[\mathbf{y}(1)~{}\cdots~{}\mathbf{y}(T)], 𝐗m=[𝐱m(1)𝐱m(T)]\mathbf{X}_{m}=[\mathbf{x}_{m}(1)~{}\cdots~{}\mathbf{x}_{m}(T)], and 𝚯=(𝜽,𝐒)\boldsymbol{\Theta}=(\boldsymbol{\theta},\mathbf{S}) denotes the signal parameters while σ\sigma is the only noise parameter. Thus, the ML estimation problem, i.e., max𝜽𝛀,𝐒,σ>0(𝚯,σ)\max_{\boldsymbol{\theta}\in\boldsymbol{\Omega},\mathbf{S},\sigma>0}\mathcal{L}(\boldsymbol{\Theta},\sigma), can be simplified to

min𝜽𝛀,𝐒,σ>0Nln(σ)+1σTt=1T𝐲(t)𝐀(𝜽)𝐬(t)2.\displaystyle\mathop{\min_{\boldsymbol{\theta}\in\boldsymbol{\Omega},\mathbf{S},\sigma>0}}N\ln(\sigma)+\frac{1}{\sigma T}\sum_{t=1}^{T}\big{\|}\mathbf{y}(t)-\mathbf{A}(\boldsymbol{\theta})\mathbf{s}(t)\big{\|}^{2}. (4)

II-B Stochastic Signal Model

In the stochastic signal model, sm(t)𝒞𝒩(0,Pm)s_{m}(t)\sim\mathcal{CN}(0,P_{m}). For simplicity, all of the sm(t)s_{m}(t)’s and 𝐰m(t)\mathbf{w}_{m}(t)’s are assumed to be mutually uncorrelated [4], [5], i.e., 𝐱m(t)𝒞𝒩(0,𝐂m)\mathbf{x}_{m}(t)\sim\mathcal{CN}(\textbf{0},\mathbf{C}_{m}) with 𝐂m=Pm𝐚(θm)𝐚H(θm)+αmσ𝐈N\mathbf{C}_{m}=P_{m}\mathbf{a}(\theta_{m})\mathbf{a}^{H}(\theta_{m})+\alpha_{m}\sigma\mathbf{I}_{N}, and 𝐲(t)𝒞𝒩(𝟎,𝐂y)\mathbf{y}(t)\sim\mathcal{CN}(\mathbf{0},\mathbf{C}_{y}) with 𝐂y=m=1M𝐂m\mathbf{C}_{y}=\sum_{m=1}^{M}\mathbf{C}_{m}. Then, the incomplete- and complete-data log-likelihood functions are, respectively, expressed as

(𝚯,σ)=lnp(𝐘;𝜽,𝐏,σ)\displaystyle\mathcal{L}(\boldsymbol{\Theta},\sigma)=\ln p(\mathbf{Y};\boldsymbol{\theta},\mathbf{P},\sigma)
=\displaystyle= TNln(π)Tln|𝐂y|t=1T𝐲H(t)𝐂y1𝐲(t),\displaystyle-TN\ln(\pi)-T\ln|\mathbf{C}_{y}|-\sum_{t=1}^{T}\mathbf{y}^{H}(t)\mathbf{C}_{y}^{-1}\mathbf{y}(t), (5a)
𝒬(𝐗1,,𝐗M,𝚯,σ)=lnp(𝐗1,,𝐗M;𝜽,𝐏,σ)\displaystyle\mathcal{Q}(\mathbf{X}_{1},\dots,\mathbf{X}_{M},\boldsymbol{\Theta},\sigma)=\ln p(\mathbf{X}_{1},\dots,\mathbf{X}_{M};\boldsymbol{\theta},\mathbf{P},\sigma)
=\displaystyle= TMNln(π)Tm=1Mln|𝐂m|\displaystyle-TMN\ln(\pi)-T\sum_{m=1}^{M}\ln|\mathbf{C}_{m}|- (5b)
t=1Tm=1M𝐱mH(t)𝐂m1𝐱m(t),\displaystyle\sum_{t=1}^{T}\sum_{m=1}^{M}\mathbf{x}^{H}_{m}(t)\mathbf{C}_{m}^{-1}\mathbf{x}_{m}(t),

where 𝐏=[P1PM]T\mathbf{P}=[P_{1}~{}\cdots~{}P_{M}]^{T} and 𝚯=(𝜽,𝐏)\boldsymbol{\Theta}=(\boldsymbol{\theta},\mathbf{P}). Finally, the ML estimation problem can be simplified to

min𝜽𝛀,𝐏𝟎,σ>0ln|𝐂y|+Tr{𝐂y1𝐑^y},\displaystyle\min_{\boldsymbol{\theta}\in\boldsymbol{\Omega},\mathbf{P}\geq\mathbf{0},\sigma>0}\ln|\mathbf{C}_{y}|+\mathrm{Tr}\big{\{}\mathbf{C}_{y}^{-1}\hat{\mathbf{R}}_{y}\big{\}}, (6)

where 𝐑^y=1Tt=1T𝐲(t)𝐲H(t)𝟎N\hat{\mathbf{R}}_{y}=\frac{1}{T}\sum_{t=1}^{T}\mathbf{y}(t)\mathbf{y}^{H}(t)\geq\mathbf{0}_{N} is the sample covariance matrix.
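As a quick sanity check, the concentrated stochastic objective in (6) can be evaluated directly from the samples. The sketch below is a minimal numpy version, assuming the steering matrix has already been built for a candidate DOA vector; the function name is illustrative.

```python
import numpy as np

def stochastic_ml_objective(Y, A, P, sigma):
    """Evaluate ln|C_y| + Tr{C_y^{-1} R_hat_y} of (6) for a candidate (theta via A, P, sigma)."""
    N, T = Y.shape
    R_y = (Y @ Y.conj().T) / T                       # sample covariance R_hat_y
    C_y = (A * P) @ A.conj().T + sigma * np.eye(N)   # sum_m P_m a_m a_m^H + sigma I_N
    _, logdet = np.linalg.slogdet(C_y)
    return np.real(logdet + np.trace(np.linalg.solve(C_y, R_y)))
```

A direct ML solver would minimize this objective jointly over the DOAs, powers, and noise variance; the EM-type algorithms derived below avoid that joint minimization.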

III EM Algorithm

In this section, we derive the EM algorithm for solving problems (4) and (6). For convenience, we define the following notations:

  • 1)

    ()(k)(\cdot)^{(k)} denotes an iterative value obtained during the kkth iteration and ()(0)(\cdot)^{(0)} denotes an initial value.

  • 2)

    𝐑^m=1Tt=1T𝐱m(t)𝐱mH(t)𝟎N\hat{\mathbf{R}}_{m}=\frac{1}{T}\sum_{t=1}^{T}\mathbf{x}_{m}(t)\mathbf{x}^{H}_{m}(t)\geq\mathbf{0}_{N}.

  • 3)

    𝚷m=1N𝐚(θm)𝐚H(θm)\boldsymbol{\Pi}_{m}=\frac{1}{N}\mathbf{a}(\theta_{m})\mathbf{a}^{H}(\theta_{m}).

  • 4)

    em(k)=Tr{𝚷m(k)𝐑^m(k)}0e^{(k)}_{m}=\mathrm{Tr}\big{\{}\boldsymbol{\Pi}^{(k)}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}\geq 0.

  • 5)

    dm(k)=Tr{(𝐈N𝚷m(k))𝐑^m(k)}=Tr{𝐑^m(k)}em(k)0d^{(k)}_{m}=\mathrm{Tr}\big{\{}\big{(}\mathbf{I}_{N}-\boldsymbol{\Pi}^{(k)}_{m}\big{)}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}=\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}-e^{(k)}_{m}\geq 0.

III-A Deterministic Signal Model

At the kkth E-step, the conditional expectation of (3b) is computed by

𝔼{𝒬(𝐗1,,𝐗M,𝚯,σ)|𝐘;𝚯(k1),σ(k1)}\displaystyle\mathbb{E}\left\{\mathcal{Q}(\mathbf{X}_{1},\dots,\mathbf{X}_{M},\boldsymbol{\Theta},\sigma)|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\sigma^{(k-1)}\right\}
=\displaystyle= T{MNln(πσ)+Nm=1Mln(αm)+1σ[c(k)+\displaystyle-T\Big{\{}MN\ln(\pi\sigma)+N\sum_{m=1}^{M}\ln(\alpha_{m})+\frac{1}{\sigma}\big{[}c^{(k)}+ (7)
1Tt=1Tm=1M1αm𝐱m(k)(t)𝐚(θm)sm(t)2]},\displaystyle\frac{1}{T}\sum_{t=1}^{T}\sum_{m=1}^{M}\frac{1}{\alpha_{m}}\|\mathbf{x}^{(k)}_{m}(t)-\mathbf{a}(\theta_{m})s_{m}(t)\|^{2}\big{]}\Big{\}},

where the conditional PDF of 𝐱m(t)\mathbf{x}_{m}(t) can be derived from [17] and

𝐱m(k)(t)=\displaystyle\mathbf{x}^{(k)}_{m}(t)= 𝔼{𝐱m(t)|𝐘;𝚯(k1),σ(k1)}\displaystyle\mathbb{E}\left\{\mathbf{x}_{m}(t)|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\sigma^{(k-1)}\right\}
=\displaystyle= αm[𝐲(t)𝐀(𝜽(k1))𝐬(k1)(t)]\displaystyle\alpha_{m}\big{[}\mathbf{y}(t)-\mathbf{A}(\boldsymbol{\theta}^{(k-1)})\mathbf{s}^{(k-1)}(t)\big{]} (8a)
+𝐚(θm(k1))sm(k1)(t),m,t,\displaystyle+\mathbf{a}(\theta^{(k-1)}_{m})s^{(k-1)}_{m}(t),\forall m,t,
c(k)\displaystyle c^{(k)} =\displaystyle= 1Tt=1Tm=1M1αmTr{𝔻{𝐱m(t)|𝐘;𝚯(k1),σ(k1)}}\displaystyle\frac{1}{T}\sum_{t=1}^{T}\sum_{m=1}^{M}\frac{1}{\alpha_{m}}\mathrm{Tr}\left\{\mathbb{D}\big{\{}\mathbf{x}_{m}(t)|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\sigma^{(k-1)}\big{\}}\right\} (8b)
=\displaystyle= 1Tt=1Tm=1M1αmTr{αm(1αm)σ(k1)𝐈N}\displaystyle\frac{1}{T}\sum_{t=1}^{T}\sum_{m=1}^{M}\frac{1}{\alpha_{m}}\mathrm{Tr}\left\{\alpha_{m}(1-\alpha_{m})\sigma^{(k-1)}\mathbf{I}_{N}\right\}
=\displaystyle= N(M1)σ(k1).\displaystyle N(M-1)\sigma^{(k-1)}.

At the kkth M-step, based on (7) the EM algorithm updates the estimates of 𝚯\boldsymbol{\Theta} and σ\sigma by

min𝜽𝛀,𝐒,σ>0\displaystyle\min_{\boldsymbol{\theta}\in\boldsymbol{\Omega},\mathbf{S},\sigma>0} MNln(σ)+1σ[c(k)+\displaystyle MN\ln(\sigma)+\frac{1}{\sigma}\big{[}c^{(k)}+ (9)
1Tt=1Tm=1M1αm𝐱m(k)(t)𝐚(θm)sm(t)2],\displaystyle\frac{1}{T}\sum_{t=1}^{T}\sum_{m=1}^{M}\frac{1}{\alpha_{m}}\|\mathbf{x}^{(k)}_{m}(t)-\mathbf{a}(\theta_{m})s_{m}(t)\|^{2}\big{]},

which can be solved easily in a separable fashion. Then, we have the parameter estimates:

θm(k)=argmaxθmΩTr{𝚷m𝐑^m(k)},m,\displaystyle\theta^{(k)}_{m}=\arg\max_{\theta_{m}\in\Omega}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}},\forall m, (10a)
sm(k)(t)=1N𝐚H(θm(k))𝐱m(k)(t),m,t,\displaystyle s^{(k)}_{m}(t)=\frac{1}{N}\mathbf{a}^{H}(\theta^{(k)}_{m})\mathbf{x}^{(k)}_{m}(t),\forall m,t, (10b)
σ(k)=(11M)σ(k1)+1MNm=1Mdm(k)/αm,\displaystyle\sigma^{(k)}=(1-\frac{1}{M})\sigma^{(k-1)}+\frac{1}{MN}\sum_{m=1}^{M}d^{(k)}_{m}/\alpha_{m}, (10c)

where 𝐑^m(k)=1Tt=1T𝐱m(k)(t)[𝐱m(k)(t)]H\hat{\mathbf{R}}^{(k)}_{m}=\frac{1}{T}\sum_{t=1}^{T}\mathbf{x}^{(k)}_{m}(t)[\mathbf{x}^{(k)}_{m}(t)]^{H} and σ(k)>0\sigma^{(k)}>0 if σ(k1)>0\sigma^{(k-1)}>0.

Remark 1.

Eliminating σ\sigma, problem (4) can be simplified to

min𝜽𝛀,𝐒1Tt=1T𝐲(t)𝐀(𝜽)𝐬(t)2,\displaystyle\min_{\boldsymbol{\theta}\in\boldsymbol{\Omega},\mathbf{S}}\frac{1}{T}\sum_{t=1}^{T}\big{\|}\mathbf{y}(t)-\mathbf{A}(\boldsymbol{\theta})\mathbf{s}(t)\big{\|}^{2}, (11)

which indicates that for the deterministic signal model, the ML estimator of 𝚯\boldsymbol{\Theta} is unrelated to σ\sigma. Additionally, note that the 𝐱m(k)(t)\mathbf{x}^{(k)}_{m}(t)’s in (8a), the θm(k)\theta^{(k)}_{m}’s in (10a), and the sm(k)(t)s^{(k)}_{m}(t)’s in (10b) are unrelated to σ(k1)\sigma^{(k-1)} and respectively identical to those in the EM algorithm under known σ\sigma [5], [8]. We can therefore conclude that the EM algorithm under unknown σ\sigma is equivalent to that under known σ\sigma when σ\sigma is not estimated. Accordingly, (10c) can be omitted when the algorithm does not consider the nuisance parameter σ\sigma.
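To summarize the two steps, the numpy sketch below assembles one complete EM iteration for the deterministic model: the E-step (8a) followed by the M-step (10a)–(10c). For brevity, the continuous DOA search in (10a) is replaced here by a coarse grid search over azimuth (a ULA with known elevation is assumed), and all function and variable names are illustrative.

```python
import numpy as np

def em_step_det(Y, A_fun, phi_grid, phi_prev, S_prev, sigma_prev, alpha):
    """One EM iteration (deterministic model): E-step (8a) and M-step (10a)-(10c).
    A_fun(phi) returns the N x 1 steering vector for azimuth phi (radians)."""
    N, T = Y.shape
    M = len(phi_prev)
    A_prev = np.column_stack([A_fun(ph) for ph in phi_prev])
    resid = Y - A_prev @ S_prev                          # y(t) - A(theta) s(t)

    phi_new = np.empty(M)
    S_new = np.empty((M, T), dtype=complex)
    d = np.empty(M)
    for m in range(M):
        X_m = alpha[m] * resid + np.outer(A_prev[:, m], S_prev[m])   # E-step (8a)
        R_m = (X_m @ X_m.conj().T) / T                               # complete-data covariance
        # M-step (10a): maximize Tr{Pi_m R_m} = a(phi)^H R_m a(phi) / N over the grid
        vals = [np.real(A_fun(ph).conj() @ R_m @ A_fun(ph)) / N for ph in phi_grid]
        phi_new[m] = phi_grid[int(np.argmax(vals))]
        a_m = A_fun(phi_new[m])
        S_new[m] = a_m.conj() @ X_m / N                              # (10b)
        e_m = np.real(a_m.conj() @ R_m @ a_m) / N
        d[m] = np.real(np.trace(R_m)) - e_m                          # d_m^(k)
    sigma_new = (1 - 1 / M) * sigma_prev + np.sum(d / alpha) / (M * N)   # (10c)
    return phi_new, S_new, sigma_new
```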

III-B Stochastic Signal Model

Note that 𝐑^m\hat{\mathbf{R}}_{m} is a complete-data sufficient statistic for 𝐂m\mathbf{C}_{m} [5], [16], which contains θm\theta_{m}, PmP_{m}, and σ\sigma. At the kkth E-step, the EM algorithm estimates the 𝐑^m\hat{\mathbf{R}}_{m}’s by finding their conditional expectations [3]:

𝐑^m(k)=\displaystyle\hat{\mathbf{R}}^{(k)}_{m}= 𝔼{𝐑^m|𝐘;𝚯(k1),σ(k1)}\displaystyle\mathbb{E}\left\{\hat{\mathbf{R}}_{m}|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\sigma^{(k-1)}\right\}
=\displaystyle= 𝐂m(k1)[𝐂y(k1)]1𝐑^y[𝐂y(k1)]1𝐂m(k1)+\displaystyle\mathbf{C}^{(k-1)}_{m}[\mathbf{C}^{(k-1)}_{y}]^{-1}\hat{\mathbf{R}}_{y}[\mathbf{C}^{(k-1)}_{y}]^{-1}\mathbf{C}^{(k-1)}_{m}+ (12)
{𝐂m(k1)𝐂m(k1)[𝐂y(k1)]1𝐂m(k1)},m,\displaystyle\left\{\mathbf{C}^{(k-1)}_{m}-\mathbf{C}^{(k-1)}_{m}[\mathbf{C}^{(k-1)}_{y}]^{-1}\mathbf{C}^{(k-1)}_{m}\right\},\forall m,

where

𝔼{𝐱m(t)|𝐘;𝚯(k1),σ(k1)}\displaystyle\mathbb{E}\left\{\mathbf{x}_{m}(t)|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\sigma^{(k-1)}\right\}
=\displaystyle= 𝐂m(k1)[𝐂y(k1)]1𝐲(t),m,t,\displaystyle\mathbf{C}^{(k-1)}_{m}[\mathbf{C}^{(k-1)}_{y}]^{-1}\mathbf{y}(t),\forall m,t,
𝔻{𝐱m(t)|𝐘;𝚯(k1),σ(k1)}\displaystyle\mathbb{D}\left\{\mathbf{x}_{m}(t)|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\sigma^{(k-1)}\right\}
=\displaystyle= 𝐂m(k1)𝐂m(k1)[𝐂y(k1)]1𝐂m(k1)𝟎N,m,t.\displaystyle\mathbf{C}^{(k-1)}_{m}-\mathbf{C}^{(k-1)}_{m}[\mathbf{C}^{(k-1)}_{y}]^{-1}\mathbf{C}^{(k-1)}_{m}\geq\mathbf{0}_{N},\forall m,t.

At the kkth M-step, based on (5b) and (12) the EM algorithm updates the estimates of 𝚯\boldsymbol{\Theta} and σ\sigma by

min𝜽𝛀,𝐫𝟎,σ>0\displaystyle\min_{\boldsymbol{\theta}\in\boldsymbol{\Omega},\mathbf{r}\geq\mathbf{0},\sigma>0} MNln(σ)+m=1M[ln|𝐃m|\displaystyle MN\ln(\sigma)+\sum_{m=1}^{M}\big{[}\ln|\mathbf{D}_{m}| (13)
+1σTr{𝐃m1𝐑^m(k)}],\displaystyle+\frac{1}{\sigma}\mathrm{Tr}\big{\{}\mathbf{D}_{m}^{-1}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}\big{]},

where 𝐂m=σ𝐃m\mathbf{C}_{m}=\sigma\mathbf{D}_{m} with 𝐃m=rm𝐚(θm)𝐚H(θm)+αm𝐈N\mathbf{D}_{m}=r_{m}\mathbf{a}(\theta_{m})\mathbf{a}^{H}(\theta_{m})+\alpha_{m}\mathbf{I}_{N} and 𝐫=[r1rM]T\mathbf{r}=[r_{1}~{}\cdots~{}r_{M}]^{T} with rm=Pmσr_{m}=\frac{P_{m}}{\sigma} being the signal to noise ratio of the mmth source.

Because of σ\sigma, it appears very difficult to simplify problem (13) into parallel subproblems. To keep the M-step simple, we divide it into two CM-steps based on the ECM algorithm [14].

  • First CM-step: estimate 𝚯\boldsymbol{\Theta} but hold σ=σ(k1)\sigma=\sigma^{(k-1)} fixed. Then, (13) can be simplified to the MM parallel subproblems

    minθmΩ,rm0ln|𝐃m|+1σ(k1)Tr{𝐃m1𝐑^m(k)},m,\displaystyle\min_{\theta_{m}\in\Omega,r_{m}\geq 0}\ln|\mathbf{D}_{m}|+\frac{1}{\sigma^{(k-1)}}\mathrm{Tr}\big{\{}\mathbf{D}_{m}^{-1}\hat{\mathbf{R}}^{(k)}_{m}\big{\}},\forall m, (14)

    which can be solved using the method in [5]. Accordingly, the estimate of 𝚯\boldsymbol{\Theta} is updated by

    θm(k)=argmaxθmΩTr{𝚷m𝐑^m(k)},m,\displaystyle\theta^{(k)}_{m}=\arg\max_{\theta_{m}\in\Omega}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}},\forall m, (15a)
    Pm(k)=\displaystyle P^{(k)}_{m}= rm(k)σ(k1)\displaystyle r^{(k)}_{m}\sigma^{(k-1)}
    =\displaystyle= max{1N(em(k)αmσ(k1)),0},m,\displaystyle\max\left\{\frac{1}{N}\big{(}e^{(k)}_{m}-\alpha_{m}\sigma^{(k-1)}\big{)},0\right\},\forall m, (15b)

    where θm(k)\theta^{(k)}_{m} is indeterminate if Pm(k)=0P^{(k)}_{m}=0.

  • Second CM-step: estimate σ\sigma but hold 𝚯=𝚯(k)\boldsymbol{\Theta}=\boldsymbol{\Theta}^{(k)} fixed. Then, (13) can be simplified to

    minσ>0MNln(σ)+1σm=1MTr{[𝐃m(k)]1𝐑^m(k)}.\displaystyle\min_{\sigma>0}MN\ln(\sigma)+\frac{1}{\sigma}\sum_{m=1}^{M}\mathrm{Tr}\left\{[\mathbf{D}^{(k)}_{m}]^{-1}\hat{\mathbf{R}}^{(k)}_{m}\right\}. (16)

    Thus, the estimate of σ\sigma is updated by

    σ(k)\displaystyle\sigma^{(k)} =\displaystyle= 1MNm=1MTr{[𝐃m(k)]1𝐑^m(k)}\displaystyle\frac{1}{MN}\sum_{m=1}^{M}\mathrm{Tr}\left\{[\mathbf{D}^{(k)}_{m}]^{-1}\hat{\mathbf{R}}^{(k)}_{m}\right\} (17)
    =\displaystyle= 1Nσ(k1)+1MNm=1Mdm(k)/αm,\displaystyle\frac{1}{N}\sigma^{(k-1)}+\frac{1}{MN}\sum_{m=1}^{M}d^{(k)}_{m}/\alpha_{m},

    where σ(k)>0\sigma^{(k)}>0 if σ(k1)>0\sigma^{(k-1)}>0.

Remark 2.

Although the M-step (13) of the EM algorithm is divided into the two sequential CM-steps (14) and (16), the monotonicity of the algorithm still holds [14], i.e.,

(𝚯(k),σ(k))(𝚯(k1),σ(k1)).\displaystyle\mathcal{L}(\boldsymbol{\Theta}^{(k)},\sigma^{(k)})\geq\mathcal{L}(\boldsymbol{\Theta}^{(k-1)},\sigma^{(k-1)}).

Strictly speaking, this algorithm is only a generalized EM (GEM) algorithm [3], but for convenience we still call it the EM algorithm.
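The corresponding iteration for the stochastic model, i.e., the E-step (12) followed by the two CM-steps (15) and (17), can be sketched in numpy as follows; the grid search again stands in for the continuous maximization, and the names are illustrative.

```python
import numpy as np

def ecm_step_sto(R_y, A_fun, phi_grid, phi_prev, P_prev, sigma_prev, alpha):
    """One EM (ECM) iteration, stochastic model: E-step (12) and CM-steps (15), (17)."""
    N = R_y.shape[0]
    M = len(phi_prev)
    A_prev = np.column_stack([A_fun(ph) for ph in phi_prev])
    C_m_prev = [P_prev[m] * np.outer(A_prev[:, m], A_prev[:, m].conj())
                + alpha[m] * sigma_prev * np.eye(N) for m in range(M)]
    C_y_prev = sum(C_m_prev)

    phi_new, P_new, tr_terms = np.empty(M), np.empty(M), np.empty(M)
    for m in range(M):
        G = np.linalg.solve(C_y_prev, C_m_prev[m])                   # C_y^{-1} C_m
        R_m = G.conj().T @ R_y @ G + (C_m_prev[m] - C_m_prev[m] @ G)  # E-step (12)
        vals = [np.real(A_fun(ph).conj() @ R_m @ A_fun(ph)) / N for ph in phi_grid]
        phi_new[m] = phi_grid[int(np.argmax(vals))]                  # first CM-step (15a)
        a_m = A_fun(phi_new[m])
        e_m = np.real(a_m.conj() @ R_m @ a_m) / N
        P_new[m] = max((e_m - alpha[m] * sigma_prev) / N, 0.0)       # (15b)
        D_m = (P_new[m] / sigma_prev) * np.outer(a_m, a_m.conj()) + alpha[m] * np.eye(N)
        tr_terms[m] = np.real(np.trace(np.linalg.solve(D_m, R_m)))
    sigma_new = np.sum(tr_terms) / (M * N)                           # second CM-step (17)
    return phi_new, P_new, sigma_new
```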

IV MEM Algorithm

In the previous section, 𝜶\boldsymbol{\alpha} is fixed and known. In this section, we regard 𝜶\boldsymbol{\alpha} as a parameter to be estimated and thus propose an MEM algorithm applicable to the unknown uniform noise assumption.

In order to estimate σ\sigma and 𝜶\boldsymbol{\alpha} in the MEM algorithm easily, we introduce σm=αmσ\sigma_{m}=\alpha_{m}\sigma as the common noise variance of the mmth source, i.e.,

𝐰m(t)𝒞𝒩(𝟎,σm𝐈N).\displaystyle\mathbf{w}_{m}(t)\sim\mathcal{CN}(\mathbf{0},\sigma_{m}\mathbf{I}_{N}). (18)

Then, σ\sigma and 𝜶\boldsymbol{\alpha} can be estimated by σ=m=1Mσm\sigma=\sum_{m=1}^{M}\sigma_{m} and αm=σm/σ\alpha_{m}=\sigma_{m}/\sigma after estimating 𝝈=[σ1σM]T\boldsymbol{\sigma}=[\sigma_{1}~{}\cdots~{}\sigma_{M}]^{T} at the M-step of the MEM algorithm. In addition, we first assume 𝝈(k)>𝟎\boldsymbol{\sigma}^{(k)}>\mathbf{0} for k0k\geq 0.

IV-A Deterministic Signal Model

According to (18), (3b) is rewritten as

𝒬(𝐗1,,𝐗M,𝚯,𝝈)=lnp(𝐗1,,𝐗M;𝜽,𝐒,𝝈)\displaystyle\mathcal{Q}(\mathbf{X}_{1},\dots,\mathbf{X}_{M},\boldsymbol{\Theta},\boldsymbol{\sigma})=\ln p(\mathbf{X}_{1},\dots,\mathbf{X}_{M};\boldsymbol{\theta},\mathbf{S},\boldsymbol{\sigma})
=\displaystyle= TMNln(π)TNm=1Mln(σm)\displaystyle-TMN\ln(\pi)-TN\sum_{m=1}^{M}\ln(\sigma_{m})- (19)
t=1Tm=1M1σm𝐱m(t)𝐚(θm)sm(t)2.\displaystyle\sum_{t=1}^{T}\sum_{m=1}^{M}\frac{1}{\sigma_{m}}\|\mathbf{x}_{m}(t)-\mathbf{a}(\theta_{m})s_{m}(t)\|^{2}.

With the help of (19), at the kkth E-step the MEM algorithm computes the conditional expectation:

𝔼{𝒬(𝐗1,,𝐗M,𝚯,𝝈)|𝐘;𝚯(k1),𝝈(k1)}\displaystyle\mathbb{E}\left\{\mathcal{Q}(\mathbf{X}_{1},\dots,\mathbf{X}_{M},\boldsymbol{\Theta},\boldsymbol{\sigma})|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\boldsymbol{\sigma}^{(k-1)}\right\}
=\displaystyle= TMNln(π)Tm=1M{Nln(σm)+1σm[cm(k)\displaystyle-TMN\ln(\pi)-T\sum_{m=1}^{M}\Big{\{}N\ln(\sigma_{m})+\frac{1}{\sigma_{m}}\big{[}c^{(k)}_{m} (20)
+1Tt=1T𝐱m(k)(t)𝐚(θm)sm(t)2]},\displaystyle+\frac{1}{T}\sum_{t=1}^{T}\|\mathbf{x}^{(k)}_{m}(t)-\mathbf{a}(\theta_{m})s_{m}(t)\|^{2}\big{]}\Big{\}},

where

𝐱m(k)(t)=\displaystyle\mathbf{x}^{(k)}_{m}(t)= 𝔼{𝐱m(t)|𝐘;𝚯(k1),𝝈(k1)}\displaystyle\mathbb{E}\left\{\mathbf{x}_{m}(t)|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\boldsymbol{\sigma}^{(k-1)}\right\}
=\displaystyle= σm(k1)/σ(k1)[𝐲(t)𝐀(𝜽(k1))𝐬(k1)(t)]\displaystyle\sigma^{(k-1)}_{m}/\sigma^{(k-1)}\big{[}\mathbf{y}(t)-\mathbf{A}(\boldsymbol{\theta}^{(k-1)})\mathbf{s}^{(k-1)}(t)\big{]} (21a)
+𝐚(θm(k1))sm(k1)(t),m,t,\displaystyle+\mathbf{a}(\theta^{(k-1)}_{m})s^{(k-1)}_{m}(t),\forall m,t,
cm(k)\displaystyle c^{(k)}_{m} =\displaystyle= 1Tt=1TTr{𝔻{𝐱m(t)|𝐘;𝚯(k1),𝝈(k1)}}\displaystyle\frac{1}{T}\sum_{t=1}^{T}\mathrm{Tr}\left\{\mathbb{D}\big{\{}\mathbf{x}_{m}(t)|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\boldsymbol{\sigma}^{(k-1)}\big{\}}\right\} (21b)
=\displaystyle= 1Tt=1TTr{σm(k1)(1σm(k1)/σ(k1))𝐈N}\displaystyle\frac{1}{T}\sum_{t=1}^{T}\mathrm{Tr}\left\{\sigma^{(k-1)}_{m}\big{(}1-\sigma^{(k-1)}_{m}/\sigma^{(k-1)}\big{)}\mathbf{I}_{N}\right\}
=\displaystyle= Nσm(k1)(1σm(k1)/σ(k1)),m.\displaystyle N\sigma^{(k-1)}_{m}\big{(}1-\sigma^{(k-1)}_{m}/\sigma^{(k-1)}\big{)},\forall m.

At the kkth M-step, based on (20) the MEM algorithm updates the estimates of 𝚯\boldsymbol{\Theta} and 𝝈\boldsymbol{\sigma} by the following MM parallel subproblems:

minθmΩ,𝐬m,σm>0\displaystyle\min_{\theta_{m}\in\Omega,\mathbf{s}_{m},\sigma_{m}>0} Nln(σm)+1σm[cm(k)+\displaystyle N\ln(\sigma_{m})+\frac{1}{\sigma_{m}}\big{[}c^{(k)}_{m}+ (22)
1Tt=1T𝐱m(k)(t)𝐚(θm)sm(t)2],m,\displaystyle\frac{1}{T}\sum_{t=1}^{T}\|\mathbf{x}^{(k)}_{m}(t)-\mathbf{a}(\theta_{m})s_{m}(t)\|^{2}\big{]},\forall m,

where 𝐬m=[sm(1)sm(T)]\mathbf{s}_{m}=[s_{m}(1)~{}\cdots~{}s_{m}(T)]. Then, we can obtain the parameter estimates:

θm(k)=argmaxθmΩTr{𝚷m𝐑^m(k)},m,\displaystyle\theta^{(k)}_{m}=\arg\max_{\theta_{m}\in\Omega}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}},\forall m, (23a)
sm(k)(t)=1N𝐚H(θm(k))𝐱m(k)(t),m,t,\displaystyle s^{(k)}_{m}(t)=\frac{1}{N}\mathbf{a}^{H}(\theta^{(k)}_{m})\mathbf{x}^{(k)}_{m}(t),\forall m,t, (23b)
σm(k)=σm(k1)(1σm(k1)/σ(k1))+dm(k)/N,m,\displaystyle\sigma^{(k)}_{m}=\sigma^{(k-1)}_{m}\big{(}1-\sigma^{(k-1)}_{m}/\sigma^{(k-1)}\big{)}+d^{(k)}_{m}/N,\forall m, (23c)

where 𝐑^m(k)=1Tt=1T𝐱m(k)(t)[𝐱m(k)(t)]H\hat{\mathbf{R}}^{(k)}_{m}=\frac{1}{T}\sum_{t=1}^{T}\mathbf{x}^{(k)}_{m}(t)[\mathbf{x}^{(k)}_{m}(t)]^{H}. Note that when 𝝈(k1)>𝟎\boldsymbol{\sigma}^{(k-1)}>\mathbf{0}, based on (23c) we have 1σm(k1)/σ(k1)>0,m1-\sigma^{(k-1)}_{m}/\sigma^{(k-1)}>0,\forall m, and σm(k)>0,m\sigma^{(k)}_{m}>0,\forall m, i.e., 𝝈(k)>𝟎\boldsymbol{\sigma}^{(k)}>\mathbf{0}.

Remark 3.

The 𝐱m(k)(t)\mathbf{x}^{(k)}_{m}(t)’s in (21a) are related to 𝛔(k1)\boldsymbol{\sigma}^{(k-1)}, so the θm(k)\theta^{(k)}_{m}’s and sm(k)(t)s^{(k)}_{m}(t)’s in (23) are related to 𝛔(k1)\boldsymbol{\sigma}^{(k-1)}, i.e., iterative knowledge associated with σ\sigma is utilized to estimate 𝚯\boldsymbol{\Theta} in the MEM algorithm for the deterministic signal model.
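Compared with the EM iteration sketched in Section III-A, the deterministic-model MEM iteration differs only in the E-step weights and in the per-source noise update, as the minimal sketch below illustrates (the grid search and names are again illustrative).

```python
import numpy as np

def mem_step_det(Y, A_fun, phi_grid, phi_prev, S_prev, sig_vec_prev):
    """One MEM iteration (deterministic model): the fixed weights alpha_m of the EM
    E-step are replaced by sigma_m^(k-1)/sigma^(k-1) in (21a), and each sigma_m is
    updated separately via (23c)."""
    N, T = Y.shape
    M = len(phi_prev)
    sigma_prev = np.sum(sig_vec_prev)                    # sigma = sum_m sigma_m
    A_prev = np.column_stack([A_fun(ph) for ph in phi_prev])
    resid = Y - A_prev @ S_prev

    phi_new = np.empty(M)
    S_new = np.empty((M, T), dtype=complex)
    sig_vec_new = np.empty(M)
    for m in range(M):
        w_m = sig_vec_prev[m] / sigma_prev
        X_m = w_m * resid + np.outer(A_prev[:, m], S_prev[m])        # E-step (21a)
        R_m = (X_m @ X_m.conj().T) / T
        vals = [np.real(A_fun(ph).conj() @ R_m @ A_fun(ph)) / N for ph in phi_grid]
        phi_new[m] = phi_grid[int(np.argmax(vals))]                  # (23a)
        a_m = A_fun(phi_new[m])
        S_new[m] = a_m.conj() @ X_m / N                              # (23b)
        e_m = np.real(a_m.conj() @ R_m @ a_m) / N
        d_m = np.real(np.trace(R_m)) - e_m
        sig_vec_new[m] = sig_vec_prev[m] * (1 - w_m) + d_m / N       # (23c)
    return phi_new, S_new, sig_vec_new
```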

IV-B Stochastic Signal Model

According to (18), 𝐱m(t)𝒞𝒩(𝟎,𝐂m)\mathbf{x}_{m}(t)\sim\mathcal{CN}(\mathbf{0},\mathbf{C}_{m}) with 𝐂m=Pm𝐚(θm)𝐚H(θm)+σm𝐈N\mathbf{C}_{m}=P_{m}\mathbf{a}(\theta_{m})\mathbf{a}^{H}(\theta_{m})+\sigma_{m}\mathbf{I}_{N} and (5b) is rewritten as

𝒬(𝐗1,,𝐗M,𝚯,𝝈)=lnp(𝐗1,,𝐗M;𝜽,𝐏,𝝈)\displaystyle\mathcal{Q}(\mathbf{X}_{1},\dots,\mathbf{X}_{M},\boldsymbol{\Theta},\boldsymbol{\sigma})=\ln p(\mathbf{X}_{1},\dots,\mathbf{X}_{M};\boldsymbol{\theta},\mathbf{P},\boldsymbol{\sigma})
=\displaystyle= Tm=1M[Nln(π)+ln|𝐂m|+Tr{𝐂m1𝐑^m}].\displaystyle-T\sum_{m=1}^{M}\big{[}N\ln(\pi)+\ln|\mathbf{C}_{m}|+\mathrm{Tr}\big{\{}\mathbf{C}_{m}^{-1}\hat{\mathbf{R}}_{m}\big{\}}\big{]}. (24)

With the help of (24), at the kkth E-step the MEM algorithm estimates the 𝐑^m\hat{\mathbf{R}}_{m}’s by finding their conditional expectations:

𝐑^m(k)=\displaystyle\hat{\mathbf{R}}^{(k)}_{m}= 𝔼{𝐑^m|𝐘;𝚯(k1),𝝈(k1)}\displaystyle\mathbb{E}\left\{\hat{\mathbf{R}}_{m}|\mathbf{Y};\boldsymbol{\Theta}^{(k-1)},\boldsymbol{\sigma}^{(k-1)}\right\}
=\displaystyle= 𝐂m(k1)[𝐂y(k1)]1𝐑^y[𝐂y(k1)]1𝐂m(k1)+\displaystyle\mathbf{C}^{(k-1)}_{m}[\mathbf{C}^{(k-1)}_{y}]^{-1}\hat{\mathbf{R}}_{y}[\mathbf{C}^{(k-1)}_{y}]^{-1}\mathbf{C}^{(k-1)}_{m}+ (25)
{𝐂m(k1)𝐂m(k1)[𝐂y(k1)]1𝐂m(k1)},m,\displaystyle\left\{\mathbf{C}^{(k-1)}_{m}-\mathbf{C}^{(k-1)}_{m}[\mathbf{C}^{(k-1)}_{y}]^{-1}\mathbf{C}^{(k-1)}_{m}\right\},\forall m,

where

𝐂m(k1)[𝐂y(k1)]1𝐑^y[𝐂y(k1)]1𝐂m(k1)𝟎N,m,\displaystyle\mathbf{C}^{(k-1)}_{m}[\mathbf{C}^{(k-1)}_{y}]^{-1}\hat{\mathbf{R}}_{y}[\mathbf{C}^{(k-1)}_{y}]^{-1}\mathbf{C}^{(k-1)}_{m}\geq\mathbf{0}_{N},\forall m,
𝐂m(k1)𝐂m(k1)[𝐂y(k1)]1𝐂m(k1)𝟎N,m.\displaystyle\mathbf{C}^{(k-1)}_{m}-\mathbf{C}^{(k-1)}_{m}[\mathbf{C}^{(k-1)}_{y}]^{-1}\mathbf{C}^{(k-1)}_{m}\geq\mathbf{0}_{N},\forall m.

At the kkth M-step, based on (24) and (25) the MEM algorithm estimates 𝚯\boldsymbol{\Theta} and 𝝈\boldsymbol{\sigma} by the following MM parallel subproblems (unlike the M-step (13) in the EM algorithm for the stochastic signal model, the M-step in the MEM algorithm for the stochastic signal model can easily be simplified into parallel subproblems):

minθmΩ,rm0,σm>0\displaystyle\min_{\theta_{m}\in\Omega,r_{m}\geq 0,\sigma_{m}>0} Nln(σm)+ln|𝐃m|+\displaystyle N\ln(\sigma_{m})+\ln|\mathbf{D}_{m}|+ (26)
1σmTr{𝐃m1𝐑^m(k)},m,\displaystyle\frac{1}{\sigma_{m}}\mathrm{Tr}\big{\{}\mathbf{D}_{m}^{-1}\hat{\mathbf{R}}^{(k)}_{m}\big{\}},\forall m,

where 𝐂m=σm𝐃m\mathbf{C}_{m}=\sigma_{m}\mathbf{D}_{m} with 𝐃m=rm𝐚(θm)𝐚H(θm)+𝐈N\mathbf{D}_{m}=r_{m}\mathbf{a}(\theta_{m})\mathbf{a}^{H}(\theta_{m})+\mathbf{I}_{N} and rm=Pm/σmr_{m}=P_{m}/\sigma_{m}. Since |𝐃m|=Nrm+1|\mathbf{D}_{m}|=Nr_{m}+1 and 𝐃m1=𝐈NNrmNrm+1𝚷m\mathbf{D}_{m}^{-1}=\mathbf{I}_{N}-\frac{Nr_{m}}{Nr_{m}+1}\boldsymbol{\Pi}_{m}, subproblems (26) can be rewritten as

minθmΩ,rm0,σm>0Nln(σm)+ln(Nrm+1)+\displaystyle\min_{\theta_{m}\in\Omega,r_{m}\geq 0,\sigma_{m}>0}N\ln(\sigma_{m})+\ln(Nr_{m}+1)+
1σmTr{𝐑^m(k)}Nrmσm(Nrm+1)Tr{𝚷m𝐑^m(k)},m.\displaystyle\frac{1}{\sigma_{m}}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}-\frac{Nr_{m}}{\sigma_{m}(Nr_{m}+1)}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}},\forall m. (27)

We first simplify subproblems (27) by eliminating 𝐫\mathbf{r} [18], [19]. Hence, after estimating 𝜽\boldsymbol{\theta} and 𝝈\boldsymbol{\sigma}, 𝐫\mathbf{r} and 𝐏\mathbf{P} are estimated by

rm(k)=max{1N(em(k)/σm(k)1),0},m,\displaystyle r^{(k)}_{m}=\max\left\{\frac{1}{N}\big{(}e^{(k)}_{m}/\sigma^{(k)}_{m}-1\big{)},0\right\},\forall m,
Pm(k)=rm(k)σm(k)=max{1N(em(k)σm(k)),0},m,\displaystyle P^{(k)}_{m}=r^{(k)}_{m}\sigma^{(k)}_{m}=\max\left\{\frac{1}{N}\big{(}e^{(k)}_{m}-\sigma^{(k)}_{m}\big{)},0\right\},\forall m,

which imply that if rm(k)=0r^{(k)}_{m}=0, θm(k)\theta^{(k)}_{m} will be indeterminate and σm(k)=1NTr{𝐑^m(k)}\sigma^{(k)}_{m}=\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}} by (27). However, to estimate 𝜽\boldsymbol{\theta} and 𝝈\boldsymbol{\sigma}, we must assume 𝐫(k)>𝟎\mathbf{r}^{(k)}>\mathbf{0}. After eliminating 𝐫\mathbf{r}, subproblems (27) are simplified to

minθmΩ,σm>0\displaystyle\min_{\theta_{m}\in\Omega,\sigma_{m}>0} (N1)ln(σm)+ln(Tr{𝚷m𝐑^m(k)})+\displaystyle(N-1)\ln(\sigma_{m})+\ln\left(\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}\right)+ (28)
1σmTr{(𝐈N𝚷m)𝐑^m(k)},m.\displaystyle\frac{1}{\sigma_{m}}\mathrm{Tr}\left\{\big{(}\mathbf{I}_{N}-\boldsymbol{\Pi}_{m}\big{)}\hat{\mathbf{R}}^{(k)}_{m}\right\},\forall m.

Next, we simplify subproblems (28) by eliminating 𝝈\boldsymbol{\sigma}. Thus, after estimating 𝜽\boldsymbol{\theta}, 𝝈\boldsymbol{\sigma} is estimated by

σm(k)=dm(k)/(N1),m.\displaystyle\sigma^{(k)}_{m}=d^{(k)}_{m}/(N-1),\forall m.

After eliminating 𝝈\boldsymbol{\sigma}, subproblems (28) are simplified to

minθmΩ\displaystyle\min_{\theta_{m}\in\Omega} (N1)ln(Tr{𝐑^m(k)}Tr{𝚷m𝐑^m(k)})\displaystyle(N-1)\ln\left(\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}-\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}\right) (29)
+ln(Tr{𝚷m𝐑^m(k)}),m,\displaystyle+\ln\left(\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}\right),\forall m,

where θmδm(k)={θmΩ|Tr{𝚷m𝐑^m(k)}>1NTr{𝐑^m(k)}}\theta_{m}\in\delta^{(k)}_{m}=\big{\{}\theta_{m}\in\Omega|\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}>\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}\big{\}} due to the fact that when θm(k)δm(k)\theta^{(k)}_{m}\in\delta^{(k)}_{m}, em(k)>1NTr{𝐑^m(k)}e^{(k)}_{m}>\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}} and

rm(k)\displaystyle r^{(k)}_{m} =\displaystyle= max{1N(em(k)/σm(k)1),0}\displaystyle\max\left\{\frac{1}{N}\big{(}e^{(k)}_{m}/\sigma^{(k)}_{m}-1\big{)},0\right\}
=\displaystyle= max{1N[(N1)em(k)/dm(k)1],0}\displaystyle\max\left\{\frac{1}{N}\big{[}(N-1)e^{(k)}_{m}/d^{(k)}_{m}-1\big{]},0\right\}
=\displaystyle= (em(k)1NTr{𝐑^m(k)})/dm(k)>0,m.\displaystyle\big{(}e^{(k)}_{m}-\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}\big{)}/d^{(k)}_{m}>0,\forall m.

Since (N1)ln(Tr{𝐑^m(k)}x)+ln(x)(N-1)\ln\big{(}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}-x\big{)}+\ln(x) is a monotonically decreasing function of xx for x1NTr{𝐑^m(k)}x\geq\frac{1}{N}\mathrm{Tr}\{\hat{\mathbf{R}}^{(k)}_{m}\}, subproblems (29) are equivalent to

maxθmδm(k)Tr{𝚷m𝐑^m(k)},m.\displaystyle\max_{\theta_{m}\in\delta^{(k)}_{m}}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}},\forall m. (30)

Based on the above analysis, the estimates of 𝚯\boldsymbol{\Theta} and 𝝈\boldsymbol{\sigma} are updated by

θm(k)=argmaxθmΩTr{𝚷m𝐑^m(k)},m,\displaystyle\theta^{(k)}_{m}=\arg\max_{\theta_{m}\in\Omega}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}},\forall m, (31a)
σm(k)={dm(k)/(N1)em(k)>1NTr{𝐑^m(k)},1NTr{𝐑^m(k)}em(k)1NTr{𝐑^m(k)},m,\displaystyle\sigma^{(k)}_{m}=\left\{\begin{array}[]{ll}{d^{(k)}_{m}/(N-1)}&{e^{(k)}_{m}>\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}},\\ {\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}}&{e^{(k)}_{m}\leq\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}},\\ \end{array}\right.\forall m, (31d)
Pm(k)=max{1N1(em(k)1NTr{𝐑^m(k)}),0},m.\displaystyle P^{(k)}_{m}=\max\left\{\frac{1}{N-1}\big{(}e^{(k)}_{m}-\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}\big{)},0\right\},\forall m. (31e)
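Given the conditional expectation from (25), the per-source M-step (31) is available in closed form. The sketch below is one possible implementation, with a grid search replacing the continuous maximization in (31a); the function name is illustrative.

```python
import numpy as np

def mem_mstep_sto(R_m, A_fun, phi_grid):
    """Closed-form per-source M-step (31) of the MEM algorithm (stochastic model),
    given the conditional expectation R_m from the E-step (25)."""
    N = R_m.shape[0]
    vals = [np.real(A_fun(ph).conj() @ R_m @ A_fun(ph)) / N for ph in phi_grid]
    phi_m = phi_grid[int(np.argmax(vals))]                # (31a)
    a_m = A_fun(phi_m)
    e_m = np.real(a_m.conj() @ R_m @ a_m) / N             # Tr{Pi_m R_m}
    tr_m = np.real(np.trace(R_m))
    if e_m > tr_m / N:
        sigma_m = (tr_m - e_m) / (N - 1)                  # d_m / (N - 1), first case of (31)
        P_m = (e_m - tr_m / N) / (N - 1)                  # (31) power update
    else:
        sigma_m = tr_m / N                                # second case of (31)
        P_m = 0.0
    return phi_m, sigma_m, P_m
```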

Finally, we give the following remark.

Remark 4.

In the MEM algorithm for the stochastic signal model, 𝛔(k)>𝟎\boldsymbol{\sigma}^{(k)}>\mathbf{0} for k1k\geq 1 if 𝛔(0)>𝟎\boldsymbol{\sigma}^{(0)}>\mathbf{0}.

Proof.

We utilize a proof by contradiction. Without loss of generality, assume 𝝈(k)>𝟎\boldsymbol{\sigma}^{(k)}>\mathbf{0} for 0kK10\leq k\leq K-1 (K1)(K\geq 1) and σi(K)=0\sigma^{(K)}_{i}=0. Obviously, we have

𝐂m(K1)=\displaystyle\mathbf{C}^{(K-1)}_{m}= Pm(K1)𝐚(θm(K1))𝐚H(θm(K1))\displaystyle P^{(K-1)}_{m}\mathbf{a}(\theta^{(K-1)}_{m})\mathbf{a}^{H}(\theta^{(K-1)}_{m}) (32)
+σm(K1)𝐈N>𝟎N,m.\displaystyle+\sigma^{(K-1)}_{m}\mathbf{I}_{N}>\mathbf{0}_{N},\forall m.

Based on (31b) and σi(K)=0\sigma^{(K)}_{i}=0, we first consider σi(K)=1NTr{𝐑^i(K)}=0\sigma^{(K)}_{i}=\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(K)}_{i}\big{\}}=0. Then, 𝐑^i(K)=𝟎N\hat{\mathbf{R}}^{(K)}_{i}=\mathbf{0}_{N} since 𝐑^i(K)𝟎N\hat{\mathbf{R}}^{(K)}_{i}\geq\mathbf{0}_{N}, which implies that ei(K)=0e^{(K)}_{i}=0, and by (25) we have

𝐂i(K1)[𝐂y(K1)]1𝐑^y[𝐂y(K1)]1𝐂i(K1)\displaystyle\mathbf{C}^{(K-1)}_{i}[\mathbf{C}^{(K-1)}_{y}]^{-1}\hat{\mathbf{R}}_{y}[\mathbf{C}^{(K-1)}_{y}]^{-1}\mathbf{C}^{(K-1)}_{i} =\displaystyle= 𝟎N,\displaystyle\mathbf{0}_{N},
𝐂i(K1)𝐂i(K1)[𝐂y(K1)]1𝐂i(K1)\displaystyle\mathbf{C}^{(K-1)}_{i}-\mathbf{C}^{(K-1)}_{i}[\mathbf{C}^{(K-1)}_{y}]^{-1}\mathbf{C}^{(K-1)}_{i} =\displaystyle= 𝟎N,\displaystyle\mathbf{0}_{N},

which indicate 𝐑^y=𝟎N\hat{\mathbf{R}}_{y}=\mathbf{0}_{N} and 𝐂i(K1)=𝐂y(K1)=m=1M𝐂m(K1)\mathbf{C}^{(K-1)}_{i}=\mathbf{C}^{(K-1)}_{y}=\sum_{m=1}^{M}\mathbf{C}^{(K-1)}_{m}. Thus, mi𝐂m(K1)=𝟎N\sum_{m\neq i}\mathbf{C}^{(K-1)}_{m}=\mathbf{0}_{N}, which contradicts mi𝐂m(K1)>𝟎N\sum_{m\neq i}\mathbf{C}^{(K-1)}_{m}>\mathbf{0}_{N} by (32).

Next, we consider σi(K)=di(K)/(N1)=0\sigma^{(K)}_{i}=d^{(K)}_{i}/(N-1)=0, i.e., 𝐑^i(K)=fi(K1)𝚷i\hat{\mathbf{R}}^{(K)}_{i}=f^{(K-1)}_{i}\boldsymbol{\Pi}_{i} with fi(K1)>0f^{(K-1)}_{i}>0, since ei(K)>1NTr{𝐑^i(K)}e^{(K)}_{i}>\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(K)}_{i}\big{\}}. Then, by (25) we have

𝐂i(K1)[𝐂y(K1)]1𝐑^y[𝐂y(K1)]1𝐂i(K1)\displaystyle\mathbf{C}^{(K-1)}_{i}[\mathbf{C}^{(K-1)}_{y}]^{-1}\hat{\mathbf{R}}_{y}[\mathbf{C}^{(K-1)}_{y}]^{-1}\mathbf{C}^{(K-1)}_{i} =\displaystyle= hi(K1)𝚷i,\displaystyle h^{(K-1)}_{i}\boldsymbol{\Pi}_{i},
𝐂i(K1)𝐂i(K1)[𝐂y(K1)]1𝐂i(K1)\displaystyle\mathbf{C}^{(K-1)}_{i}-\mathbf{C}^{(K-1)}_{i}[\mathbf{C}^{(K-1)}_{y}]^{-1}\mathbf{C}^{(K-1)}_{i} =\displaystyle= vi(K1)𝚷i,\displaystyle v^{(K-1)}_{i}\boldsymbol{\Pi}_{i},

where hi(K1)0h^{(K-1)}_{i}\geq 0 and vi(K1)>0v^{(K-1)}_{i}>0, resulting in

|𝐂i(K1)𝐂i(K1)[𝐂y(K1)]1𝐂i(K1)|=0,\displaystyle\left|\mathbf{C}^{(K-1)}_{i}-\mathbf{C}^{(K-1)}_{i}[\mathbf{C}^{(K-1)}_{y}]^{-1}\mathbf{C}^{(K-1)}_{i}\right|=0,

and

|𝐂y(K1)𝐂i(K1)|=|mi𝐂m(K1)|=0,\displaystyle\left|\mathbf{C}^{(K-1)}_{y}-\mathbf{C}^{(K-1)}_{i}\right|=\left|{\sum}_{m\neq i}\mathbf{C}^{(K-1)}_{m}\right|=0,

which contradicts mi𝐂m(K1)>𝟎N\sum_{m\neq i}\mathbf{C}^{(K-1)}_{m}>\mathbf{0}_{N} by (32). The proof is completed. ∎

V SAGE Algorithm

In this section, the SAGE algorithm is proposed. The SAGE algorithm is designed to update the DOA estimates in ascending order of source index at each iteration, and one iteration finishes when all the parameter estimates have been updated. Besides, let ()(k,i)(\cdot)^{(k,i)} denote an iterative value obtained while updating the estimate of θi\theta_{i} at the kkth iteration, with ()(k1)=()(k1,M)=()(k,0)(\cdot)^{(k-1)}=(\cdot)^{(k-1,M)}=(\cdot)^{(k,0)}.

The SAGE algorithm updates the parameter estimates by the E- and M-steps of the EM or MEM algorithm. When updating the estimate of θi\theta_{i} at the kkth iteration, the SAGE algorithm first associates all of the noise with the iith source signal component [6], [7]:

αm={1m=i,0mi.\alpha_{m}=\left\{\begin{array}[]{ll}{1}&{m=i},\\ {0}&{m\neq i}.\\ \end{array}\right. (33)

V-A Deterministic Signal Model

According to (33), 𝐱i(t)𝒞𝒩(𝐚(θi)si(t),σ𝐈N)\mathbf{x}_{i}(t)\sim\mathcal{CN}(\mathbf{a}(\theta_{i})s_{i}(t),\sigma\mathbf{I}_{N}), 𝐱m(t)=𝐚(θm)sm(t)\mathbf{x}_{m}(t)=\mathbf{a}(\theta_{m})s_{m}(t) for mim\neq i, and (3b) is rewritten as

𝒬(𝐗i,θi,𝐬i,σ)=lnp(𝐗1,,𝐗M;𝜽,𝐒,σ)\displaystyle\mathcal{Q}(\mathbf{X}_{i},\theta_{i},\mathbf{s}_{i},\sigma)=\ln p(\mathbf{X}_{1},\dots,\mathbf{X}_{M};\boldsymbol{\theta},\mathbf{S},\sigma)
=\displaystyle= TNln(πσ)1σt=1T𝐱i(t)𝐚(θi)si(t)2.\displaystyle-TN\ln(\pi\sigma)-\frac{1}{\sigma}\sum_{t=1}^{T}\|\mathbf{x}_{i}(t)-\mathbf{a}(\theta_{i})s_{i}(t)\|^{2}. (34)

At the E-step, the SAGE algorithm computes the conditional expectation of 𝒬(𝐗i,θi,𝐬i,σ)\mathcal{Q}(\mathbf{X}_{i},\theta_{i},\mathbf{s}_{i},\sigma) by

𝔼{𝒬(𝐗i,θi,𝐬i,σ)|𝐘;𝚯(k,i1),σ(k,i1)}\displaystyle\mathbb{E}\left\{\mathcal{Q}(\mathbf{X}_{i},\theta_{i},\mathbf{s}_{i},\sigma)|\mathbf{Y};\boldsymbol{\Theta}^{(k,i-1)},\sigma^{(k,i-1)}\right\}
=\displaystyle= TNln(πσ)1σt=1T𝐱i(k)(t)𝐚(θi)si(t)2,\displaystyle-TN\ln(\pi\sigma)-\frac{1}{\sigma}\sum_{t=1}^{T}\|\mathbf{x}^{(k)}_{i}(t)-\mathbf{a}(\theta_{i})s_{i}(t)\|^{2}, (35)

where 𝚯=(𝜽,𝐒)\boldsymbol{\Theta}=(\boldsymbol{\theta},\mathbf{S}) and

𝐱i(k)(t)=\displaystyle\mathbf{x}_{i}^{(k)}(t)= 𝐱i(k,i)(t)=𝔼{𝐱i(t)|𝐘;𝚯(k,i1),σ(k,i1)}\displaystyle\mathbf{x}_{i}^{(k,i)}(t)=\mathbb{E}\left\{\mathbf{x}_{i}(t)|\mathbf{Y};\boldsymbol{\Theta}^{(k,i-1)},\sigma^{(k,i-1)}\right\}
=\displaystyle= [𝐲(t)𝐀(𝜽(k,i1))𝐬(k,i1)(t)]\displaystyle\big{[}\mathbf{y}(t)-\mathbf{A}(\boldsymbol{\theta}^{(k,i-1)})\mathbf{s}^{(k,i-1)}(t)\big{]} (36a)
+𝐚(θi(k,i1))si(k,i1)(t),t,\displaystyle+\mathbf{a}(\theta_{i}^{(k,i-1)})s_{i}^{(k,i-1)}(t),\forall t,
𝔻{𝐱i(t)|𝐘;𝚯(k,i1),σ(k,i1)}=𝟎N,t.\displaystyle\mathbb{D}\left\{\mathbf{x}_{i}(t)|\mathbf{Y};\boldsymbol{\Theta}^{(k,i-1)},\sigma^{(k,i-1)}\right\}=\mathbf{0}_{N},\forall t. (36b)

At the M-step, based on (35) the SAGE algorithm estimates θi\theta_{i}, 𝐬i\mathbf{s}_{i}, and σ\sigma by

minθiΩ,𝐬i,σ>0Nln(σ)+1σTt=1T𝐱i(k)(t)𝐚(θi)si(t)2.\displaystyle\min_{\theta_{i}\in\Omega,\mathbf{s}_{i},\sigma>0}N\ln(\sigma)+\frac{1}{\sigma T}\sum_{t=1}^{T}\|\mathbf{x}_{i}^{(k)}(t)-\mathbf{a}(\theta_{i})s_{i}(t)\|^{2}. (37)

Thus, the estimates of θi\theta_{i}, 𝐬i\mathbf{s}_{i}, and σ\sigma are updated by

θi(k)=θi(k,i)=argmaxθiΩTr{𝚷i𝐑^i(k)},\displaystyle\theta^{(k)}_{i}=\theta^{(k,i)}_{i}=\arg\max_{\theta_{i}\in\Omega}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{i}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}, (38a)
si(k)(t)=si(k,i)(t)=1N𝐚H(θi(k))𝐱i(k)(t),t,\displaystyle s^{(k)}_{i}(t)=s^{(k,i)}_{i}(t)=\frac{1}{N}\mathbf{a}^{H}(\theta^{(k)}_{i})\mathbf{x}^{(k)}_{i}(t),\forall t, (38b)
σ(k,i)=di(k)/N,\displaystyle\sigma^{(k,i)}=d^{(k)}_{i}/N, (38c)

where 𝐑^i(k)=1Tt=1T𝐱i(k)(t)[𝐱i(k)(t)]H\hat{\mathbf{R}}^{(k)}_{i}=\frac{1}{T}\sum_{t=1}^{T}\mathbf{x}^{(k)}_{i}(t)[\mathbf{x}^{(k)}_{i}(t)]^{H}.

Finally, the other parameter estimates are not updated and their iterative values are

θm(k,i)=θm(k,i1),mi,\displaystyle\theta^{(k,i)}_{m}=\theta^{(k,i-1)}_{m},\forall m\neq i, (39a)
sm(k,i)(t)=sm(k,i1)(t),mi,t.\displaystyle s^{(k,i)}_{m}(t)=s^{(k,i-1)}_{m}(t),\forall m\neq i,t. (39b)

At each iteration of the SAGE algorithm, the E- and M-steps are repeated MM times, so that 𝐗1,,𝐗M\mathbf{X}_{1},\dots,\mathbf{X}_{M} and all elements in 𝚯\boldsymbol{\Theta} are estimated once while σ\sigma is estimated MM times.

Remark 5.

Notice that the 𝐱i(k)(t)\mathbf{x}^{(k)}_{i}(t)’s in (36a), θi(k)\theta^{(k)}_{i} in (38a), and the si(k)(t)s^{(k)}_{i}(t)’s in (38b) are unrelated to σ(k,i1)\sigma^{(k,i-1)} and respectively identical to those in the SAGE algorithm under known σ\sigma [8], i.e., the SAGE algorithm under unknown σ\sigma is equivalent to that under known σ\sigma when σ\sigma is not estimated. Accordingly, (38c) can be omitted when the algorithm does not consider the nuisance parameter σ\sigma.
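The following sketch carries out one SAGE iteration for the deterministic model, cycling through the sources and applying (36a) and (38a)–(38c); as before, the grid search and the function names are illustrative.

```python
import numpy as np

def sage_sweep_det(Y, A_fun, phi_grid, phi, S):
    """One SAGE iteration (deterministic model).  phi and S are updated source by
    source, so each update already uses the refreshed estimates of earlier sources."""
    N, T = Y.shape
    M = len(phi)
    phi, S = phi.copy(), S.copy()
    for i in range(M):
        A_cur = np.column_stack([A_fun(ph) for ph in phi])
        X_i = Y - A_cur @ S + np.outer(A_cur[:, i], S[i])            # E-step (36a)
        R_i = (X_i @ X_i.conj().T) / T
        vals = [np.real(A_fun(ph).conj() @ R_i @ A_fun(ph)) / N for ph in phi_grid]
        phi[i] = phi_grid[int(np.argmax(vals))]                      # (38a)
        a_i = A_fun(phi[i])
        S[i] = a_i.conj() @ X_i / N                                  # (38b)
        e_i = np.real(a_i.conj() @ R_i @ a_i) / N
        sigma = (np.real(np.trace(R_i)) - e_i) / N                   # (38c): d_i / N
    return phi, S, sigma
```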

V-B Stochastic Signal Model

According to (33), 𝐱i(t)𝒞𝒩(0,𝐂i)\mathbf{x}_{i}(t)\sim\mathcal{CN}(\textbf{0},\mathbf{C}_{i}) with 𝐂i=Pi𝐚(θi)𝐚H(θi)+σ𝐈N\mathbf{C}_{i}=P_{i}\mathbf{a}(\theta_{i})\mathbf{a}^{H}(\theta_{i})+\sigma\mathbf{I}_{N} and for mim\neq i, the statistical model of 𝐱m(t)\mathbf{x}_{m}(t) depends only on sm(t)s_{m}(t) due to 𝐱m(t)=𝐚(θm)sm(t)\mathbf{x}_{m}(t)=\mathbf{a}(\theta_{m})s_{m}(t). Thus, (5b) is rewritten as

𝒬(𝐒i,𝐗i,θi,𝐏,σ)=lnp(𝐗1,,𝐗M;𝜽,𝐏,σ)\displaystyle\mathcal{Q}(\mathbf{S}_{i},\mathbf{X}_{i},\theta_{i},\mathbf{P},\sigma)=\ln p(\mathbf{X}_{1},\dots,\mathbf{X}_{M};\boldsymbol{\theta},\mathbf{P},\sigma)
=\displaystyle= T(M1)ln(π)Tmi[ln(Pm)+P^m/Pm]\displaystyle-T(M-1)\ln(\pi)-T\sum_{m\neq i}\big{[}\ln(P_{m})+\hat{P}_{m}/P_{m}\big{]} (40)
TNln(π)T[ln|𝐂i|+Tr{𝐂i1𝐑^i}],\displaystyle-TN\ln(\pi)-T\big{[}\ln|\mathbf{C}_{i}|+\mathrm{Tr}\big{\{}\mathbf{C}_{i}^{-1}\hat{\mathbf{R}}_{i}\big{\}}\big{]},

where |a||a| denotes the modulus of a complex number aa, P^m\hat{P}_{m} is a sufficient statistic for PmP_{m} [16],

𝐒i\displaystyle\mathbf{S}_{i} =\displaystyle= [𝐬i2T𝐬i1T𝐬i+1T𝐬i+2T]T,\displaystyle[\cdots~{}\mathbf{s}_{i-2}^{T}~{}\mathbf{s}_{i-1}^{T}~{}\mathbf{s}_{i+1}^{T}~{}\mathbf{s}_{i+2}^{T}~{}\cdots]^{T},
P^m\displaystyle\hat{P}_{m} =\displaystyle= 1Tt=1T|sm(t)|2.\displaystyle\frac{1}{T}{\sum}_{t=1}^{T}|s_{m}(t)|^{2}.

At the E-step, the SAGE algorithm computes the conditional expectation of 𝒬(𝐒i,𝐗i,θi,𝐏,σ)\mathcal{Q}(\mathbf{S}_{i},\mathbf{X}_{i},\theta_{i},\mathbf{P},\sigma) by

𝔼{𝒬(𝐒i,𝐗i,θi,𝐏,σ)|𝐘;𝚯(k,i1),σ(k,i1)}\displaystyle\mathbb{E}\left\{\mathcal{Q}(\mathbf{S}_{i},\mathbf{X}_{i},\theta_{i},\mathbf{P},\sigma)|\mathbf{Y};\boldsymbol{\Theta}^{(k,i-1)},\sigma^{(k,i-1)}\right\}
=\displaystyle= T(M1)ln(π)TNln(π)\displaystyle-T(M-1)\ln(\pi)-TN\ln(\pi) (41)
Tmi[ln(Pm)+P^m(k,i)/Pm]\displaystyle-T\sum_{m\neq i}\big{[}\ln(P_{m})+\hat{P}^{(k,i)}_{m}/P_{m}\big{]}
T[ln|𝐂i|+Tr{𝐂i1𝐑^i(k)}],\displaystyle-T\big{[}\ln|\mathbf{C}_{i}|+\mathrm{Tr}\big{\{}\mathbf{C}_{i}^{-1}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}\big{]},

where 𝚯=(𝜽,𝐏)\boldsymbol{\Theta}=(\boldsymbol{\theta},\mathbf{P}),

P^m(k,i)=\displaystyle\hat{P}^{(k,i)}_{m}= 𝔼{P^m|𝐘;𝚯(k,i1),σ(k,i1)}\displaystyle\mathbb{E}\left\{\hat{P}_{m}|\mathbf{Y};\boldsymbol{\Theta}^{(k,i-1)},\sigma^{(k,i-1)}\right\}
=\displaystyle= Pm(k,i1)[1𝐚H(θm(k,i1))𝐛m(k,i1)]\displaystyle P^{(k,i-1)}_{m}\big{[}1-\mathbf{a}^{H}(\theta^{(k,i-1)}_{m})\mathbf{b}^{(k,i-1)}_{m}\big{]} (42)
+[𝐛m(k,i1)]H𝐑^y𝐛m(k,i1)0\displaystyle+[\mathbf{b}^{(k,i-1)}_{m}]^{H}\hat{\mathbf{R}}_{y}\mathbf{b}^{(k,i-1)}_{m}\geq 0

with 𝐛m(k,i1)=[𝐂y(k,i1)]1𝐚(θm(k,i1))Pm(k,i1)\mathbf{b}^{(k,i-1)}_{m}=[\mathbf{C}^{(k,i-1)}_{y}]^{-1}\mathbf{a}(\theta^{(k,i-1)}_{m})P^{(k,i-1)}_{m}, and

𝐑^i(k)=\displaystyle\hat{\mathbf{R}}^{(k)}_{i}= 𝐑^i(k,i)=𝔼{𝐑^i|𝐘;𝚯(k,i1),σ(k,i1)}\displaystyle\hat{\mathbf{R}}^{(k,i)}_{i}=\mathbb{E}\left\{\hat{\mathbf{R}}_{i}|\mathbf{Y};\boldsymbol{\Theta}^{(k,i-1)},\sigma^{(k,i-1)}\right\}
=\displaystyle= 𝐂i(k,i1)[𝐂y(k,i1)]1𝐑^y[𝐂y(k,i1)]1𝐂i(k,i1)+\displaystyle\mathbf{C}^{(k,i-1)}_{i}[\mathbf{C}^{(k,i-1)}_{y}]^{-1}\hat{\mathbf{R}}_{y}[\mathbf{C}^{(k,i-1)}_{y}]^{-1}\mathbf{C}^{(k,i-1)}_{i}+ (43)
{𝐂i(k,i1)𝐂i(k,i1)[𝐂y(k,i1)]1𝐂i(k,i1)}.\displaystyle\left\{\mathbf{C}^{(k,i-1)}_{i}-\mathbf{C}^{(k,i-1)}_{i}[\mathbf{C}^{(k,i-1)}_{y}]^{-1}\mathbf{C}^{(k,i-1)}_{i}\right\}.

At the M-step, based on (41) the SAGE algorithm updates the estimates of θi\theta_{i}, 𝐏\mathbf{P}, and σ\sigma by

minθiΩ,𝐏𝟎,σ>0\displaystyle\min_{\theta_{i}\in\Omega,\mathbf{P}\geq\mathbf{0},\sigma>0} mi[ln(Pm)+P^m(k,i)/Pm]\displaystyle\sum_{m\neq i}\big{[}\ln(P_{m})+\hat{P}^{(k,i)}_{m}/P_{m}\big{]} (44)
+[ln|𝐂i|+Tr{𝐂i1𝐑^i(k)}],\displaystyle+\big{[}\ln|\mathbf{C}_{i}|+\mathrm{Tr}\big{\{}\mathbf{C}_{i}^{-1}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}\big{]},

which directly yields

Pm(k,i)=P^m(k,i),mi,\displaystyle P^{(k,i)}_{m}=\hat{P}^{(k,i)}_{m},\forall m\neq i, (45)

while the estimates of θi\theta_{i}, PiP_{i}, and σ\sigma are updated by

minθiΩ,ri0,σ>0Nln(σ)+ln|𝐃i|+1σTr{𝐃i1𝐑^i(k)},\displaystyle\min_{\theta_{i}\in\Omega,r_{i}\geq 0,\sigma>0}N\ln(\sigma)+\ln|\mathbf{D}_{i}|+\frac{1}{\sigma}\mathrm{Tr}\big{\{}\mathbf{D}_{i}^{-1}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}, (46)

where 𝐂i=σ𝐃i\mathbf{C}_{i}=\sigma\mathbf{D}_{i} with 𝐃i=ri𝐚(θi)𝐚H(θi)+𝐈N\mathbf{D}_{i}=r_{i}\mathbf{a}(\theta_{i})\mathbf{a}^{H}(\theta_{i})+\mathbf{I}_{N} and the solution can be obtained from the solutions of subproblems (26). Following (31), θi\theta_{i}, PiP_{i}, and σ\sigma can be estimated by

θi(k)=θi(k,i)=argmaxθiΩTr{𝚷i𝐑^i(k)},\displaystyle\theta^{(k)}_{i}=\theta^{(k,i)}_{i}=\arg\max_{\theta_{i}\in\Omega}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{i}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}, (47a)
σ(k,i)={di(k)/(N1)ei(k)>1NTr{𝐑^i(k)},1NTr{𝐑^i(k)}ei(k)1NTr{𝐑^i(k)},\displaystyle\sigma^{(k,i)}=\left\{\begin{array}[]{ll}{d^{(k)}_{i}/(N-1)}&{e^{(k)}_{i}>\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}},\\ {\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}}&{e^{(k)}_{i}\leq\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}},\\ \end{array}\right. (47d)
Pi(k,i)=max{1N1(ei(k)1NTr{𝐑^i(k)}),0},\displaystyle P^{(k,i)}_{i}=\max\left\{\frac{1}{N-1}\big{(}e^{(k)}_{i}-\frac{1}{N}\mathrm{Tr}\big{\{}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}\big{)},0\right\}, (47e)

where σ(k,i)=0\sigma^{(k,i)}=0 is possible although its probability is very low. For example, if M=2M=2, T=1T=1 (a single snapshot), and P1(1,1)=0P^{(1,1)}_{1}=0 after updating the estimate of θ1\theta_{1}, we will have

𝐂2(1,1)=𝐂y(1,1)=P2(1,1)𝐚(θ2(1,1))𝐚H(θ2(1,1))+σ(1,1)𝐈N>𝟎N\displaystyle\mathbf{C}^{(1,1)}_{2}=\mathbf{C}^{(1,1)}_{y}=P^{(1,1)}_{2}\mathbf{a}(\theta^{(1,1)}_{2})\mathbf{a}^{H}(\theta^{(1,1)}_{2})+\sigma^{(1,1)}\mathbf{I}_{N}>\mathbf{0}_{N}

and 𝐑^2(1)=𝐲(1)𝐲H(1)\hat{\mathbf{R}}^{(1)}_{2}=\mathbf{y}(1)\mathbf{y}^{H}(1) by (43) when updating the estimate of θ2\theta_{2}. Furthermore, if 𝐲(1)=u𝐚(θ¯)\mathbf{y}(1)=u\mathbf{a}(\bar{\theta}) (u0)(u\neq 0), we will obtain θ2(1)=θ¯\theta^{(1)}_{2}=\bar{\theta} by (47a) and σ(1,2)=d2(1)/(N1)=0\sigma^{(1,2)}=d^{(1)}_{2}/(N-1)=0 by (47b).

To avoid σ(k,i)=0\sigma^{(k,i)}=0, we use two CM-steps based on the ECM algorithm to reestimate θi\theta_{i}, PiP_{i}, and σ\sigma by problem (46) if σ(k,i)=0\sigma^{(k,i)}=0 in (47b).

  • First CM-step: estimate θi\theta_{i} and PiP_{i} but hold σ=σ(k,i1)\sigma=\sigma^{(k,i-1)} fixed. Then, problem (46) is simplified to

    minθiΩ,ri0ln|𝐃i|+1σ(k,i1)Tr{𝐃i1𝐑^i(k)},\displaystyle\min_{\theta_{i}\in\Omega,r_{i}\geq 0}\ln|\mathbf{D}_{i}|+\frac{1}{\sigma^{(k,i-1)}}\mathrm{Tr}\big{\{}\mathbf{D}_{i}^{-1}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}, (48)

    which can be solved by referring to (15). Thus, the estimates of θi\theta_{i} and PiP_{i} are updated by

    θi(k)=θi(k,i)=argmaxθiΩTr{𝚷i𝐑^i(k)},\displaystyle\theta^{(k)}_{i}=\theta^{(k,i)}_{i}=\arg\max_{\theta_{i}\in\Omega}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{i}\hat{\mathbf{R}}^{(k)}_{i}\big{\}}, (49a)
    Pi(k,i)=\displaystyle P^{(k,i)}_{i}= ri(k,i)σ(k,i1)\displaystyle r^{(k,i)}_{i}\sigma^{(k,i-1)}
    =\displaystyle= max{1N(ei(k)σ(k,i1)),0},\displaystyle\max\left\{\frac{1}{N}\big{(}e^{(k)}_{i}-\sigma^{(k,i-1)}\big{)},0\right\}, (49b)

    where θi(k)\theta^{(k)}_{i} is indeterminate if Pi(k,i)=0P^{(k,i)}_{i}=0.

  • Second CM-step: estimate σ\sigma but hold θi=θi(k)\theta_{i}=\theta^{(k)}_{i} and ri=ri(k,i)r_{i}=r^{(k,i)}_{i} fixed. Then, problem (46) is simplified to

    minσ>0Nln(σ)+1σTr{[𝐃i(k)]1𝐑^i(k)},\displaystyle\min_{\sigma>0}N\ln(\sigma)+\frac{1}{\sigma}\mathrm{Tr}\left\{[\mathbf{D}^{(k)}_{i}]^{-1}\hat{\mathbf{R}}^{(k)}_{i}\right\}, (50)

    where 𝐃i(k)=ri(k,i)𝐚(θi(k))𝐚H(θi(k))+𝐈N\mathbf{D}^{(k)}_{i}=r^{(k,i)}_{i}\mathbf{a}(\theta^{(k)}_{i})\mathbf{a}^{H}(\theta^{(k)}_{i})+\mathbf{I}_{N}. Thus, the estimate of σ\sigma is updated by

    σ(k,i)\displaystyle\sigma^{(k,i)} =\displaystyle= 1NTr{[𝐃i(k)]1𝐑^i(k)}\displaystyle\frac{1}{N}\mathrm{Tr}\left\{[\mathbf{D}^{(k)}_{i}]^{-1}\hat{\mathbf{R}}^{(k)}_{i}\right\} (51)
    =\displaystyle= (σ(k,i1)+di(k))/N,\displaystyle\big{(}\sigma^{(k,i-1)}+d^{(k)}_{i}\big{)}/N,

    which implies σ(k,i)>0\sigma^{(k,i)}>0 if σ(k,i1)>0\sigma^{(k,i-1)}>0.

Finally, the remaining DOA estimates are not updated, and their iterative values are

θm(k,i)=θm(k,i1),mi.\displaystyle\theta^{(k,i)}_{m}=\theta^{(k,i-1)}_{m},\forall m\neq i. (52)

The E- and M-steps are repeated MM times at each iteration of the SAGE algorithm, so 𝐑^1,,𝐑^M\hat{\mathbf{R}}_{1},\dots,\hat{\mathbf{R}}_{M} and all elements in 𝜽\boldsymbol{\theta} are estimated once, σ\sigma and all elements in 𝐏\mathbf{P} are estimated MM times, and each element of 𝐏^=[P^1P^M]T\hat{\mathbf{P}}=[\hat{P}_{1}~{}\cdots~{}\hat{P}_{M}]^{T} is estimated M1M-1 times (or once when M=2M=2).
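The update of a single source i within one SAGE iteration for the stochastic model can be sketched as follows, following (42), (43), (45), and (47); the ECM safeguard (48)–(51) for the rare event of a zero noise estimate is omitted for brevity, and the names are illustrative.

```python
import numpy as np

def sage_update_sto(R_y, A_fun, phi_grid, i, phi, P, sigma):
    """SAGE update of source i (stochastic model): E-step (42)-(43), M-step (45), (47)."""
    N = R_y.shape[0]
    M = len(phi)
    phi, P = phi.copy(), P.copy()
    A_cur = np.column_stack([A_fun(ph) for ph in phi])
    C_sig = [P[m] * np.outer(A_cur[:, m], A_cur[:, m].conj()) for m in range(M)]
    C_i = C_sig[i] + sigma * np.eye(N)              # all noise assigned to source i, (33)
    C_y = sum(C_sig) + sigma * np.eye(N)

    # E-step (42) and M-step (45): sufficient statistics / updates for the other powers
    for m in range(M):
        if m == i:
            continue
        b_m = np.linalg.solve(C_y, A_cur[:, m]) * P[m]
        P[m] = np.real(P[m] * (1 - A_cur[:, m].conj() @ b_m) + b_m.conj() @ R_y @ b_m)

    # E-step (43): conditional expectation of R_i
    G = np.linalg.solve(C_y, C_i)
    R_i = G.conj().T @ R_y @ G + (C_i - C_i @ G)

    # M-step (47)
    vals = [np.real(A_fun(ph).conj() @ R_i @ A_fun(ph)) / N for ph in phi_grid]
    phi[i] = phi_grid[int(np.argmax(vals))]
    a_i = A_fun(phi[i])
    e_i = np.real(a_i.conj() @ R_i @ a_i) / N
    tr_i = np.real(np.trace(R_i))
    if e_i > tr_i / N:
        sigma = (tr_i - e_i) / (N - 1)
        P[i] = (e_i - tr_i / N) / (N - 1)
    else:
        sigma = tr_i / N
        P[i] = 0.0
    return phi, P, sigma
```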

VI Properties of the Proposed EM, MEM, and SAGE Algorithms

VI-A Convergence Point

It is well known that under known σ\sigma, the EM and SAGE algorithms satisfy standard regularity conditions [4], [6], [20] and converge to stationary points (possibly different ones) of (𝚯)\mathcal{L}(\boldsymbol{\Theta}), which is (𝚯,σ)\mathcal{L}(\boldsymbol{\Theta},\sigma) with σ\sigma fixed, implying that (𝚯,σ)\mathcal{L}(\boldsymbol{\Theta},\sigma) is a well-behaved objective function. Thus, the proposed algorithms also satisfy the regularity conditions and converge to stationary points (possibly different ones) of (𝚯,σ)\mathcal{L}(\boldsymbol{\Theta},\sigma).

Of course, the convergence points of the proposed algorithms depend on their initial points. Given a poor initial point, the proposed algorithms may never converge to the maximum point of (𝚯,σ)\mathcal{L}(\boldsymbol{\Theta},\sigma). To generate an appropriate initial point, the effective initialization procedure in [9] can be adopted using the deterministic signal model.

VI-B Complexity and Stability

Note that the computational complexities of the EM, MEM, and SAGE algorithms are dominated by searching the θm(k)\theta_{m}^{(k)}’s:

θm(k)=argmaxθmΩTr{𝚷m𝐑^m(k)},m.\displaystyle\theta_{m}^{(k)}=\arg\max_{\theta_{m}\in\Omega}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}_{m}^{(k)}\big{\}},\forall m. (53)

Hence, if we adopt a brute-force search for θm(k)\theta_{m}^{(k)}, i.e., evaluate the objective function Tr{𝚷m𝐑^m(k)}\mathrm{Tr}\{\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}_{m}^{(k)}\} on a coarse grid to locate a grid point close to its maximum point, take this grid point as the initial point of a gradient algorithm, and then apply the gradient algorithm to find the maximum point as θm(k)\theta_{m}^{(k)} [4], these algorithms will have almost the same computational complexity at each iteration [7].

However, when the powers of the sources are unequal, we have found via simulation that the DOA estimates of multiple sources, updated by (53), tend to coincide with the true DOA of the source with the largest power, and the algorithms may become unstable.

To address this issue, we can choose θm(k1)\theta_{m}^{(k-1)} as the initial point of a gradient algorithm and then apply this gradient algorithm to find a local maximum point of Tr{𝚷m𝐑^m(k)}\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}_{m}^{(k)}\big{\}} as θm(k)\theta_{m}^{(k)}; for example, Algorithm 2 in the next section has produced excellent numerical results with this choice. Under this choice, we still have

Tr{𝚷m(k)𝐑^m(k)}Tr{𝚷m(k1)𝐑^m(k)},m,\displaystyle\mathrm{Tr}\big{\{}\boldsymbol{\Pi}^{(k)}_{m}\hat{\mathbf{R}}_{m}^{(k)}\big{\}}\geq\mathrm{Tr}\big{\{}\boldsymbol{\Pi}^{(k-1)}_{m}\hat{\mathbf{R}}_{m}^{(k)}\big{\}},\forall m, (54)

which actually guarantee the monotonicity of the algorithms but only meet the requirement of GEM algorithms [3]. For convenience, we do not change the names of the algorithms under this choice.

VII Numerical Results

In this section, numerical results are provided to illustrate the convergence performance of the proposed algorithms. To ensure that a two-dimensional scatter plot can reflect the DOAs of the MM sources, the array is assumed to be a uniform linear array with 𝐩n=[λ2(n1)00]T\mathbf{p}_{n}=[\frac{\lambda}{2}(n-1)~{}0~{}0]^{T}, N=10N=10, and T=20T=20; then M=2M=2 and φ1=φ2=90°\varphi_{1}=\varphi_{2}=90\degree are known while ϕ1\phi_{1} and ϕ2\phi_{2} are to be estimated. SAGE (in this section, the EM, MEM, and SAGE algorithms are simply written as EM, MEM, and SAGE, respectively) for the stochastic signal model is given in Algorithm 1, and the other algorithms in this section can be obtained by referring to Algorithm 1. For comparison, all tolerances are set to ϵ=0.001\epsilon=0.001, and 𝐒\mathbf{S} in the deterministic signal model is also generated from the independent random numbers sm(t)𝒞𝒩(0,Pm)s_{m}(t)\sim\mathcal{CN}(0,P_{m}).

Algorithm 1 SAGE for the Stochastic Signal Model
1:  Initialize the parameter set ξ(0)={𝜽(0),𝐏(0),σ(0)}\xi^{(0)}=\{\boldsymbol{\theta}^{(0)},\mathbf{P}^{(0)},\sigma^{(0)}\} with 𝜽=[ϕ1ϕ2]T\boldsymbol{\theta}=[\phi_{1}~{}\phi_{2}]^{T}, k=1k=1, and i=1i=1.
2:  while iMi\leq M do
3:     Find 𝐑^i(k)\hat{\mathbf{R}}^{(k)}_{i} and P^m(k,i)\hat{P}^{(k,i)}_{m} for mim\neq i at the E-step and obtain ξ(k,i)\xi^{(k,i)} at the M-step, then i=i+1i=i+1.
4:  end while
5:  while 𝜽(k)𝜽(k1)>ϵ(degree)\|\boldsymbol{\theta}^{(k)}-\boldsymbol{\theta}^{(k-1)}\|>\epsilon~{}(\mathrm{degree}) do
6:     k=k+1k=k+1 and i=1i=1.
7:     while iMi\leq M do
8:        Find 𝐑^i(k)\hat{\mathbf{R}}^{(k)}_{i} and P^m(k,i)\hat{P}^{(k,i)}_{m} for mim\neq i at the E-step and obtain ξ(k,i)\xi^{(k,i)} at the M-step, then i=i+1i=i+1.
9:     end while
10:  end while
11:  Output 𝜽(k)\boldsymbol{\theta}^{(k)}.
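
The following Python sketch mirrors only the control flow of Algorithm 1, namely the source-wise sweep and the stopping criterion on 𝜽\boldsymbol{\theta}; the E-step and M-step bodies are hypothetical placeholders rather than the update formulas derived earlier, so the sketch illustrates the iteration structure, not the estimator itself.

```python
import numpy as np

# Control-flow skeleton of Algorithm 1: one SAGE cycle updates the hidden-data
# space of one source at a time, and the outer loop stops once theta changes by
# less than epsilon. The e_step / m_step bodies are placeholders, not the
# paper's update formulas; they only make the skeleton runnable.

M, eps = 2, 1e-3

def e_step(i, xi):
    """Placeholder E-step: would compute R_hat_i^(k) and P_hat_m^(k,i) for m != i."""
    return None

def m_step(i, xi, estep_out):
    """Placeholder M-step: merely nudges theta_i toward a fixed stand-in target."""
    target = np.array([20.0, 80.0])
    xi = dict(xi)
    theta = xi["theta"].copy()
    theta[i] = 0.5 * (theta[i] + target[i])
    xi["theta"] = theta
    return xi

xi = {"theta": np.array([24.0, 84.0]), "P": np.ones(M), "sigma": 1.0}   # xi^(0)
theta_prev = np.full(M, np.inf)                                          # force a first sweep
while np.linalg.norm(xi["theta"] - theta_prev) > eps:
    theta_prev = xi["theta"].copy()
    for i in range(M):                  # update source i only (steps 2-4 / 7-9)
        xi = m_step(i, xi, e_step(i, xi))
print(xi["theta"])
```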

Furthermore, 𝜶=[0.50.5]T\boldsymbol{\alpha}=[0.5~{}0.5]^{T} is used in the EM algorithm, and the gradient ascent method with backtracking line search [21], given in Algorithm 2, is adopted to search for the ϕm(k)\phi^{(k)}_{m}’s satisfying (54). The simulation parameters in Algorithm 2 are set to ρ=0.1\rho=0.1, η=0.3\eta=0.3, and γ=0.5\gamma=0.5.

Algorithm 2 Gradient Ascent Based Angle Estimation
1:  Define g(ϕm)=N×Tr{𝚷m𝐑^m(k)}g(\phi_{m})=N\times\mathrm{Tr}\big{\{}\boldsymbol{\Pi}_{m}\hat{\mathbf{R}}^{(k)}_{m}\big{\}}, initialize ϕm=ϕm(k1)(0,π)(radian)\phi_{m}=\phi_{m}^{(k-1)}\in(0,\pi)~{}(\mathrm{radian}), ρ(0,1)\rho\in(0,1), η(0,0.5)\eta\in(0,0.5), and γ(0,1)\gamma\in(0,1).
2:  while |g(ϕm)|>ϵ|g^{\prime}(\phi_{m})|>\epsilon do
3:     t={ρ×πϕmg(ϕm),g(ϕm)>0,ρ×ϕmg(ϕm),g(ϕm)<0.t=\left\{\begin{array}[]{ll}\rho\times\frac{\pi-\phi_{m}}{g^{\prime}(\phi_{m})},&g^{\prime}(\phi_{m})>0,\\ \rho\times\frac{-\phi_{m}}{g^{\prime}(\phi_{m})},&g^{\prime}(\phi_{m})<0.\end{array}\right.
4:     while g(ϕm+tg(ϕm))<g(ϕm)+ηt|g(ϕm)|2g\big{(}\phi_{m}+tg^{\prime}(\phi_{m})\big{)}<g(\phi_{m})+\eta t|g^{\prime}(\phi_{m})|^{2} do
5:        t=γtt=\gamma t.
6:     end while
7:     ϕm=ϕm+tg(ϕm)\phi_{m}=\phi_{m}+tg^{\prime}(\phi_{m}).
8:  end while
9:  ϕm(k)=ϕm\phi_{m}^{(k)}=\phi_{m}.
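
A self-contained Python sketch of Algorithm 2 is given below, using the parameters ρ=0.1, η=0.3, and γ=0.5 mentioned above. It assumes 𝚷m\boldsymbol{\Pi}_{m} projects onto the ULA steering vector, so that g(ϕm) equals the quadratic form a(ϕm)ᴴ𝐑^m(k)a(ϕm), and the analytic derivative g′ is replaced by a central difference for brevity; both are illustrative simplifications, not the paper's exact expressions.

```python
import numpy as np

# Sketch of Algorithm 2 with rho = 0.1, eta = 0.3, gamma = 0.5, assuming
# g(phi) = N * Tr{Pi_m R_hat_m} = a(phi)^H R_hat_m a(phi) for a half-wavelength
# ULA; the analytic derivative is replaced by a central difference.

N = 10

def steering(phi):
    """Half-wavelength ULA steering vector; phi in radians."""
    return np.exp(1j * np.pi * np.arange(N) * np.cos(phi))

def make_g(R_hat_m):
    def g(phi):
        a = steering(phi)
        return np.real(a.conj() @ R_hat_m @ a)
    return g

def grad_ascent_backtracking(g, phi0, eps=1e-3, rho=0.1, eta=0.3, gamma=0.5):
    """Gradient ascent on (0, pi) with a domain-aware initial step and backtracking."""
    dg = lambda p: (g(p + 1e-6) - g(p - 1e-6)) / 2e-6   # numerical derivative
    phi = phi0
    while abs(dg(phi)) > eps:
        d = dg(phi)
        # initial step keeps the trial point strictly inside (0, pi)
        t = rho * (np.pi - phi) / d if d > 0 else rho * (-phi) / d
        # backtracking line search: enforce the sufficient-increase condition
        while g(phi + t * d) < g(phi) + eta * t * d ** 2:
            t *= gamma
        phi = phi + t * d
    return phi

# Toy usage: R_hat_m from a single source at 70 degrees, initial point 65 degrees.
a_true = steering(np.deg2rad(70.0))
R_hat_m = np.outer(a_true, a_true.conj()) + 0.1 * np.eye(N)
print(np.rad2deg(grad_ascent_backtracking(make_g(R_hat_m), np.deg2rad(65.0))))  # near 70
```

The domain-aware initial step in line 3 of Algorithm 2 guarantees that the first trial point stays inside (0, π), and the backtracking loop then shrinks the step until the objective actually increases, which is exactly what (54) requires.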

VII-A Deterministic Signal Model

For comparing the convergence of EM, MEM, and SAGE, Fig. 1 plots their (𝚯(k),σ(k))\mathcal{L}(\boldsymbol{\Theta}^{(k)},\sigma^{(k)})’s, ϕ1(k)\phi^{(k)}_{1}’s, and ϕ2(k)\phi^{(k)}_{2}’s as functions of kk under one realization. We observe that given a good initial point, the algorithms converge to consistent DOA estimates; EM has convergence similar to MEM, while SAGE converges faster than EM and MEM.

Figure 1: (𝚯(k),σ(k))\mathcal{L}(\boldsymbol{\Theta}^{(k)},\sigma^{(k)}), ϕ1(k)\phi^{(k)}_{1}, and ϕ2(k)\phi^{(k)}_{2} comparison of EM, MEM, and SAGE for the deterministic signal model under one realization versus kk with ϕ1=20°\phi_{1}=20\degree, ϕ2=80°\phi_{2}=80\degree, P1=2dBP_{1}=-2~{}\mathrm{dB}, P2=4dBP_{2}=4~{}\mathrm{dB}, σ=4dB\sigma=4~{}\mathrm{dB}, ϕ1(0)=24°\phi^{(0)}_{1}=24\degree, ϕ2(0)=84°\phi^{(0)}_{2}=84\degree, 𝐒(0)=[𝟏𝟏]T\mathbf{S}^{(0)}=[\mathbf{1}~{}\mathbf{1}]^{T}, 𝝈(0)=[0.50.5]T\boldsymbol{\sigma}^{(0)}=[0.5~{}0.5]^{T}, and σ(0)=1\sigma^{(0)}=1.

Figs. 2 and 3 show two scatter plots of the DOA estimates obtained by the algorithms under 200 independent realizations. The same samples of each realization are processed by the algorithms. In Fig. 2, the total numbers of wanted points from EM, MEM, and SAGE are 68, 72, and 179, respectively. In Fig. 3, the total numbers of wanted points from EM, MEM, and SAGE are 159, 157, and 190, respectively. Figs. 2 and 3 imply that given a poor initial point, SAGE is more effective than EM and MEM at avoiding convergence to an unwanted stationary point of (𝚯,σ)\mathcal{L}(\boldsymbol{\Theta},\sigma).

Note that the two sources in Fig. 2 are not closely spaced, so it is very difficult to mix them up, and the wanted points in Fig. 2 are centered around the true position (25°,75°)(25\degree,75\degree). However, the two sources in Fig. 3 are closely spaced and the wanted points are centered around (78°,70°)(78\degree,70\degree) or (70°,78°)(70\degree,78\degree), i.e., the algorithms are likely to mix up closely spaced sources. Fortunately, only the DOAs of the sources are of interest in DOA estimation, so mixing up the sources has no effect on the result.

Figure 2: Scatter plot of the DOA estimates obtained from EM, MEM, and SAGE for the deterministic signal model under 200 independent realizations with ϕ1=25°\phi_{1}=25\degree, ϕ2=75°\phi_{2}=75\degree, P1=4dBP_{1}=-4~{}\mathrm{dB}, P2=2dBP_{2}=2~{}\mathrm{dB}, σ=4dB\sigma=4~{}\mathrm{dB}, ϕ1(0)=40°\phi^{(0)}_{1}=40\degree, ϕ2(0)=60°\phi^{(0)}_{2}=60\degree, 𝐒(0)=[𝟏𝟏]T\mathbf{S}^{(0)}=[\mathbf{1}~{}\mathbf{1}]^{T}, 𝝈(0)=[0.50.5]T\boldsymbol{\sigma}^{(0)}=[0.5~{}0.5]^{T}.
Figure 3: Scatter plot of the DOA estimates obtained from EM, MEM, and SAGE for the deterministic signal model under 200 independent realizations with ϕ1=70°\phi_{1}=70\degree, ϕ2=78°\phi_{2}=78\degree, P1=2dBP_{1}=-2~{}\mathrm{dB}, P2=4dBP_{2}=4~{}\mathrm{dB}, σ=4dB\sigma=4~{}\mathrm{dB}, ϕ1(0)=50°\phi^{(0)}_{1}=50\degree, ϕ2(0)=58°\phi^{(0)}_{2}=58\degree, 𝐒(0)=[𝟏𝟏]T\mathbf{S}^{(0)}=[\mathbf{1}~{}\mathbf{1}]^{T}, 𝝈(0)=[0.50.5]T\boldsymbol{\sigma}^{(0)}=[0.5~{}0.5]^{T}.

According to these simulations, we can conclude that for the deterministic signal model, 1) EM has convergence similar to MEM, and 2) SAGE outperforms EM and MEM.

VII-B Stochastic Signal Model

For comparing the convergence of EM, MEM, and SAGE, Fig. 4 plots their (𝚯(k),σ(k))\mathcal{L}(\boldsymbol{\Theta}^{(k)},\sigma^{(k)})’s, ϕ1(k)\phi^{(k)}_{1}’s, and ϕ2(k)\phi^{(k)}_{2}’s as functions of kk under one realization. We can observe that EM has convergence similar to MEM, while SAGE converges faster than EM and MEM.

Figure 4: (𝚯(k),σ(k))\mathcal{L}(\boldsymbol{\Theta}^{(k)},\sigma^{(k)}), ϕ1(k)\phi^{(k)}_{1}, and ϕ2(k)\phi^{(k)}_{2} comparison of EM, MEM, and SAGE for the stochastic signal model under one realization versus kk with ϕ1=20°\phi_{1}=20\degree, ϕ2=80°\phi_{2}=80\degree, P1=4dBP_{1}=-4~{}\mathrm{dB}, P2=4dBP_{2}=4~{}\mathrm{dB}, σ=4dB\sigma=4~{}\mathrm{dB}, ϕ1(0)=24°\phi^{(0)}_{1}=24\degree, ϕ2(0)=84°\phi^{(0)}_{2}=84\degree, 𝐏(0)=𝟏\mathbf{P}^{(0)}=\mathbf{1}, 𝝈(0)=[0.50.5]T\boldsymbol{\sigma}^{(0)}=[0.5~{}0.5]^{T}, and σ(0)=1\sigma^{(0)}=1.

Figs. 5 and 6 show two scatter plots of the DOA estimates obtained by the algorithms under 200 independent realizations. The same samples of each realization are processed by the algorithms. In Fig. 5, the total numbers of wanted points from EM, MEM, and SAGE are 185, 186, and 175, respectively. In Fig. 6, the total numbers of wanted points from EM, MEM, and SAGE are 161, 161, and 172, respectively. Figs. 5 and 6 indicate that EM has convergence similar to MEM, whereas compared with EM and MEM, SAGE is less effective at avoiding convergence to an unwanted stationary point of (𝚯,σ)\mathcal{L}(\boldsymbol{\Theta},\sigma) in Fig. 5 but more effective in Fig. 6. As before, the algorithms are likely to mix up closely spaced sources, so the wanted points in Fig. 5 are centered around the true position (25°,75°)(25\degree,75\degree) while the wanted points in Fig. 6 are centered around (78°,70°)(78\degree,70\degree) or (70°,78°)(70\degree,78\degree).

From Figs. 4–6, we can conclude that for the stochastic signal model, 1) EM has convergence similar to MEM, and 2) SAGE does not always outperform EM and MEM.

Figure 5: Scatter plot of the DOA estimates obtained from EM, MEM, and SAGE for the stochastic signal model under 200 independent realizations with ϕ1=25°\phi_{1}=25\degree, ϕ2=75°\phi_{2}=75\degree, P1=4dBP_{1}=-4~{}\mathrm{dB}, P2=2dBP_{2}=2~{}\mathrm{dB}, σ=4dB\sigma=4~{}\mathrm{dB}, ϕ1(0)=40°\phi^{(0)}_{1}=40\degree, ϕ2(0)=60°\phi^{(0)}_{2}=60\degree, 𝐏(0)=𝟏\mathbf{P}^{(0)}=\mathbf{1}, 𝝈(0)=[0.50.5]T\boldsymbol{\sigma}^{(0)}=[0.5~{}0.5]^{T}, and σ(0)=1\sigma^{(0)}=1.
Figure 6: Scatter plot of the DOA estimates obtained from EM, MEM, and SAGE for the stochastic signal model under 200 independent realizations with ϕ1=70°\phi_{1}=70\degree, ϕ2=78°\phi_{2}=78\degree, P1=2dBP_{1}=-2~{}\mathrm{dB}, P2=1dBP_{2}=-1~{}\mathrm{dB}, σ=4dB\sigma=4~{}\mathrm{dB}, ϕ1(0)=55°\phi^{(0)}_{1}=55\degree, ϕ2(0)=63°\phi^{(0)}_{2}=63\degree, 𝐏(0)=𝟏\mathbf{P}^{(0)}=\mathbf{1}, 𝝈(0)=[0.50.5]T\boldsymbol{\sigma}^{(0)}=[0.5~{}0.5]^{T}, and σ(0)=1\sigma^{(0)}=1.

VII-C Deterministic and Stochastic Signal Models

The EM, MEM, and SAGE algorithms for the deterministic signal model can also process samples from the stochastic signal model, which means that the algorithms for the stochastic signal model can be compared with those for the deterministic signal model. The above numerical results have shown that EM has convergence similar to MEM, so for simplicity we only compare EM and SAGE for both models in this subsection.

Since both models have the same DOA parameter 𝜽\boldsymbol{\theta}, the stopping criterion 𝜽(k)𝜽(k1)ϵ\|\boldsymbol{\theta}^{(k)}-\boldsymbol{\theta}^{(k-1)}\|\leq\epsilon is suitable. Fig. 7 shows a scatter plot of the DOA estimates obtained by EM and SAGE under 50 independent realizations. The same samples of each realization are processed by both algorithms for both models. From Fig. 7, we can observe that both algorithms for both models obtain consistent DOA estimates.

Based on Fig. 7, Fig. 8 compares the numbers of iterations. We can observe that EM for the deterministic signal model generally requires more iterations than EM for the stochastic signal model, whereas SAGE for the deterministic signal model generally requires fewer iterations than SAGE for the stochastic signal model. More importantly, SAGE for the deterministic signal model requires the smallest number of iterations in every realization. Thus, we can conclude that SAGE for the deterministic signal model is superior to the other algorithms in terms of computational cost.

Figure 7: Scatter plot of the DOA estimates obtained from EM and SAGE under 50 independent realizations with ϕ1=50°\phi_{1}=50\degree, ϕ2=100°\phi_{2}=100\degree, P1=4dBP_{1}=-4~{}\mathrm{dB}, P2=4dBP_{2}=4~{}\mathrm{dB}, σ=4dB\sigma=4~{}\mathrm{dB}, ϕ1(0)=55°\phi^{(0)}_{1}=55\degree, ϕ2(0)=95°\phi^{(0)}_{2}=95\degree, 𝐒(0)=[𝟏𝟏]T\mathbf{S}^{(0)}=[\mathbf{1}~{}\mathbf{1}]^{T}, 𝐏(0)=𝟏\mathbf{P}^{(0)}=\mathbf{1}, and σ(0)=1\sigma^{(0)}=1.
Figure 8: Numbers of iterations required by EM and SAGE under 50 independent realizations with ϕ1=50°\phi_{1}=50\degree, ϕ2=100°\phi_{2}=100\degree, P1=4dBP_{1}=-4~{}\mathrm{dB}, P2=4dBP_{2}=4~{}\mathrm{dB}, σ=4dB\sigma=4~{}\mathrm{dB}, ϕ1(0)=55°\phi^{(0)}_{1}=55\degree, ϕ2(0)=95°\phi^{(0)}_{2}=95\degree, 𝐒(0)=[𝟏𝟏]T\mathbf{S}^{(0)}=[\mathbf{1}~{}\mathbf{1}]^{T}, 𝐏(0)=𝟏\mathbf{P}^{(0)}=\mathbf{1}, and σ(0)=1\sigma^{(0)}=1.

VIII Conclusion

We have developed the EM and SAGE algorithms for DOA estimation in unknown uniform noise and proposed an MEM algorithm applicable to this noise assumption. We have also improved the EM, MEM, and SAGE algorithms to ensure their stability when the powers of the sources are unequal. With these improvements, numerical results illustrate that the EM algorithm has convergence similar to the MEM algorithm, the SAGE algorithm outperforms the EM and MEM algorithms for the deterministic signal model, and the SAGE algorithm converges faster than the EM and MEM algorithms for the stochastic signal model. In addition, numerical results indicate that when these algorithms process the same samples from the stochastic signal model, the SAGE algorithm for the deterministic signal model requires the fewest iterations.

References

  • [1] H. Krim and M. Viberg, “Two decades of array signal processing research: the parametric approach,” IEEE Signal Processing Magazine, vol. 13, no. 4, pp. 67–94, Jul. 1996.
  • [2] L. C. Godara, “Application of antenna arrays to mobile communications. II. Beam-forming and direction-of-arrival considerations,” Proceedings of the IEEE, vol. 85, no. 8, pp. 1195–1245, Aug. 1997.
  • [3] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” Journal of the Royal Statistical Society. Series B (Methodological), vol. 39, no. 1, pp. 1–38, 1977.
  • [4] M. Feder and E. Weinstein, “Parameter estimation of superimposed signals using the EM algorithm,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, no. 4, pp. 477–489, Apr. 1988.
  • [5] M. I. Miller and D. R. Fuhrmann, “Maximum-likelihood narrow-band direction finding and the EM algorithm,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 9, pp. 1560–1577, Sep. 1990.
  • [6] J. A. Fessler and A. O. Hero, “Space-alternating generalized expectation-maximization algorithm,” IEEE Transactions on Signal Processing, vol. 42, no. 10, pp. 2664–2677, Oct. 1994.
  • [7] P. Chung and J. F. Bohme, “Comparative convergence analysis of EM and SAGE algorithms in DOA estimation,” IEEE Transactions on Signal Processing, vol. 49, no. 12, pp. 2940–2949, Dec. 2001.
  • [8] M. Gong and B. Lyu, “Alternating maximization and the EM algorithm in maximum-likelihood direction finding,” IEEE Transactions on Vehicular Technology, vol. 70, no. 10, pp. 9634–9645, Oct. 2021.
  • [9] I. Ziskind and M. Wax, “Maximum likelihood localization of multiple sources by alternating projection,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, no. 10, pp. 1553–1560, Oct. 1988.
  • [10] P. Stoica and A. Nehorai, “MUSIC, maximum likelihood, and Cramer-Rao bound,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 37, no. 5, pp. 720–741, May 1989.
  • [11] P. Stoica and A. Nehorai, “Performance study of conditional and unconditional direction-of-arrival estimation,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, no. 10, pp. 1783–1795, Oct. 1990.
  • [12] M. Pesavento and A. B. Gershman, “Maximum-likelihood direction-of-arrival estimation in the presence of unknown nonuniform noise,” IEEE Transactions on Signal Processing, vol. 49, no. 7, pp. 1310–1324, Jul. 2001.
  • [13] C. E. Chen, F. Lorenzelli, R. E. Hudson, and K. Yao, “Stochastic maximum-likelihood DOA estimation in the presence of unknown nonuniform noise,” IEEE Transactions on Signal Processing, vol. 56, no. 7, pp. 3038–3044, Jul. 2008.
  • [14] X. Meng and D. B. Rubin, “Maximum likelihood estimation via the ECM algorithm: A general framework,” Biometrika, vol. 80, no. 2, pp. 267–278, Jun. 1993.
  • [15] P. Chung and J. F. Bohme, “DOA estimation using fast EM and SAGE algorithms,” Signal Processing, vol. 82, no. 11, pp. 1753–1762, Nov. 2002.
  • [16] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Upper Saddle River, USA: Prentice-Hall PTR, 1993.
  • [17] I. B. Rhodes, “A tutorial introduction to estimation and filtering,” IEEE Transactions on Automatic Control, vol. 16, no. 6, pp. 688–706, Dec. 1971.
  • [18] A. G. Jaffer, “Maximum likelihood direction finding of stochastic sources: a separable solution,” in Proc. ICASSP, New York, USA, Apr. 1988.
  • [19] P. Stoica and A. Nehorai, “On the concentrated stochastic likelihood function in array signal processing,” Circuits, Systems Signal Processing, vol. 14, no. 5, pp. 669–674, Sep. 1995.
  • [20] C. F. Jeff Wu, “On the convergence properties of the EM algorithm,” Annals of Statistics, vol. 11, no. 1, pp. 95–103, Mar. 1983.
  • [21] S. Boyd and L. Vandenberghe, Convex Optimization. New York, USA: Cambridge University Press, 2004.
Ming-yan Gong received the B.Eng. degree in metal material engineering from the Jiangsu University of Science and Technology, Zhenjiang, China, in 2016 and the M.Eng. degree in signal and information processing from the Nanjing University of Posts and Telecommunications, Nanjing, China, in 2019. He is currently working toward the Ph.D. degree with the Beijing Institute of Technology, Beijing, China. His research interests include array signal processing and MIMO communications.
Bin Lyu received the B.E. and Ph.D. degrees from the Nanjing University of Posts and Telecommunications (NJUPT), Nanjing, China, in 2013 and 2018, respectively. He is currently an Associate Professor with NJUPT. His research interests include wireless communications and signal processing.