
Alternating Direction Method of Multipliers Based on the \ell_{2,0}-norm for the Multiple Measurement Vector Problem

Zekun Liu, Siwei Yu
School of Mathematics, Harbin Institute of Technology
Corresponding author: [email protected]
Abstract

The multiple measurement vector (MMV) problem is an extension of the single measurement vector (SMV) problem, and it has many applications. Nowadays, most studies of the MMV problem are based on the \ell_{2,1}-norm relaxation, which will fail in recovery under some adverse conditions. We propose an alternating direction method of multipliers (ADMM)-based optimization algorithm to achieve a larger undersampling rate for the MMV problem. The key innovation is the introduction of an \ell_{2,0}-norm sparsity constraint to describe the joint-sparsity of the MMV problem; this differs from the \ell_{2,1}-norm constraint that has been widely used in previous studies. To illustrate the advantages of the \ell_{2,0}-norm, we first prove the equivalence of the sparsity of the row support set of a matrix and its \ell_{2,0}-norm. Then, the MMV problem based on the \ell_{2,0}-norm is proposed. Next, we give our algorithm called MMV-ADMM-\ell_{2,0} by applying ADMM to the reformulated problem. Moreover, based on the Kurdyka-Lojasiewicz property of the objective functions, we prove that the iteration generated by the proposed algorithm globally converges to the optimal solution of the MMV problem. Finally, the performance of the proposed algorithm and comparisons with other algorithms under different conditions are studied with simulated examples. The results show that the proposed algorithm can solve a larger range of MMV problems even under adverse conditions.

1 Introduction

The multiple measurement vector (MMV) problem [1] has received considerable attention in the field of compressed sensing [2, 3]. In the single measurement vector (SMV) problem, the purpose is to recover a vector x\in\mathbb{R}^{n} from y=Ax, where A\in\mathbb{R}^{m\times n}\;(m<n) and y\in\mathbb{R}^{m} are given. Because m<n, the problem is underdetermined. With the prior information that x is sparse enough [4], we can solve

\min_{x\in\mathbb{R}^{n}}\left\|x\right\|_{0}\quad s.t.\enspace y=Ax, \qquad (1.1)

to obtain the unique solution, where the \ell_{0}-norm of a vector is the number of its nonzero elements. However, problem (1.1) is NP-hard and is difficult to solve directly. A common approach to overcome this drawback is to relax the \ell_{0}-norm to the \ell_{1}-norm; this is known as basis pursuit [5]:

\min_{x\in\mathbb{R}^{n}}\left\|x\right\|_{1}\quad s.t.\enspace y=Ax. \qquad (1.2)

Problem (1.2) is convex, implying that it can be solved much more efficiently. By solving problem (1.2) we can obtain the same solution as that for problem (1.1) under suitable conditions [3].

The MMV problem is an extension of the SMV problem. It describes the case in which multiple jointly sparse signals are measured by the same sensing matrix, that is,

y^{j}=Ax^{j},\quad j=1,2,\cdots,J,

where all of the x^{j} have the same locations of nonzero elements. The MMV problem occurs in many applications including hyperspectral imagery [6], computation of sparse solutions to linear inverse problems [7], neuromagnetic imaging [8], source localization [9], and equalization of sparse communication channels [10]. Given Y=(y^{1},y^{2},\cdots,y^{J})\in\mathbb{R}^{m\times J}, the MMV problem is to recover X=(x^{1},x^{2},\cdots,x^{J}) from Y=AX, where only a few rows of X have nonzero elements. For this purpose, the most widely used approach is \ell_{2,1}-norm minimization [7, 11, 12]:

\min_{X\in\mathbb{R}^{n\times J}}\left\|X\right\|_{2,1}\quad s.t.\enspace Y=AX, \qquad (1.3)

where the generalized \ell_{p,q}-norm of a matrix is defined as

\left\|X\right\|_{p,q}=\left(\sum_{i=1}^{n}\left\|X_{i}\right\|_{p}^{q}\right)^{\frac{1}{q}},

and X_{i} is the i-th row of X. Owing to the additional joint-sparsity property, the recovery accuracy and the total time spent for the MMV problem are better than those of the SMV problem [11, 12].
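
For concreteness, the following minimal NumPy sketch computes the generalized \ell_{p,q}-norm and the \ell_{2,0} case used throughout this paper; the function names are ours, and a small tolerance is included because exact zeros rarely survive floating-point arithmetic.

```python
import numpy as np

def lpq_norm(X, p=2, q=1):
    """Generalized l_{p,q}-norm: the l_q-norm of the vector of row-wise l_p-norms."""
    row_norms = np.linalg.norm(X, ord=p, axis=1)
    return np.linalg.norm(row_norms, ord=q)

def l20_norm(X, tol=1e-12):
    """l_{2,0}-'norm': the number of rows whose l_2-norm exceeds tol."""
    return int(np.sum(np.linalg.norm(X, axis=1) > tol))
```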

In recent decades, several recovery algorithms for the MMV problem have been proposed [11, 7, 13, 14]. Because the joint-sparsity recovery problem is NP-hard, [13] employed a greedy pursuit strategy to recover the signals. Similar to the convex relaxation technique for the SMV problem (1.2), [7, 11, 12] used \ell_{2,1}-norm minimization to solve the MMV problem. For hyperspectral imagery applications, [6] applied the alternating direction method of multipliers (ADMM) to solve problem (1.3). Most theoretical results on the relaxation of the SMV problem can be extended to the MMV problem [11]. However, just as (1.2) is a convex relaxation of (1.1) that yields the same solution only under suitable conditions [3], (1.3) is only a convex relaxation of the \ell_{2,0}-norm minimization problem for the MMV problem: (1.3) cannot provide an accurate solution under certain unfavorable conditions, whereas the \ell_{2,0}-norm minimization problem can. These drawbacks limit the range of MMV problems that previous algorithms can solve.

In this study, instead of considering the widely used convex relaxation \ell_{2,1}-norm minimization problem, we directly study the original \ell_{2,0}-norm minimization in the MMV problem. We first note the equivalence of the joint-sparsity property and the \ell_{2,0}-norm in the MMV problem. Then, we reformulate the MMV problem via the sparsity constraint in [11]. Next, we propose our algorithm called MMV-ADMM-\ell_{2,0} by applying ADMM [15] to the reformulated problem. Theoretical analysis shows that the proposed algorithm is globally convergent to the unique solution of the MMV problem. Compared with existing algorithms, the proposed algorithm can solve the MMV problem when the signals are not very sparse or the number of sensors is small, and it can achieve a larger undersampling rate.

The rest of this paper is organized as follows: In Section 2, we present some definitions used for the subsequent analysis. In Section 3, we propose the optimization problem by reformulating the MMV problem. In Section 4, we present the proposed MMV-ADMM-\ell_{2,0} algorithm and discuss the subproblems in detail. In Section 5, we present the convergence results obtained with the proposed algorithm. In Section 6, we describe numerical experiments performed to verify the validity of the proposed algorithm and compare it with other algorithms. In Section 7, we present the conclusions of this study.

2 Preliminaries

We list some of the notations and definitions used for the subsequent analysis. For any vector x=\left(x_{1},x_{2},\cdots,x_{N}\right)^{T}\in\mathbb{R}^{N}, its sparse support set is defined as

Supp(x)=\left\{i\,|\,x_{i}\neq 0\right\}\subseteq\left\{1,2,\cdots,N\right\}.

Note that the assumption of the MMV problem is that the vectors \left\{x^{j}\right\}_{j=1}^{J}\subseteq\mathbb{R}^{N} all have the same sparse support set, implying that

Supp(x^{1})=Supp(x^{2})=\cdots=Supp(x^{J}).
Definition 2.1.

For a set of vectors \left\{x^{j}\right\}_{j=1}^{J}\subseteq\mathbb{R}^{N}, the common sparse support set is defined as

Supp(\left\{x^{j}\right\}_{j=1}^{J})=\bigcup_{j=1}^{J}Supp(x^{j}).

In fact, although it is almost impossible to satisfy the above assumption with real datasets, the MMV problem works well when real data have sufficient joint-sparsity [11].

The following few definitions are used for the subsequent analysis.

Definition 2.2 ([16]).

Let f be an extended real-valued function.

  1. (i)

    For a nonempty set \mathcal{X}, we call f proper on \mathcal{X} if there exists x\in\mathcal{X} such that f(x)<+\infty and, for any x\in\mathcal{X}, f(x)>-\infty.

  2. (ii)

    f is a lower semicontinuous function if for any x\in\mathbb{R}^{m\times n}, \liminf_{y\to x}f(y)\geq f(x).

  3. (iii)

    f is a closed function if its epigraph

    epi\,f=\left\{(x,t)\in\mathbb{R}^{m\times n}\times\mathbb{R}\,|\,f(x)\leq t\right\}

    is a closed set.

For an extended real-valued function f, lower semicontinuity is equivalent to being a closed function.

Definition 2.3 ([17]).

For a proper lower semicontinuous function f, a critical point is a point x that satisfies 0\in\partial f(x).

By the first-order optimality condition, if x is a minimizer of f, then it must be a critical point of f.

Definition 2.4 ([18]).

S\subseteq\mathbb{R}^{m\times n} is a semi-algebraic set if there exist polynomial functions f_{ij},g_{ij}:\mathbb{R}^{m\times n}\to\mathbb{R} with 1\leq i\leq p,\,1\leq j\leq q such that

S=\bigcup_{i=1}^{p}\bigcap_{j=1}^{q}\left\{x\in\mathbb{R}^{m\times n}:f_{ij}(x)=0,\,g_{ij}(x)>0\right\}.

The proper function f is semi-algebraic if its graph

graph\,f=\left\{(x,t)\in\mathbb{R}^{m\times n}\times\mathbb{R}:f(x)=t\right\}

is a semi-algebraic subset of \mathbb{R}^{m\times n}\times\mathbb{R}.

Semi-algebraic sets and functions have a few useful properties:

  1. (i)

    Semi-algebraic sets are closed under finite unions and Cartesian products.

  2. (ii)

    Many common functions are semi-algebraic, including real polynomials, compositions of semi-algebraic functions, indicator functions of semi-algebraic sets, and \pi(A), where A\subseteq\mathbb{R}^{m\times n}\times\mathbb{R} is a semi-algebraic set and \pi:\mathbb{R}^{m\times n}\times\mathbb{R}\to\mathbb{R}^{m\times n} is the projection operator.

The definitions of Gâteaux differentiability, the subdifferential, the Karush-Kuhn-Tucker (KKT) conditions, and the Kurdyka-Lojasiewicz (KL) property can be found elsewhere [19, 17, 16, 18, 20].

The following proposition states the connection between semi-algebraic functions and KL functions.

Proposition 2.1 ([20]).

If the proper lower semicontinuous function f is semi-algebraic, it must be a KL function.

3 Problem formulation

First, we state a proposition that is useful to describe the MMV problem in mathematical language.

Proposition 3.1.

For a set of vectors \left\{s^{j}\right\}_{j=1}^{J}\subseteq\mathbb{R}^{N}, denote S=(s^{1},s^{2},\cdots,s^{J}). Then, \left\{s^{j}\right\}_{j=1}^{J} has a common sparse support set with sparsity k \Longleftrightarrow \left\|S\right\|_{2,0}=k.

Proof.

Denote the common sparse support set of \left\{s^{j}\right\}_{j=1}^{J} as Supp. S can be written in terms of its rows as S=(\alpha_{1},\alpha_{2},\cdots,\alpha_{N})^{T}. Then, we can conclude that

\left\|S\right\|_{2,0}=\left\|(\left\|\alpha_{1}\right\|_{2},\left\|\alpha_{2}\right\|_{2},\cdots,\left\|\alpha_{N}\right\|_{2})^{T}\right\|_{0}=k
\Longleftrightarrow \begin{cases}\left\|\alpha_{j}\right\|_{2}\neq 0&\text{for}\enspace j\in Supp\\ \left\|\alpha_{j}\right\|_{2}=0&\text{for}\enspace j\notin Supp\end{cases}\quad\text{and}\enspace\left|Supp\right|=k\quad\text{(definition of the $\ell_{0}$-norm)}
\Longleftrightarrow \begin{cases}\alpha_{j}\neq 0&\text{for}\enspace j\in Supp\\ \alpha_{j}=0&\text{for}\enspace j\notin Supp\end{cases}\quad\text{(positivity of the $\ell_{2}$-norm)}
\Longleftrightarrow Supp\enspace\text{is exactly the common sparse support set, with}\enspace\left|Supp\right|=k.

This completes the proof. \square
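
As a quick sanity check of Proposition 3.1, the hedged sketch below builds a matrix S whose columns share a row support of size k and verifies that its \ell_{2,0}-norm equals k; it reuses the l20_norm helper sketched in Section 1, and the sizes and seed are arbitrary.

```python
import numpy as np

# Build S in R^{N x J} whose columns share a common row support of size k,
# then check that ||S||_{2,0} = k, as stated in Proposition 3.1.
N, J, k = 10, 4, 3
rng = np.random.default_rng(0)
support = rng.choice(N, size=k, replace=False)
S = np.zeros((N, J))
S[support, :] = rng.standard_normal((k, J))

assert l20_norm(S) == k
assert set(np.flatnonzero(np.linalg.norm(S, axis=1) > 1e-12)) == set(support)
```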

Next, we convert the MMV problem to an optimization problem by using Proposition 3.1.

Assume there are J sensors and that the sparse vectors after sparse representation are s_{1},s_{2},\cdots,s_{J}, where s_{j}\in\mathbb{R}^{N} for all j\in\left\{1,2,\cdots,J\right\}; we denote S=(s_{1},s_{2},\cdots,s_{J})\in\mathbb{R}^{N\times J}. Assume the sensing matrix is \Phi\in\mathbb{R}^{M\times N}\;(M<N) and the measurement vectors are y_{1},y_{2},\cdots,y_{J}, where y_{j}=\Phi s_{j}\in\mathbb{R}^{M} for all j\in\left\{1,2,\cdots,J\right\}; we denote Y=(y_{1},y_{2},\cdots,y_{J})\in\mathbb{R}^{M\times J}. By Proposition 3.1, minimizing the joint-sparsity of \left\{s_{j}\right\}_{j=1}^{J} is equivalent to minimizing \left\|S\right\|_{2,0}. Therefore, the MMV problem can be described as

\min_{S\in\mathbb{R}^{N\times J}}\left\|S\right\|_{2,0}\quad s.t.\quad Y=\Phi S. \qquad (3.1)

The following theorem offers the prerequisite to make a further conversion of problem (3.1).

Theorem 3.1 ([11]).

For \Phi\in\mathbb{R}^{M\times N} and S\in\mathbb{R}^{N\times J}, if Y=\Phi S and

\left\|S\right\|_{2,0}<\frac{Spark(\Phi)+Rank(Y)-1}{2}, \qquad (3.2)

then S is the unique solution of problem (3.1).

Based on Theorem 3.1, we do not require all vectors to share the same sparse support set in the MMV problem. Instead, we just require that S satisfies (3.2).

To account for the parity of Spark(\Phi)+Rank(Y) and the measurement error between Y and the true \Phi S, set

s=\left\lfloor\frac{Spark(\Phi)+Rank(Y)-2}{2}\right\rfloor, \qquad (3.3)

where \left\lfloor\cdot\right\rfloor is the floor (round-down) operator. Problem (3.1) can be converted to

\min_{S\in\mathbb{R}^{N\times J}}\left\|Y-\Phi S\right\|_{F}^{2}\quad s.t.\quad\left\|S\right\|_{2,0}\leq s. \qquad (3.4)

This is a nonconvex constrained optimization problem. The indicator function

\mathcal{I}_{\mathcal{M}}(X)=\begin{cases}0,&\text{if }X\in\mathcal{M}\\ +\infty,&\text{if }X\notin\mathcal{M}\end{cases}

of the set

\mathcal{M}=\left\{X\in\mathbb{R}^{N\times J}:\left\|X\right\|_{2,0}\leq s\right\}

is introduced to move the nonconvex constraint into the objective function, and a matrix B\in\mathbb{R}^{N\times J} is introduced as an auxiliary variable for S to reformulate problem (3.4) as

\min_{B,S\in\mathbb{R}^{N\times J}}\left\|Y-\Phi S\right\|_{F}^{2}+\mathcal{I}_{\mathcal{M}}(B)\quad s.t.\quad B-S=0. \qquad (3.5)

Now we have obtained the final problem formulation (3.5) to describe the MMV problem.
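
As a practical aside, the sparsity level s in (3.3) can be estimated without computing Spark(\Phi) exactly. The hedged sketch below assumes an i.i.d. Gaussian \Phi, for which Spark(\Phi)=M+1 with probability one; the function name sparsity_budget is ours and the default estimate is an assumption, not part of the paper's prescription.

```python
import numpy as np

def sparsity_budget(Phi, Y, spark_estimate=None):
    """Sparsity level s from (3.3). Computing Spark(Phi) exactly is intractable;
    for an i.i.d. Gaussian Phi it equals M + 1 almost surely, which is used here
    as a default estimate."""
    M = Phi.shape[0]
    spark = M + 1 if spark_estimate is None else spark_estimate
    rank_Y = np.linalg.matrix_rank(Y)
    return int(np.floor((spark + rank_Y - 2) / 2))
```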

4 Algorithm

Problem (3.5) is a two-block nonconvex optimization problem with a linear equality constraint; it can be solved using ADMM [15].

The augmented Lagrangian function associated with problem (3.5) is defined as

\mathcal{L}_{\rho}(B,S,L)=\left\|Y-\Phi S\right\|_{F}^{2}+\mathcal{I}_{\mathcal{M}}(B)+\left\langle L,B-S\right\rangle+\frac{\rho}{2}\left\|B-S\right\|_{F}^{2}, \qquad (4.1)

where L\in\mathbb{R}^{N\times J} is the Lagrangian multiplier of the equality constraint, \left\langle\cdot,\cdot\right\rangle is the inner product of matrices, and \rho>0 is the penalty parameter.

By using the ADMM scheme, we can obtain an approximate solution of problem (3.5) by minimizing over the variables one by one with the others fixed, as follows:

\left\{\begin{matrix}B^{k+1}=&\mathop{\arg\min}\limits_{B\in\mathbb{R}^{N\times J}}\mathcal{L}_{\rho}(B,S^{k},L^{k}),\\ S^{k+1}=&\mathop{\arg\min}\limits_{S\in\mathbb{R}^{N\times J}}\mathcal{L}_{\rho}(B^{k+1},S,L^{k}),\\ L^{k+1}=&L^{k}+\rho(B^{k+1}-S^{k+1}).\end{matrix}\right. \qquad (4.2)

Next, the subproblems in Equation (4.2) are solved one by one.

4.1 Update B

Fix S and L:

\begin{split}B^{k+1}&=\mathop{\arg\min}\limits_{B\in\mathbb{R}^{N\times J}}\mathcal{L}_{\rho}(B,S^{k},L^{k})\\ &=\mathop{\arg\min}\limits_{B\in\mathbb{R}^{N\times J}}\left\{\mathcal{I}_{\mathcal{M}}(B)+\left\langle L^{k},B-S^{k}\right\rangle+\frac{\rho}{2}\left\|B-S^{k}\right\|_{F}^{2}\right\}\\ &=\mathop{\arg\min}\limits_{B\in\mathcal{M}}\left\|B-S^{k}+\frac{L^{k}}{\rho}\right\|_{F}^{2}\\ &=\mathcal{P}_{\mathcal{M}}(S^{k}-\frac{L^{k}}{\rho}),\end{split} \qquad (4.3)

where \mathcal{P}_{\mathcal{M}}(\cdot) is the projection operator onto the set \mathcal{M}=\left\{X\in\mathbb{R}^{N\times J}:\left\|X\right\|_{2,0}\leq s\right\}.

In fact, \mathcal{P}_{\mathcal{M}}(\cdot) is the hard thresholding operator [21]. If \left\|S^{k}-\frac{L^{k}}{\rho}\right\|_{2,0}\leq s, then B^{k+1}=S^{k}-\frac{L^{k}}{\rho}; otherwise, truncate S^{k}-\frac{L^{k}}{\rho} by preserving only the s rows with the largest \ell_{2}-norms and setting the remaining rows to zero.
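
A minimal NumPy sketch of this row-wise hard thresholding projection is given below; the function name project_row_sparse is ours, and ties among equal row norms are broken arbitrarily.

```python
import numpy as np

def project_row_sparse(X, s):
    """Projection onto M = {X : ||X||_{2,0} <= s}: keep the s rows with the
    largest l_2-norms and set all other rows to zero."""
    row_norms = np.linalg.norm(X, axis=1)
    if np.count_nonzero(row_norms) <= s:
        return X.copy()                    # already in M, nothing to truncate
    keep = np.argsort(row_norms)[-s:]      # indices of the s largest rows
    out = np.zeros_like(X)
    out[keep, :] = X[keep, :]
    return out
```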

4.2 Update S

Fix B and L:

\begin{split}S^{k+1}&=\mathop{\arg\min}\limits_{S\in\mathbb{R}^{N\times J}}\mathcal{L}_{\rho}(B^{k+1},S,L^{k})\\ &=\mathop{\arg\min}\limits_{S\in\mathbb{R}^{N\times J}}\left\{\left\|Y-\Phi S\right\|_{F}^{2}+\left\langle L^{k},B^{k+1}-S\right\rangle+\frac{\rho}{2}\left\|B^{k+1}-S\right\|_{F}^{2}\right\}\\ &=\mathop{\arg\min}\limits_{S\in\mathbb{R}^{N\times J}}\left\{\left\|Y-\Phi S\right\|_{F}^{2}+\frac{\rho}{2}\left\|B^{k+1}-S+\frac{L^{k}}{\rho}\right\|_{F}^{2}\right\}.\end{split} \qquad (4.4)

Denote f(X)=\left\|Y-\Phi X\right\|_{F}^{2}+\frac{\rho}{2}\left\|B^{k+1}-X+\frac{L^{k}}{\rho}\right\|_{F}^{2}. The first-order optimality condition for an unconstrained differentiable optimization problem gives

S^{k+1}=\mathop{\arg\min}\limits_{S\in\mathbb{R}^{N\times J}}f(S)\Longleftrightarrow\nabla f(S^{k+1})=0.

It is easy to calculate

\nabla f(S^{k+1})=(2\Phi^{T}\Phi+\rho I)S^{k+1}-2\Phi^{T}Y-\rho B^{k+1}-L^{k}.

Therefore, by solving the linear equation \nabla f(S^{k+1})=0, we obtain

S^{k+1}=(2\Phi^{T}\Phi+\rho I)^{-1}(2\Phi^{T}Y+\rho B^{k+1}+L^{k}). \qquad (4.5)

4.3 Update L

Fix B and S. The Lagrangian multiplier L can be updated as

L^{k+1}=L^{k}+\rho(B^{k+1}-S^{k+1}). \qquad (4.6)

4.4 Convergence criterion

The convergence criterion of the proposed algorithm is given below, and the reasons will be discussed in the following section.

Assuming \left\{(B^{k},S^{k},L^{k})\right\} is the sequence generated by the ADMM procedure (4.2), the convergence criterion is given as

\begin{cases}\left\|B^{k}-S^{k}\right\|_{F}\to 0,\\ \left\|S^{k+1}-S^{k}\right\|_{F}\to 0,\\ \left\|L^{k}\right\|_{F}\to 0.\end{cases} \qquad (4.7)

We call our algorithm for solving problem (3.5) MMV-ADMM-\ell_{2,0}; it is summarized in Algorithm 1. Section 6 describes the parameter initialization.

Input: The measurement data matrix Y, the sensing matrix \Phi;
Output: The reconstructed sparse matrix \hat{S};
1 Initialization: S^{0},L^{0},s,\rho, and let k=0;
2 do
3       Update B by: B^{k+1}=\mathcal{P}_{\mathcal{M}}(S^{k}-\frac{L^{k}}{\rho});
4       Update S by: S^{k+1}=(2\Phi^{T}\Phi+\rho I)^{-1}(2\Phi^{T}Y+\rho B^{k+1}+L^{k});
5       Update the Lagrangian multiplier L: L^{k+1}=L^{k}+\rho(B^{k+1}-S^{k+1});
6       Update k: k=k+1;
7 while the convergence criterion (4.7) is not satisfied;
return \hat{S}=B^{k}.
Algorithm 1 MMV-ADMM-\ell_{2,0}
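
To make the pseudocode concrete, here is a minimal NumPy sketch of Algorithm 1. It reuses the project_row_sparse helper sketched above; the function name mmv_admm_l20 is ours, and the stopping threshold is illustrative rather than prescribed by the paper.

```python
import numpy as np

def mmv_admm_l20(Y, Phi, s, rho=1.0, max_iter=1000, tol=1e-6):
    """Sketch of Algorithm 1 (MMV-ADMM-l20); updates follow (4.3), (4.5), (4.6)."""
    M, N = Phi.shape
    J = Y.shape[1]
    S = np.random.randn(N, J)                   # S^0: i.i.d. Gaussian initialization
    L = np.zeros((N, J))                        # L^0: zero multiplier
    # 2 Phi^T Phi + rho I is constant, so its inverse is computed only once.
    inv_mat = np.linalg.inv(2 * Phi.T @ Phi + rho * np.eye(N))
    for _ in range(max_iter):
        B = project_row_sparse(S - L / rho, s)             # B-update (4.3)
        S_new = inv_mat @ (2 * Phi.T @ Y + rho * B + L)    # S-update (4.5)
        L = L + rho * (B - S_new)                          # multiplier update (4.6)
        # convergence criterion (4.7)
        if (np.linalg.norm(B - S_new) < tol
                and np.linalg.norm(S_new - S) < tol
                and np.linalg.norm(L) < tol):
            S = S_new
            break
        S = S_new
    return B                                    # hat S = B^k
```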

Notably, 2\Phi^{T}\Phi+\rho I is unchanged across iterations; therefore, its inverse needs to be calculated only once. Algorithm 1 can also be accelerated in an obvious way. When updating S by (4.5), we need to calculate the inverse of an N\times N matrix, which is computationally expensive when N is large. Because M<N in compressed sensing, we can use the Sherman-Morrison-Woodbury (SMW) formula [22, 23] to simplify the calculation of the inverse. Specifically, the SMW formula gives

(2\Phi^{T}\Phi+\rho I)^{-1}=\frac{I}{\rho}-\frac{2\Phi^{T}(I+\frac{2\Phi\Phi^{T}}{\rho})^{-1}\Phi}{\rho^{2}}. \qquad (4.8)

This converts the calculation of an N\times N matrix inverse into the inverse of an M\times M matrix. Because M is much smaller than N in compressed sensing, this reduces the computational time. We call our algorithm MMV-ADMM-\ell_{2,0}-SMW when (4.8) is used to update S.
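
A hedged sketch of (4.8) in NumPy follows; the function name inv_via_smw is ours, and in practice one would typically factorize the M\times M matrix and solve linear systems rather than forming explicit inverses.

```python
import numpy as np

def inv_via_smw(Phi, rho):
    """Compute (2 Phi^T Phi + rho I)^{-1} via the SMW identity (4.8):
    only an M x M matrix is inverted instead of an N x N one."""
    M, N = Phi.shape
    small = np.linalg.inv(np.eye(M) + (2.0 / rho) * (Phi @ Phi.T))   # M x M inverse
    return np.eye(N) / rho - (2.0 / rho ** 2) * (Phi.T @ small @ Phi)
```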

5 Convergence analysis

Algorithm 1 is a two-block ADMM for a nonconvex problem, and the convergence of ADMM for general nonconvex problems remains an open problem. However, owing to the KL property of the objective functions in problem (3.5), Algorithm 1 has global convergence.

First, we prove some properties of objective functions in problem (3.5).

Proposition 5.1.

The \ell_{2,0}-norm of a matrix is proper and lower semicontinuous.

Proof.

First, we prove that the \ell_{0}-norm of a vector is lower semicontinuous. Denote

f:\mathbb{R}^{N}\to\mathbb{R},\quad x\mapsto\left\|x\right\|_{0}.

For any x=(x_{1},x_{2},\cdots,x_{N})^{T}\in\mathbb{R}^{N}, assume

f(x)=\left\|x\right\|_{0}=s,
Supp(x)=\left\{l_{1},l_{2},\cdots,l_{s}\right\}\subseteq\left\{1,2,\cdots,N\right\}.

When y^{k}=(y_{1}^{k},y_{2}^{k},\cdots,y_{N}^{k})^{T}\overset{k\to\infty}{\longrightarrow}x, for

\varepsilon=\frac{\min_{j=1,2,\cdots,s}\left|x_{l_{j}}\right|}{2}>0,

there exists K\in\mathbb{N} such that

\left|y_{i}^{k}-x_{i}\right|<\varepsilon,\enspace i=1,2,\cdots,N\text{ for all }k>K.

Therefore, when k>K,

y_{l_{j}}^{k}\neq 0,\enspace j=1,2,\cdots,s,

implying f(y^{k})=\left\|y^{k}\right\|_{0}\geq s.

According to the sign-preserving property of limits, we have \liminf_{k\to\infty}f(y^{k})\geq s=f(x); therefore, f is lower semicontinuous.

For any S\in\mathbb{R}^{N\times J}, S can be written in terms of its rows as S=(\alpha_{1},\alpha_{2},\cdots,\alpha_{N})^{T}. Denote

g:\mathbb{R}^{N\times J}\to\mathbb{R}^{N},\quad S\mapsto(\left\|\alpha_{1}\right\|_{2},\left\|\alpha_{2}\right\|_{2},\cdots,\left\|\alpha_{N}\right\|_{2})^{T}.

Then, the \ell_{2,0}-norm of a matrix,

h:\mathbb{R}^{N\times J}\to\mathbb{R},\quad S\mapsto\left\|S\right\|_{2,0},

is the composition of f and g, that is, h=f\circ g.

It is apparent that g is continuous. Combining this with the lower semicontinuity of f, we can conclude that for any S\in\mathbb{R}^{N\times J} and any sequence S^{k}\to S,

\liminf_{k\to\infty}h(S^{k})=\liminf_{k\to\infty}f(g(S^{k}))\geq f(\lim_{k\to\infty}g(S^{k}))=f(g(S))=h(S).

Therefore, h is lower semicontinuous.

Apparently, the \ell_{2,0}-norm is proper. Therefore, it is proper and lower semicontinuous. \square

Based on the equivalence of lower semicontinuity and closedness, Corollary 5.1 follows immediately.

Corollary 5.1.

The \ell_{2,0}-norm of a matrix is a closed function.

Corollary 5.2.

The indicator function

\mathcal{I}_{\mathcal{M}}(X)=\begin{cases}0,&\text{if }X\in\mathcal{M}\\ +\infty,&\text{if }X\notin\mathcal{M}\end{cases}

of the set \mathcal{M}=\left\{X\in\mathbb{R}^{N\times J}:\left\|X\right\|_{2,0}\leq s\right\} is also proper and lower semicontinuous.

Proof.

We show that \mathcal{I}_{\mathcal{M}}(X) is a closed function. In fact, for any sequence (A_{k},t_{k})\in epi\,\mathcal{I}_{\mathcal{M}}, we have t_{k}\geq\mathcal{I}_{\mathcal{M}}(A_{k}). According to Definition 2.2, we only need to prove that if (A_{k},t_{k})\to(A,t), then t\geq\mathcal{I}_{\mathcal{M}}(A).

Because t_{k}\to t, for k large enough only two cases can occur:

If \left\|A_{k}\right\|_{2,0}>s, then t_{k}=\mathcal{I}_{\mathcal{M}}(A_{k})=+\infty. From t_{k}\to t, we have t=+\infty; therefore, t\geq\mathcal{I}_{\mathcal{M}}(A).

If \left\|A_{k}\right\|_{2,0}\leq s, then t_{k}\geq\mathcal{I}_{\mathcal{M}}(A_{k})=0. From t_{k}\to t, we have t\geq 0. By Corollary 5.1, the \ell_{2,0}-norm of a matrix is a closed function. Therefore,

\left.\begin{aligned} &(A_{k},s)\in epi\left\|\cdot\right\|_{2,0}\\ &(A_{k},s)\to(A,s)\end{aligned}\right\}\Longrightarrow(A,s)\in epi\left\|\cdot\right\|_{2,0},

implying that \left\|A\right\|_{2,0}\leq s, i.e., A\in\mathcal{M}. Therefore, t\geq 0=\mathcal{I}_{\mathcal{M}}(A).

In both cases, t\geq\mathcal{I}_{\mathcal{M}}(A). This proves that \mathcal{I}_{\mathcal{M}}(X) is closed. It is apparent that \mathcal{I}_{\mathcal{M}}(X) is proper. Therefore, \mathcal{I}_{\mathcal{M}}(X) is proper and lower semicontinuous. \square

Proposition 5.2.

The \ell_{2,0}-norm of a matrix is a KL function.

Proof.

[20] proved that both \left\|\cdot\right\|_{0} and \left\|\cdot\right\|_{p} with p\in\mathbb{Q}_{+} are semi-algebraic. Denote the functions f, g, and h as in the proof of Proposition 5.1. For any S\in\mathbb{R}^{N\times J}, written in terms of its rows as S=(\alpha_{1},\alpha_{2},\cdots,\alpha_{N})^{T},

g(S)=(\left\|\alpha_{1}\right\|_{2},\left\|\alpha_{2}\right\|_{2},\cdots,\left\|\alpha_{N}\right\|_{2})^{T}=\prod_{i=1}^{N}\left\{\left\|\alpha_{i}\right\|_{2}\right\}

is the Cartesian product of \ell_{2}-norms. Because semi-algebraicity is stable under Cartesian products, g is also semi-algebraic. Therefore, the composition h=f\circ g is semi-algebraic.

Proposition 5.1 shows that h is proper and lower semicontinuous. Therefore, h is a KL function according to Proposition 2.1. \square

Corollary 5.3.

The indicator function

\mathcal{I}_{\mathcal{M}}(X)=\begin{cases}0,&\text{if }X\in\mathcal{M}\\ +\infty,&\text{if }X\notin\mathcal{M}\end{cases}

of the set \mathcal{M}=\left\{X\in\mathbb{R}^{N\times J}:\left\|X\right\|_{2,0}\leq s\right\} is also a KL function.

Proof.

First, we prove that \mathcal{M} is semi-algebraic. Because \left\|\cdot\right\|_{2,0} takes values in \mathbb{N}, \mathcal{M} can be expressed as

\mathcal{M}=\left\{X\in\mathbb{R}^{N\times J}:\left\|X\right\|_{2,0}\leq s\right\}=\bigcup_{t=0}^{s}\left\{X\in\mathbb{R}^{N\times J}:\left\|X\right\|_{2,0}=t\right\}.

The proof of Proposition 5.2 shows that the \ell_{2,0}-norm is a semi-algebraic function. Therefore, we conclude from Definition 2.4 that

graph\left\|\cdot\right\|_{2,0}=\left\{(X,t):\left\|X\right\|_{2,0}=t\right\}

is a semi-algebraic set.

Denote the projection operator

\pi:\mathbb{R}^{N\times J}\times\mathbb{R}\to\mathbb{R}^{N\times J},\quad (X,t)\mapsto X.

Then,

\mathcal{M}=\bigcup_{t=0}^{s}\pi(graph\left\|\cdot\right\|_{2,0}).

Because semi-algebraicity is stable under finite unions and under the projection operator \pi, \mathcal{M} is semi-algebraic.

\mathcal{I}_{\mathcal{M}}(X) is the indicator function of \mathcal{M}; hence it is also semi-algebraic. Corollary 5.2 shows that it is also proper and lower semicontinuous. Therefore, by Proposition 2.1, \mathcal{I}_{\mathcal{M}}(X) is a KL function. \square

Next, we prove that Algorithm 1 has global convergence based on [24].

In [24] the authors discuss the following optimization problem:

\min_{x,y}f(x)+g(y)\quad s.t.\quad Ax+y=b, \qquad (5.1)

where f is proper and lower semicontinuous, and g has an L-Lipschitz continuous gradient.

The authors make the following assumptions:

  1. (i)

    The two minimization subproblems in (4.2) possess solutions, and

  2. (ii)

    The penalty parameter satisfies \rho>2L, and

  3. (iii)

    A^{T}A\succeq\mu I for some \mu>0.

Set A=-I, x=B, y=S, b=0, f(B)=\mathcal{I}_{\mathcal{M}}(B), and g(S)=\left\|Y-\Phi S\right\|_{F}^{2} in (5.1); this gives problem (3.5). It is apparent that problem (3.5) satisfies the form of (5.1) and the above assumptions.

Next, we refer to the sufficient conditions in [24] to guarantee that the sequence \left\{(B^{k},S^{k},L^{k})\right\} generated by Algorithm 1 is bounded.

Lemma 5.1 ([24]).

By applying the classic ADMM to (5.1), we obtain a sequence \left\{w^{k}\right\}. Suppose that

\bar{g}:=\inf_{y}\left\{g(y)-\frac{1}{2L}\left\|\nabla g(y)\right\|^{2}\right\}>-\infty. \qquad (5.2)

\left\{w^{k}\right\} is bounded as long as one of the following two statements holds:

  1. (i)

    \liminf_{\left\|x\right\|\to+\infty}f(x)=+\infty, and

  2. (ii)

    \inf_{x}f(x)>-\infty and \liminf_{\left\|y\right\|\to+\infty}g(y)=+\infty.

The main result in [24] is shown below.

Theorem 5.1 ([24]).

Suppose that Lemma 5.1 holds and that f and g are KL functions. Then, \left\{w^{k}\right\} converges to a KKT point of (5.1).

As an application of the above theorem, Algorithm 1 has global convergence.

Theorem 5.2.

Algorithm 1 has global convergence, and the generated sequence \left\{(B^{k},S^{k},L^{k})\right\} converges to a KKT point of problem (3.5).

Proof.

First, we prove that \left\{(B^{k},S^{k},L^{k})\right\} is bounded.

For g(S)=\left\|Y-\Phi S\right\|_{F}^{2}, we have \nabla g(S)=2\Phi^{T}(\Phi S-Y), and the Lipschitz constant of \nabla g can be taken as L=2\left\|\Phi\right\|_{F}^{2}. For any S\in\mathbb{R}^{N\times J},

\left\|Y-\Phi S\right\|_{F}^{2}-\frac{1}{2L}\left\|2\Phi^{T}(\Phi S-Y)\right\|_{F}^{2}\geq\left\|Y-\Phi S\right\|_{F}^{2}-\frac{4\left\|\Phi\right\|_{F}^{2}}{2L}\left\|Y-\Phi S\right\|_{F}^{2}=0.

Therefore, (5.2) is satisfied. Moreover, it is not hard to see that (ii) in Lemma 5.1 holds. Therefore, \left\{(B^{k},S^{k},L^{k})\right\} is bounded.

Corollary 5.3 proves that f(B)=\mathcal{I}_{\mathcal{M}}(B) is a KL function. Because g(S)=\left\|Y-\Phi S\right\|_{F}^{2} is a real polynomial, it is also a KL function.

All the conditions in Theorem 5.1 are satisfied; therefore, \left\{(B^{k},S^{k},L^{k})\right\} globally converges to a KKT point of (3.5). \square

Finally, we state the theorem that justifies the settings of the convergence criterion (4.7).

Theorem 5.3.

Algorithm 1 is nearly convergent to the optimal solution of problem (3.5) if and only if its iteration sequence \left\{(B^{k},S^{k},L^{k})\right\} satisfies the convergence criterion (4.7).

Proof.

In [25], the authors note that for the following problem

\min_{x}f(x)\quad s.t.\enspace x\in\mathcal{X},\left\|x\right\|_{0}\leq k, \qquad (5.3)

where \mathcal{X} denotes a convex polyhedron, the optimal solution must also satisfy the KKT conditions.

It is easy to see that problem (3.4) has the form of (5.3). Because problem (3.5) is equivalent to problem (3.4), the optimal point of (3.5) is also a KKT point of (3.5).

Therefore, if \left\{(B^{k},S^{k},L^{k})\right\} converges to the optimal solution (B^{\ast},S^{\ast},L^{\ast}), then (B^{\ast},S^{\ast},L^{\ast}) is a KKT point of (3.5). The Lagrangian function of (3.5) is

\mathcal{L}(B,S,L)=\left\|Y-\Phi S\right\|_{F}^{2}+\mathcal{I}_{\mathcal{M}}(B)+\left\langle L,B-S\right\rangle, \qquad (5.4)

and it satisfies the KKT conditions at (B^{\ast},S^{\ast},L^{\ast}). Therefore,

\begin{cases}0=\nabla_{S}\mathcal{L}(B^{\ast},S^{\ast},L^{\ast})=2\Phi^{T}(\Phi S^{\ast}-Y)-L^{\ast},\\ 0\in\partial_{B}\mathcal{L}(B^{\ast},S^{\ast},L^{\ast})=\partial_{B}\mathcal{I}_{\mathcal{M}}(B^{\ast})+L^{\ast},\\ 0=B^{\ast}-S^{\ast}.\end{cases} \qquad (5.5)

From (B^{k},S^{k},L^{k})\to(B^{\ast},S^{\ast},L^{\ast}) and the third formula of (5.5), we can conclude that \left\|B^{k}-S^{k}\right\|_{F}\to 0.

Denote f as in Subsection 4.2. When updating S, we have

\nabla f(S^{k+1})=(2\Phi^{T}\Phi+\rho I)S^{k+1}-2\Phi^{T}Y-\rho B^{k+1}-L^{k}=0.

Letting k\to+\infty and using B^{\ast}=S^{\ast}, we obtain 2\Phi^{T}(\Phi S^{\ast}-Y)-L^{\ast}=0, implying that the first formula of (5.5) is satisfied in Algorithm 1 without any further conditions.

From the rule for updating B in (4.3), we know that \mathcal{I}_{\mathcal{M}}(B^{k})=0 is always satisfied during the iterations of Algorithm 1. According to Corollary 5.2, \mathcal{I}_{\mathcal{M}}(X) is proper and lower semicontinuous. Thus,

0\leq\mathcal{I}_{\mathcal{M}}(B^{\ast})\leq\liminf_{k\to+\infty}\mathcal{I}_{\mathcal{M}}(B^{k})=0.

Therefore, when restricting attention to the iteration points generated by Algorithm 1 and their accumulation points, we have \mathcal{I}_{\mathcal{M}}(X)\equiv 0. Thus, \partial_{B}\mathcal{I}_{\mathcal{M}}(B^{\ast})=0. Therefore, the second formula of (5.5) holds if and only if L^{\ast}=0, which is equivalent to \left\|L^{k}\right\|_{F}\to 0.

Thus far, we have proved that the KKT conditions (5.5) are equivalent to the first and third formulas in (4.7). From S^{k}\to S^{\ast}, we have \left\|S^{k+1}-S^{k}\right\|_{F}\to 0, which is the second formula in (4.7). Therefore, an iteration sequence converging to the optimal point (B^{\ast},S^{\ast},L^{\ast}) must satisfy the convergence criterion (4.7). This proves the necessity.

For sufficiency, the convergence criterion for a generalized constrained optimization problem [16] is that the distance between adjacent iteration points tends to zero and the KKT conditions at the current iteration point are nearly satisfied. We have proved that the first and third formulas in (4.7) are equivalent to the KKT conditions. Therefore, the convergence criterion (4.7) ensures that the KKT conditions are satisfied. Combined with the second formula in (4.7), which implies that the distance between adjacent iteration points tends to zero, we know that the convergence criterion (4.7) guarantees that \left\{(B^{k},S^{k},L^{k})\right\} converges to a KKT point of (3.5). Denote this KKT point as (B^{\ast},S^{\ast},L^{\ast}); we claim that it is also the optimal point of (3.5).

In fact, at the KKT point (B^{\ast},S^{\ast},L^{\ast}), we have proved that L^{\ast}=0 and that (5.5) is satisfied. Therefore, from the first formula of (5.5), we can conclude that

\Phi^{T}(\Phi S^{\ast}-Y)=0. \qquad (5.6)

The sensing matrix \Phi\in\mathbb{R}^{M\times N} in the MMV problem is known to have full row rank. Thus, from (5.6), we know that \Phi S^{\ast}-Y=0. We have proved that \mathcal{I}_{\mathcal{M}}(B^{\ast})=0, implying that \left\|B^{\ast}\right\|_{2,0}\leq s. Because B^{\ast}=S^{\ast}, \left\|S^{\ast}\right\|_{2,0}\leq s as well. Therefore, the KKT point (B^{\ast},S^{\ast},L^{\ast}) satisfies Y=\Phi S^{\ast} and \left\|S^{\ast}\right\|_{2,0}\leq s. By Theorem 3.1, S^{\ast} is the unique optimal point of (3.1). According to the equivalence of (3.1) and (3.5), we can conclude that (B^{\ast},S^{\ast},L^{\ast}) is also the unique optimal point of (3.5).

Therefore, when the convergence criterion is satisfied, Algorithm 1 converges to the optimal solution of (3.5).

This completes the proof. \square

In fact, it is not hard to see that all of the discussion above remains valid when the \ell_{2,0}-norm is replaced by the \ell_{p,0}-norm with p\in\mathbb{Q}_{+}, implying that the proposed algorithm is also applicable to the \ell_{p,0}-norm case with p\in\mathbb{Q}_{+}. However, for convenience of calculation, we consider only the \ell_{2,0}-norm case.

6 Numerical simulations

We designed numerical experiments to verify the validity of the proposed algorithm and theorems and then explored the performance and advantages of our MMV-ADMM-\ell_{2,0} by comparing it with existing algorithms.

6.1 Experiment setup

All experiments were based on synthetic data and were performed on a PC with an Intel Core i7-8565U 1.8 GHz CPU and 20 GB RAM. The results in Subsections 6.4-6.7 are averaged over 100 independent experiments. Our datasets are generated as follows: the sensing matrix \Phi\in\mathbb{R}^{M\times N} is an i.i.d. Gaussian random matrix with unit-norm columns, and the ground truth of the recovery problem S\in\mathbb{R}^{N\times J} is generated in two steps. First, K rows are randomly selected as nonzero rows, with the remaining rows all being zero. Then, the elements of the selected K rows are independently generated from \mathcal{N}(0,1). The solution obtained from the different recovery algorithms is denoted as \hat{S}. The root-mean-square error (RMSE), given as

RMSE=\frac{\left\|\hat{S}-S\right\|_{F}}{\sqrt{NJ}},

is used to compare the recovery quality. The average running time is used to measure the computational cost. Recovery is considered successful when RMSE<10^{-5}.
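
The data-generation protocol and error metric above can be sketched as follows; the function names make_mmv_instance and rmse are ours, and the random seed is arbitrary.

```python
import numpy as np

def make_mmv_instance(N=500, M=150, K=50, J=10, seed=0):
    """Gaussian Phi with unit-norm columns and a K-row-sparse ground truth S."""
    rng = np.random.default_rng(seed)
    Phi = rng.standard_normal((M, N))
    Phi /= np.linalg.norm(Phi, axis=0)              # normalize columns
    support = rng.choice(N, size=K, replace=False)  # K randomly selected nonzero rows
    S = np.zeros((N, J))
    S[support, :] = rng.standard_normal((K, J))
    return Phi, S, Phi @ S                          # Phi, ground truth, measurements Y

def rmse(S_hat, S):
    """Root-mean-square error ||S_hat - S||_F / sqrt(N J)."""
    return np.linalg.norm(S_hat - S) / np.sqrt(S.size)
```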

For comparison, we test SOMP [13] and MFOCUSS [7], both of which are traditional algorithms that show good performance for the MMV problem. We also include the MMV-SPG algorithm based on the \ell_{2,1}-norm, which is derived from SPGL1 [26], a solver for large-scale sparse reconstruction, to solve the MMV problem. Because the projection operator \mathcal{P}_{\mathcal{M}}(\cdot) in (4.3) is a hard thresholding operator, we also test SNIHT [14] for the MMV problem. As the most direct comparison, MMV-ADMM-\ell_{2,1} [6] uses the same ADMM scheme but is based on the \ell_{2,1}-norm; we include it in the experiments to assess how the performance changes after the \ell_{2,1}-norm is replaced by the \ell_{2,0}-norm.

The parameters of all algorithms are set as shown in Table 1. For MMV-ADMM-\ell_{2,0} and MMV-ADMM-\ell_{2,1}, S^{0} and L^{0} are initialized as an i.i.d. Gaussian random matrix and a zero matrix, respectively. For MMV-ADMM-\ell_{2,0}, if we have the prior information that the joint-sparsity of S is about K, then we set s to be slightly larger than K. Otherwise, we use (3.3) to initialize s. The spark of the matrix \Phi is not easy to calculate accurately; however, an estimation given by [4] is enough for the proposed algorithm.

Algorithm Parameters
SOMP [13] Sparsity K
MFOCUSS [7] \lambda=10^{-10}
MMV-SPG [26] Not required
SNIHT [14] Sparsity K, MaxIter=10^{3}
MMV-ADMM-\ell_{2,1} [6] \lambda=10^{-6}, \rho=10^{-5}, MaxIter=10^{3}
MMV-ADMM-\ell_{2,0} \rho=1, MaxIter=10^{3}
Table 1: Parameter settings for the tested algorithms

6.2 Validity of MMV-ADMM-\ell_{2,0}

In the first experiment, we observed the recovery quality of our MMV-ADMM-\ell_{2,0}. MMV-ADMM-\ell_{2,0} was applied without a convergence criterion (i.e., it runs for MaxIter=10^{3} iterations) on datasets of size N=500, M=150, K=50, J=10 over 10 random repetitions. As shown in Table 2, the RMSE of MMV-ADMM-\ell_{2,0} is close to machine precision, and the running time is less than 1 s; this indicates that the proposed algorithm can recover the sparse signals precisely and efficiently.

Data RMSE Time(s)
1 8.9904e-16 0.6610
2 9.1155e-16 0.6290
3 9.1562e-16 0.5604
4 8.4978e-16 0.5965
5 9.1531e-16 0.6480
6 9.0638e-16 0.6051
7 9.9356e-16 0.5715
8 9.1655e-16 0.5679
9 7.7866e-16 0.6154
10 7.6170e-16 0.6352
Mean 8.8484e-16 0.6090
Std 6.9661e-17 0.0350
Table 2: The average recovery quality over 10 random experiments (N=500, M=150, K=50, J=10)

6.3 Test for convergence criterion

In the second experiment, we tested the convergence criterion (4.7) given by Theorem 5.3. We set N=500, M=150, K=50, J=10. The aim was to observe the average change of the convergence criterion (4.7) over the iterations, averaged over 10 random repetitions. The average change of the convergence criterion over 200 iterations is shown in Figure 1. All three variables in (4.7) tend to zero and reach the order of 10^{-16} at MaxIter=10^{3}; this is consistent with our analysis in Theorem 5.3.

Figure 1: The average change of the convergence criterion in iterations over 10 experiments (N=500, M=150, K=50, J=10)

Then, we studied the influence of the convergence criterion (4.7) on our algorithm. When all three variables in the convergence criterion (4.7) were less than 10^{-6}, the proposed algorithm was considered to have found an approximate optimal solution, and the iteration was stopped. As shown in Table 3, with the convergence criterion, the proposed algorithm can successfully recover the sparse signals in a short time.

In the following experiments, we applied MMV-ADMM-\ell_{2,0} with the proposed convergence criterion.

data Without convergence criterion With convergence criterion
RMSE Time(s) RMSE Time(s)
1 8.2155e-16 0.5765 9.4005e-10 0.1560
2 9.0799e-16 0.5193 7.4102e-10 0.1179
3 7.6515e-16 0.4940 6.9234e-10 0.1116
4 8.4516e-16 0.5109 8.8110e-10 0.1132
5 8.5083e-16 0.5278 7.4768e-10 0.1269
6 9.4163e-16 0.5697 8.0561e-10 0.1508
7 8.2227e-16 0.5152 6.5987e-10 0.1108
8 8.9286e-16 0.5053 7.4714e-10 0.1069
9 7.6474e-16 0.4863 8.1573e-10 0.0949
10 8.0768e-16 0.4796 7.8128e-10 0.1024
Mean 8.4199e-16 0.5185 7.8118e-10 0.1191
Std 5.8515e-17 0.0324 8.4008e-11 0.0200
Table 3: The average recovery quality over 10 random experiments with and without the convergence criterion (N=500, M=150, K=50, J=10)

6.4 Performance with different sparsity K

In the third experiment, we studied how the sparsity influenced the recovery quality of all these algorithms. We set N=500, M=150, J=10 and let K range from 25 to 150 with a step size of 25, and we observed how the percentage of successful recoveries changes with \frac{K}{N} when applying the different algorithms. The experiment results are shown in Figure 2.

Figure 2: The recovery quality of different algorithms with different sparsity K (N=500, M=150, J=10)

It is not hard to see that all these algorithms fail to recover the original signals when the sparsity of the signals is large (i.e., \frac{K}{N}\geq 0.2). However, when the sparsity is not so large, that is, 0.1\leq\frac{K}{N}\leq 0.15, the proposed algorithm can successfully recover all signals over 100 experiments, whereas the other algorithms have poor performance. In addition, in the case of successful recovery, all these algorithms are fast enough. Therefore, we can conclude that for the MMV problem, when the original signals are not sparse enough, the proposed algorithm is more suitable.

6.5 Performance with different undersampling rate \frac{N}{M}

The main purpose of compressed sensing is to reduce the number of measurements while ensuring good recovery quality. To illustrate that the proposed algorithm has better performance with a larger undersampling rate compared with that of other algorithms, we studied the percentage of successful recovery with different undersampling rates. We set N=500, K=50, J=10 and let the undersampling rate \frac{N}{M} range from 1.6 to 8 with a step size of 1.6.

Figure 3: The recovery quality of different algorithms with different undersampling rate \frac{N}{M} (N=500, K=50, J=10)

As shown in Figure 3, when the undersampling rate \frac{N}{M}\geq 6.4, all algorithms fail to recover the original signals. However, when \frac{N}{M}=4.8, the proposed algorithm can still recover all signals, whereas MFOCUSS can only recover approximately 94% of the signals and the other algorithms perform poorly. With regard to the running time, we only need to consider the case of successful recovery, in which the proposed algorithm shows average performance among all algorithms. This implies that the proposed algorithm supports a larger undersampling rate than the other algorithms, which is the key goal of compressed sensing.

6.6 Performance with different number of sensors J

In this experiment, we studied how the recovery quality was affected by the number of sensors J. We set N=500, M=150, K=50 and let J take the values 1, 2, 4, 8, 16, and 32 (doubling at each step).

Figure 4: The recovery quality of different algorithms with different number of sensors J (N=500, M=150, K=50)

Figure 4 shows that when the number of sensors is 1 (i.e., an SMV problem), none of the algorithms can successfully recover all signals. However, the proposed algorithm can recover approximately 82% of the signals, whereas the other algorithms perform poorly. When J\geq 2, the proposed algorithm can successfully recover all signals, whereas the other algorithms can only do so when J\geq 4. The running time of the proposed algorithm is about average among all the algorithms, all of which are fast enough to recover the original signals. This experiment shows that the proposed algorithm performs well even with a small number of sensors, whereas the other algorithms do not.

6.7 Performance with different sample dimensions N

We tested the recovery quality of our algorithm for varying numbers of dimensions. We set J=10 and tested N=100, 500, 1000, 1500, 3000 with \frac{N}{M}=3 and \frac{K}{N}=0.1.

Figure 5: The recovery quality of different algorithms with different sample dimensions N (\frac{N}{M}=3, \frac{K}{N}=0.1, J=10)

Figure 5 shows that MFOCUSS, MMV-ADMM-\ell_{2,1}, and the proposed algorithm all perform well irrespective of how the dimension N changes. MFOCUSS and the proposed algorithm have similar running times; however, the running time of MMV-ADMM-\ell_{2,1} is higher when N is large. This experiment shows that the proposed algorithm is also suitable for high-dimensional problems.

6.8 Performance with the SMW formula

Finally, we designed an experiment to illustrate the advantage of MMV-ADMM-\ell_{2,0} with the SMW formula when N is large. We set N=5000, M=1500, K=500, J=10 and observed the RMSE and running time over 10 random experiments.

As shown in Table 4, with the SMW formula, MMV-ADMM-\ell_{2,0}-SMW requires less time to obtain solutions with the same precision as that of MMV-ADMM-\ell_{2,0}.

data MMV-ADMM-\ell_{2,0} MMV-ADMM-\ell_{2,0}-SMW
RMSE Time(s) RMSE Time(s)
1 2.5689e-10 12.5324 2.3941e-10 11.5070
2 2.6148e-10 13.4957 2.4949e-10 12.0098
3 2.5017e-10 13.3999 2.6850e-10 12.7097
4 2.5103e-10 13.8419 2.5707e-10 12.2158
5 2.4778e-10 13.6656 2.4574e-10 12.1301
6 2.4846e-10 13.5132 2.5370e-10 12.0958
7 2.4829e-10 13.5649 2.4438e-10 12.4199
8 2.5644e-10 14.2626 2.5308e-10 12.4762
9 2.6028e-10 13.6527 2.5655e-10 12.3948
10 2.4507e-10 13.8025 2.5998e-10 12.3597
Mean 2.5259e-10 13.5732 2.5279e-10 12.2319
Std 5.7252e-12 0.4394 8.4480e-12 0.3283
Table 4: The average recovery quality over 10 random experiments with and without the SMW formula (N=5000, M=1500, K=500, J=10)

7 Conclusions

In this study, we propose MMV-ADMM-\ell_{2,0}, an ADMM algorithm based on the \ell_{2,0}-norm for the MMV problem. Instead of using the \ell_{2,1}-norm relaxation, we directly solve the \ell_{2,0}-norm minimization problem; this is the main difference between our study and other works [7, 11, 12]. We prove that the proposed algorithm has global convergence and determine its convergence criterion (4.7). Through numerical simulations, we test the validity of the proposed algorithm and its convergence criterion. The experiment results are consistent with Theorems 5.2 and 5.3. In comparisons against other algorithms, MMV-ADMM-\ell_{2,0} shows better performance with large sparsity K and few sensors J, and it is insensitive to the dimension N. Moreover, MMV-ADMM-\ell_{2,0} supports a larger undersampling rate than the other algorithms, especially MMV-ADMM-\ell_{2,1}, which is the key goal of compressed sensing.

The proposed algorithm has some drawbacks. Although its running time is roughly the average of those of the other algorithms, its efficiency remains lower than those of SOMP and SNIHT. We propose MMV-ADMM-\ell_{2,0}-SMW to reduce the computational cost, and it runs faster; however, this improvement is not significant until the dimension N becomes large.

The research in this paper can be extended in several directions. First, Algorithm 1 can be extended to the generalized JSM-2 problem, in which the multiple vectors are measured by different sensing matrices; the derivation of the algorithm for the generalized JSM-2 problem is nearly the same as the discussions in Sections 3 and 4 of this paper. Second, Algorithm 1 can be accelerated further. The most time-consuming step in each iteration of Algorithm 1 is updating S by (4.5), whose key is solving a high-dimensional positive-definite linear system. Drawing on recent advances in numerical linear algebra, S could be updated more efficiently, and thus Algorithm 1 could converge faster. In future research, we will focus on these two points: developing a more efficient algorithm for the MMV problem and extending it to the generalized JSM-2 problem.

References

  • [1] Ewout van den Berg and Michael P. Friedlander. Theoretical and empirical results for recovery from multiple measurements. IEEE Transactions on Information Theory, 56(5):2516–2527, 2010.
  • [2] D.L. Donoho. Compressed sensing. IEEE Transactions on Information Theory, 52(4):1289–1306, 2006.
  • [3] E.J. Candes, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Transactions on Information Theory, 52(2):489–509, 2006.
  • [4] David L. Donoho and Michael Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via 1\ell_{1} minimization. Proceedings of the National Academy of Sciences of the United States of America, 100:2197 – 2202, 2003.
  • [5] Scott Shaobing Chen, David L. Donoho, and Michael A. Saunders. Atomic decomposition by basis pursuit. SIAM J. Sci. Comput., 20:33–61, 1998.
  • [6] Qing Qu, Nasser M. Nasrabadi, and Trac D. Tran. Abundance estimation for bilinear mixture models via joint sparse and low-rank representation. IEEE Transactions on Geoscience and Remote Sensing, 52(7):4404–4423, 2014.
  • [7] Shane F. Cotter, Bhaskar D. Rao, Kjersti Engan, and Kenneth Kreutz-Delgado. Sparse solutions to linear inverse problems with multiple measurement vectors. IEEE Transactions on Signal Processing, 53:2477–2488, 2005.
  • [8] Irina F. Gorodnitsky, John S. George, and B.D. Rao. Neuromagnetic source imaging with focuss: a recursive weighted minimum norm algorithm. Electroencephalography and clinical neurophysiology, 95 4:231–51, 1995.
  • [9] Dmitry M. Malioutov, Müjdat Çetin, and Alan S. Willsky. Source localization by enforcing sparsity through a laplacian prior: an svd-based approach. IEEE Workshop on Statistical Signal Processing, 2003, pages 573–576, 2004.
  • [10] S.F. Cotter and B.D. Rao. Sparse channel estimation via matching pursuit with application to equalization. IEEE Transactions on Communications, 50(3):374–377, 2002.
  • [11] Jie Chen and Xiaoming Huo. Theoretical results on sparse representations of multiple-measurement vectors. IEEE Transactions on Signal Processing, 54(12):4634–4643, 2006.
  • [12] Yonina C. Eldar and Moshe Mishali. Robust recovery of signals from a structured union of subspaces. IEEE Transactions on Information Theory, 55(11):5302–5316, 2009.
  • [13] Joel A. Tropp, Anna C. Gilbert, and Martin Strauss. Algorithms for simultaneous sparse approximation. part i: Greedy pursuit. Signal Process., 86:572–588, 2006.
  • [14] Jeffrey D. Blanchard, Michael Cermak, David Hanle, and Yirong Jing. Greedy algorithms for joint sparse recovery. IEEE Transactions on Signal Processing, 62(7):1694–1704, 2014.
  • [15] Stephen P. Boyd, Neal Parikh, Eric King wah Chu, Borja Peleato, and Jonathan Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3:1–122, 2011.
  • [16] Zaiwen Wen, Jiang Hu, Yongfeng Li, and Haoyang Liu. Optimization: Modeling, Algorithm and Theory (in Chinese). Higher Education Press, 2020.
  • [17] Stephen P. Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
  • [18] Hédy Attouch, Jérôme Bolte, and Benar F. Svaiter. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Mathematical Programming, 2013.
  • [19] Boris S. Mordukhovich. Variational analysis and generalized differentiation I: Basic theory. Grundlehren der mathematischen Wissenschaften. Springer Berlin, Heidelberg, 2006.
  • [20] Jérôme Bolte, Shoham Sabach, and Marc Teboulle. Proximal alternating linearized minimization for nonconvex and nonsmooth problems. Mathematical Programming, 146:459–494, 2014.
  • [21] Simon Foucart. Hard thresholding pursuit: An algorithm for compressive sensing. SIAM J. Numer. Anal., 49:2543–2563, 2011.
  • [22] Jack Sherman and Winifred J. Morrison. Adjustment of an inverse matrix corresponding to a change in one element of a given matrix. Annals of Mathematical Statistics, 21:124–127, 1950.
  • [23] M Woodbury. Inverting Modified Matrices. Statistical Research Group, Princeton University, Princeton, 1950.
  • [24] Ke Guo, Deren Han, and Tingting Wu. Convergence of alternating direction method for minimizing sum of two nonconvex functions with linear constraints. International Journal of Computer Mathematics, 94:1653 – 1669, 2017.
  • [25] Oleg P. Burdakov, Christian Kanzow, and Alexandra Schwartz. Mathematical programs with cardinality constraints: Reformulation by complementarity-type conditions and a regularization method. SIAM J. on Optimization, 26(1):397–425, jan 2016.
  • [26] Ewout van den Berg and Michael P. Friedlander. Sparse optimization with least-squares constraints. SIAM J. Optim., 21:1201–1229, 2011.