This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

Group Sparse Optimization for Inpainting of Random Fields on the Sphere 111This work was partly supported by Hong Kong Research Grant Council PolyU15300120 and the Shenzhen Research Institute of Big Data Grant 2019ORF01002.

Chao Li222Department of Applied Mathematics, The Hong Kong Polytechnic University([email protected]).  and Xiaojun Chen 333Department of Applied Mathematics, The Hong Kong Polytechnic University([email protected]).
Abstract

We propose a group sparse optimization model for inpainting of a square-integrable isotropic random field on the unit sphere, where the field is represented by spherical harmonics with random complex coefficients. In the proposed optimization model, the variable is an infinite-dimensional complex vector and the objective function is a real-valued function defined by a hybrid of the 2\ell_{2} norm and non-Liptchitz p(0<p<1)\ell_{p}(0<p<1) norm that preserves rotational invariance property and group structure of the random complex coefficients. We show that the infinite-dimensional optimization problem is equivalent to a convexly-constrained finite-dimensional optimization problem. Moreover, we propose a smoothing penalty algorithm to solve the finite-dimensional problem via unconstrained optimization problems. We provide an approximation error bound of the inpainted random field defined by a scaled KKT point of the constrained optimization problem in the square-integrable space on the sphere with probability measure. Finally, we conduct numerical experiments on band-limited random fields on the sphere and images from Earth topography data to show the promising performance of the smoothing penalty algorithm for inpainting of random fields on the sphere.
Keywords: group sparse optimization; exact penalty; smoothing method; random field.

1 Introduction

Let (Ω,,P)(\Omega,\mathcal{F},P) be a probability space, and let 𝔅(𝕊2)\mathfrak{B}(\mathbb{S}^{2}) be the Borel sigma algebra on the unit sphere 𝕊2:={x3:x=1}\mathbb{S}^{2}:=\{{\rm{x}}\in\mathbb{R}^{3}:\|\rm{x}\|=1\}, where \|\cdot\| is the 2\ell_{2} norm. A real-valued random field on 𝕊2\mathbb{S}^{2} is an 𝔅(𝕊2)\mathcal{F}\otimes\mathfrak{B}(\mathbb{S}^{2})-measurable function T:Ω×𝕊2T:\Omega\times\mathbb{S}^{2}\rightarrow\mathbb{R}, and it is said to be 22-weakly isotropic if the expected value and covariance of TT are rotationally invariant (see [14]). It is known that a 22-weakly isotropic random field on the sphere has the following Karhunen-Loève (K-L) expansion [23, 25],

T(ω,x)=l=0m=llαl,m(ω)Yl,m(x),x𝕊2,ωΩ,T(\omega,{\rm{x}})=\sum_{l=0}^{\infty}\sum_{m=-l}^{l}\alpha_{l,m}(\omega)Y_{l,m}({\rm{x}}),\quad{\rm{x}}\in\mathbb{S}^{2},\quad\omega\in\Omega, (1.1)

where αl,m(ω)=𝕊2T(ω,x)Yl,m(x)¯𝑑σ(x)\alpha_{l,m}(\omega)=\int_{\mathbb{S}^{2}}T(\omega,{\rm{x}})\overline{Y_{l,m}({\rm{x}})}d\sigma({\rm{x}})\in\mathbb{C} are random spherical harmonic coefficients of T(ω,x)T(\omega,{\rm{x}}), Yl,mY_{l,m}, for m=l,,lm=-l,\ldots,l, l=0,1,2,l=0,1,2,\ldots are spherical harmonics with degree ll and order mm, and σ\sigma is the surface measure on 𝕊2\mathbb{S}^{2} satisfying σ(𝕊2)=4π.\sigma(\mathbb{S}^{2})=4\pi. For convenient, let

α(ω)=(α0,0(ω),α1,1(ω),α1,0(ω),α1,1(ω)3,,α1,l(ω),,αl,l(ω)2l+1,)T\alpha(\omega)=(\alpha_{0,0}(\omega),\underbrace{\alpha_{1,-1}(\omega),\alpha_{1,0}(\omega),\alpha_{1,1}(\omega)}_{3},\ldots,\underbrace{\alpha_{1,-l}(\omega),\ldots,\alpha_{l,l}(\omega)}_{2l+1},\ldots)^{T}

denote the coefficient vector of an isotropic random field T(ω,x)T(\omega,\mathrm{x}). The infinite-dimensional coefficient vector α(ω)\alpha(\omega) can be grouped according to degrees ll and written in the following group structure

α(ω)=(α0T(ω),α1T(ω),,αlT(ω),)T,\alpha(\omega)=(\alpha_{0\cdot}^{T}(\omega),\alpha_{1\cdot}^{T}(\omega),\ldots,\alpha^{T}_{l\cdot}(\omega),\ldots)^{T}, (1.2)

where

αl(ω)=(αl,l(ω),,αl,0(ω),,αl,l(ω))T2l+1,l0.\alpha_{l\cdot}(\omega)=(\alpha_{l,-l}(\omega),\ldots,\alpha_{l,0}(\omega),\ldots,\alpha_{l,l}(\omega))^{T}\in\mathbb{C}^{2l+1},\quad\quad l\geq 0.

From [14], we know that for any given degree l1l\geq 1 and any ωΩ\omega\in\Omega, the sum

m=ll|αl,m(ω)|2=αl(ω)2\sum_{m=-l}^{l}|\alpha_{l,m}(\omega)|^{2}=\|\alpha_{l\cdot}(\omega)\|^{2}

is rotationally invariant, while m=ll|αl,m(ω)|\sum_{m=-l}^{l}|\alpha_{l,m}(\omega)| is not invariant under rotation of the coordinate axes. Moreover, in [14] the angular power spectrum is given by

Cl=12l+1𝔼[αl(ω)2]C_{l}=\frac{1}{2l+1}\mathbb{E}[\|\alpha_{l\cdot}(\omega)\|^{2}]

if the expected value 𝔼[T(ω,x)]=0\mathbb{E}[T(\omega,{\rm{x}})]=0 for all x𝕊2{\rm{x}}\in\mathbb{S}^{2}. In the study of isotropic random fields [12, 23, 25], the angular power spectrum plays an important role since it contains full information of the covariance of the field and provides characterization of the field. Hence, considering the group structure (1.2) of the coefficients of an isotropic random field is essential.

Sparse representation of a random field TT is an approximation of TT with few non-zero elements αl,m(ω)\alpha_{l,m}(\omega) of α(ω)\alpha(\omega). In group sparse representation, instead of considering elements individually, we seek an approximation of TT with few non-zero groups αl(ω)\alpha_{l\cdot}(\omega) of α(ω)\alpha(\omega), i.e., few non-zero entries of the following vector

(α0(ω),α1(ω),,αl(ω),,)T.(\|\alpha_{0\cdot}(\omega)\|,\,\|\alpha_{1\cdot}(\omega)\|,\ldots,\|\alpha_{l\cdot}(\omega)\|,\ldots,)^{T}.

In recent years, sparse representation of isotropic random fields has been extensively studied. In [6] the authors studied the sparse representation of random fields via an 1\ell_{1}-regularized minimization problem. In [14], the authors considered isotropic group sparse representation of Gaussian and isotropic random fields on the sphere through an unconstrained optimization problem with a weighted 2,1\ell_{2,1} norm, which is an example of group Lasso [34]. In [24], the authors proposed a non-Lipschitz regularization problem with a weighted 2,p\ell_{2,p} (0<p<10<p<1) norm for the group sparse representation of isotropic random fields. The non-Lipschitz regularizer also preserves the rotational invariance property of αl(ω)2\|\alpha_{l\cdot}(\omega)\|^{2} for any given degree l1l\geq 1 and any ωΩ\omega\in\Omega.

Isotropic random fields on the sphere have many applications [5, 20, 26, 28, 31], especially in the study of the Cosmic Microwave Background (CMB) analysis [1, 30]. However, the true spherical random field usually presents masked regions and missing observations. Many inpainting methods [1, 13, 30, 32] have been proposed for recovering the true field based on sparse representation. However, they did not take the group structure (1.2) of the coefficients into consideration.

Group sparse optimization models have been successfully used in signal processing, imaging sciences and predictive analysis [3, 11, 18, 17, 27]. In this paper, we propose a constrained group sparse optimization model for inpainting of isotropic random fields on the sphere based on group structure (1.2). Moreover, to recover the true random field by using information only on a subset Γ𝕊2\Gamma\subseteq\mathbb{S}^{2} which contains an open set, we need the unique continuation property [19] of any realization of a random field, that is, for any fixed ωΩ\omega\in\Omega, if the value of a field equals to zero on Γ\Gamma, then the random field is identically zero on the sphere (see Appendix A for more details).

Let L2(Ω×𝕊2)L_{2}(\Omega\times\mathbb{S}^{2}) be the L2L_{2} space on the product space Ω×𝕊2\Omega\times\mathbb{S}^{2} with product measure PσP\otimes\sigma and the induced norm L2(Ω×𝕊2)=𝔼[L2(𝕊2)]\|\cdot\|_{L_{2}(\Omega\times\mathbb{S}^{2})}=\mathbb{E}[\|\cdot\|_{L_{2}(\mathbb{S}^{2})}], where L2(𝕊2)L_{2}(\mathbb{S}^{2}) is the space of square-integrable functions over 𝕊2\mathbb{S}^{2} endowed with the inner product

f,gL2(𝕊2)=𝕊2f(x)g(x)¯𝑑σ(x),f,gL2(𝕊2),\langle f,g\rangle_{L_{2}(\mathbb{S}^{2})}=\int_{\mathbb{S}^{2}}f({\rm{x}})\overline{g({\rm{x}})}d\sigma({\rm{x}}),\quad\quad\forall f,g\in L_{2}(\mathbb{S}^{2}),

and the induced L2L_{2}-norm fL2(𝕊2)=(f,fL2(𝕊2))12\|f\|_{L_{2}(\mathbb{S}^{2})}=(\langle f,f\rangle_{L_{2}(\mathbb{S}^{2})})^{\frac{1}{2}}. By Fubini’s theorem, for ωΩ\omega\in\Omega, T(ω,x)L2(𝕊2)T(\omega,{\rm{x}})\in L_{2}(\mathbb{S}^{2}), PP-a.s., where “PP-a.s.” stands for almost surely with probability 11. For brevity, we write T(ω,x)T(\omega,{\rm{x}}) as T(x)T({\rm{x}}) or TT if no confusion arises.

Let the observed field be given by

T:=𝒜(T)+Δ,T^{\circ}:=\mathcal{A}(T^{*})+\Delta, (1.3)

where TL2(Ω×𝕊2)T^{*}\in L_{2}(\Omega\times\mathbb{S}^{2}) is the true isotropic random field that we aim to recover, Δ:Ω×𝕊2\Delta:\Omega\times\mathbb{S}^{2}\rightarrow\mathbb{R} is observational noise and 𝒜:L2(Ω×𝕊2)L2(Ω×𝕊2)\mathcal{A}:L_{2}(\Omega\times\mathbb{S}^{2})\rightarrow L_{2}(\Omega\times\mathbb{S}^{2}) is an inpainting operator defined by

𝒜(T(x))={T(x)ifxΓ0ifx𝕊2Γ,\mathcal{A}(T({\rm{x}}))=\left\{\begin{array}[]{cl}T({\rm{x}})&\,{\rm if}\ {\rm{x}}\in\Gamma\\ 0&\,{\rm if}\ {\rm{x}}\in\mathbb{S}^{2}\setminus\Gamma,\\ \end{array}\right. (1.4)

where 𝕊2Γ𝕊2\mathbb{S}^{2}\setminus\Gamma\subset\mathbb{S}^{2} is a nonempty inpainting area and the set Γ𝕊2\Gamma\subset\mathbb{S}^{2} has an open subset. In our optimization model, we consider the observed field TT^{\circ} as one realization of a random field. For notational simplicity, let

α=(α0T,α1T,,αlT,)T\alpha=(\alpha^{T}_{0\cdot},\alpha^{T}_{1\cdot},\ldots,\alpha^{T}_{l\cdot},\ldots)^{T} (1.5)

denote the spherical harmonic coefficient vector of an isotropic random field TT for a fixed ωΩ\omega\in\Omega, where αl=(αl,l,,αl,0,,αl,l)T2l+1\alpha_{l\cdot}=(\alpha_{l,-l},\ldots,\alpha_{l,0},\ldots,\alpha_{l,l})^{T}\in\mathbb{C}^{2l+1}, l0:={0,1,2,}l\in\mathbb{N}_{0}:=\{0,1,2,\ldots\}.

By Parseval’s theorem, we have

TL2(𝕊2)2=l=0m=ll|αl,m|2=l=0αl2<.\|T\|^{2}_{L_{2}(\mathbb{S}^{2})}=\sum\limits_{l=0}^{\infty}\sum\limits_{m=-l}^{l}|\alpha_{l,m}|^{2}=\sum\limits_{l=0}^{\infty}\|\alpha_{l\cdot}\|^{2}<\infty.

Hence the sequence {αl}l02(0)\{\|\alpha_{l\cdot}\|\}_{l\in\mathbb{N}_{0}}\in\ell^{2}(\mathbb{N}_{0}), where 2(0)\ell^{2}(\mathbb{N}_{0}) is the space of square-summable sequences. We write α2\alpha\in\ell^{2} to indicate {αl}l02(0)\{\|\alpha_{l\cdot}\|\}_{l\in\mathbb{N}_{0}}\in\ell^{2}(\mathbb{N}_{0}).

The number of non-zero groups of α\alpha with group structure (1.5) is calculated by

α2,0=l=0αl0,\|\alpha\|_{2,0}=\sum\limits_{l=0}^{\infty}\|\alpha_{l\cdot}\|^{0},

where αl0={1ifαl>00otherwise,\|\alpha_{l\cdot}\|^{0}=\left\{\begin{array}[]{cc}1&\textrm{if}\,\,\|\alpha_{l\cdot}\|>0\\ 0&\textrm{otherwise,}\end{array}\right. and α2,0\|\alpha\|_{2,0} is called 2,0\ell_{2,0} norm of α\alpha. The 2,0\ell_{2,0} norm is discontinuous and TL2(𝕊2)2<\|T\|^{2}_{L_{2}(\mathbb{S}^{2})}<\infty does not ensure α2,0<\|\alpha\|_{2,0}<\infty. In [14], the authors used a weighted 2,1\ell_{2,1} norm of α\alpha defined by

α2,1=l=0βlαl,\|\alpha\|_{2,1}=\sum\limits_{l=0}^{\infty}\beta_{l}\|\alpha_{l\cdot}\|,

where βl>0\beta_{l}>0. It is easy to see that limp0αlp=αl0\lim_{p\downarrow 0}\|\alpha_{l\cdot}\|^{p}=\|\alpha_{l\cdot}\|^{0}. The 2,p\ell_{2,p} norm (0<p<1)(0<p<1) has been widely used in sparse approximation [11, 18]. In this paper, we use a weighted 2,p\ell_{2,p} norm of α\alpha. Let

βp:={α2:α2,pp:=l=0βlαlp<}\ell^{p}_{\beta}:=\left\{\alpha\in\ell^{2}:\|\alpha\|_{2,p}^{p}:=\sum_{l=0}^{\infty}\beta_{l}{{\|\alpha_{l\cdot}\|^{p}}}<\infty\right\}

be a weighted p\ell^{p} (0<p<10<p<1) space with positive weights β0=1\beta_{0}=1 and βl=ηllp\beta_{l}=\eta^{l}l^{p} for l1l\geq 1, where η>1\eta>1 is a constant. For a fixed ωΩ\omega\in\Omega, we consider the following constrained optimization problem

minαβpα2,pps.t.𝒜(T(x))T(x)L2(𝕊2)2ϱ,\begin{array}[]{cl}\min\limits_{\alpha\in\ell^{p}_{\beta}}&\|\alpha\|_{2,p}^{p}\\ {\rm{s.t.}}&\|\mathcal{A}(T({\rm{x}}))-T^{\circ}({\rm{x}})\|^{2}_{L_{2}(\mathbb{S}^{2})}\leq\varrho,\end{array} (1.6)

where ϱ>𝕊2Γ|T(x)|2𝑑σ(x)\varrho>\int_{\mathbb{S}^{2}\setminus\Gamma}|T^{\circ}({\rm{x}})|^{2}d\sigma(\rm{x}).

A feasible field of problem (1.6) takes the form

T(x)=l=0m=llαl,mYl,m(x),x𝕊2,αβp.T({\rm{x}})=\sum_{l=0}^{\infty}\sum_{m=-l}^{l}\alpha_{l,m}Y_{l,m}({\rm{x}}),\quad{\rm{x}}\in\mathbb{S}^{2},\quad\alpha\in\ell_{\beta}^{p}.

Since TL2(𝕊2)T^{\circ}\in L_{2}(\mathbb{S}^{2}), for ϵ=ϱ𝕊2Γ|T(x)|2𝑑σ(x)\epsilon=\varrho-\int_{\mathbb{S}^{2}\setminus\Gamma}|T^{\circ}({\rm{x}})|^{2}d\sigma(\rm{x}), there is a finite number LL such that T(x)TL(x)L2(𝕊2)2<ϵ\|T^{\circ}(\mathrm{x})-T^{\circ}_{L}(\mathrm{x})\|^{2}_{L_{2}(\mathbb{S}^{2})}<\epsilon, where TL(x)=l=0Lm=llbl,mYl,m(x)T^{\circ}_{L}(\mathrm{x})=\sum_{l=0}^{L}\sum_{m=-l}^{l}b_{l,m}Y_{l,m}(\mathrm{x}) and bl,m=𝕊2T(x)Yl,m(x)¯𝑑σ(x)b_{l,m}=\int_{\mathbb{S}^{2}}T^{\circ}(\mathrm{x})\overline{Y_{l,m}(\mathrm{x})}d\sigma(\mathrm{x}). Let bl,m=0b_{l,m}=0 for l=L+1,l=L+1,\ldots, m=l,,lm=-l,\ldots,l, then (b0,0,,bL,L,0,0,0,)Tβp(b_{0,0},\ldots,b_{L,L},0,0,0,\ldots)^{T}\in\ell_{\beta}^{p}. From the definition of 𝒜\mathcal{A}, we have

𝒜(TL(x))T(x)L2(𝕊2)2\displaystyle\|\mathcal{A}(T_{L}^{\circ}({\rm{x}}))-T^{\circ}({\rm{x}})\|^{2}_{L_{2}(\mathbb{S}^{2})} =\displaystyle= 𝕊2Γ|T(x)|2𝑑σ(x)+Γ|T(x)TL(x)|2𝑑σ(x)\displaystyle\int_{\mathbb{S}^{2}\setminus\Gamma}|T^{\circ}({\rm{x}})|^{2}d\sigma(\mathrm{x})+\int_{\Gamma}|T^{\circ}({\rm{x}})-T_{L}^{\circ}({\rm{x}})|^{2}d\sigma(\rm{x})
<\displaystyle< 𝕊2Γ|T(x)|2𝑑σ(x)+ϵ=ϱ.\displaystyle\int_{\mathbb{S}^{2}\setminus\Gamma}|T^{\circ}({\rm{x}})|^{2}d\sigma(\mathrm{x})+\epsilon=\varrho.

Thus, the feasible set of problem (1.6) is nonempty. Moreover the objective function of problem (1.6) is continuous and level-bounded. Hence an optimal solution of problem (1.6) exists.

Since α=0\alpha=0 implies that α2,pp=0\|\alpha\|_{2,p}^{p}=0, we assume T(x)L2(𝕊2)2>ϱ\|T^{\circ}({\rm{x}})\|^{2}_{L_{2}(\mathbb{S}^{2})}>\varrho, which implies that T(x)=0T({\rm{x}})=0 is not a feasible field of problem (1.6).

Our main contributions are summarized as follows.

  • Based on the K-L expansion, the constraint of problem (1.6) can be written in a discrete form with variables αβp\alpha\in\ell^{p}_{\beta}. We derive a lower bound for the 2\ell_{2} norm of nonzero groups of scaled KKT points of (1.6). Based on the lower bound, we prove that the infinite-dimensional problem (1.6) is equivalent to a finite-dimensional optimization problem.

  • We propose a penalty method for solving the finite-dimensional optimization problem via unconstrained optimization problems. We establish the exact penalization results regarding local minimizers and ε\varepsilon-minimizers. Moreover, we propose a smoothing penalty algorithm and prove that the sequence generated by the algorithm is bounded and any accumulation point of the sequence is a scaled KKT point of the finite-dimensional optimization problem.

  • We give the approximation error of the random field represented by scaled KKT points of the finite-dimensional optimization problem in L2(Ω×𝕊2).L_{2}(\Omega\times\mathbb{S}^{2}).

The rest of this paper is organised as follows. In Section 2, we prove that the infinite-dimensional discrete optimization problem is equivalent to a finite-dimensional problem. In Section 3, we present the penalty method and give exact penalization results. In Section 4, we discuss optimality conditions of the finite-dimensional optimization problem and its penalty problem. Moreover, we propose a smoothing penalty algorithm and establish its convergence. In Section 5, we give the approximation error in L2(Ω×𝕊2).L_{2}(\Omega\times\mathbb{S}^{2}). In Section 6, we conduct numerical experiments on band-limited random fields and images from Earth topography data to compare our approach with some existing methods on the quality of the solutions and inpainted images. Finally, we give conclusion remarks in Section 7.

2 Discrete formulation of problem (1.6)

In this section, we propose the discrete formulation of problem (1.6) and prove that the discrete problem is equivalent to a finite-dimensional problem (2.5).

Based on the definition of 𝒜\mathcal{A} and the spherical harmonic expansion of TT, we obtain that

𝒜(T(x))T(x)L2(𝕊2)2=𝕊2|𝒜(T(x))T(x)|2𝑑σ(x)=Γ|T(x)T(x)|2𝑑σ(x)+𝕊2Γ|T(x)|2𝑑σ(x)=Γ|l=0m=llαl,mYl,m(x)|2𝑑σ(x)2Re(ΓT(x)(l=0m=llα¯l,mYl,m(x)¯)𝑑σ(x))+𝕊2|T(x)|2𝑑σ(x)=αTYα¯2Re(l=0m=llα¯l,mΓT(x)Yl,m(x)¯𝑑σ(x))+𝕊2|T(x)|2𝑑σ(x)=αHYα2Re(αHα)+c,\displaystyle\begin{split}&\|\mathcal{A}(T({\rm{x}}))-T^{\circ}({\rm{x}})\|^{2}_{L_{2}(\mathbb{S}^{2})}\\ &=\int_{\mathbb{S}^{2}}|\mathcal{A}(T({\rm{x}}))-T^{\circ}({\rm{x}})|^{2}d\sigma({\rm{x}})=\int_{\Gamma}|T({\rm{x}})-T^{\circ}({\rm{x}})|^{2}d\sigma({\rm{x}})+\int_{\mathbb{S}^{2}\setminus\Gamma}|T^{\circ}({\rm{x}})|^{2}d\sigma({\rm{x}})\\ &=\int_{\Gamma}\bigg{|}\sum\limits_{l=0}^{\infty}\sum\limits_{m=-l}^{l}\alpha_{l,m}Y_{l,m}({\rm{x}})\bigg{|}^{2}d\sigma({\rm{x}})-2\textrm{Re}\left(\int_{\Gamma}T^{\circ}({\rm{x}})\left(\sum\limits_{l=0}^{\infty}\sum\limits_{m=-l}^{l}\bar{\alpha}_{l,m}\overline{Y_{l,m}({\rm{x}})}\right)d\sigma({\rm{x}})\right)\\ &\quad+\int_{\mathbb{S}^{2}}|T^{\circ}({\rm{x}})|^{2}d\sigma({\rm{x}})\\ &=\alpha^{T}Y\bar{\alpha}-2\textrm{Re}\left(\sum\limits_{l=0}^{\infty}\sum\limits_{m=-l}^{l}\bar{\alpha}_{l,m}\int_{\Gamma}T^{\circ}({\rm{x}})\overline{Y_{l,m}({\rm{x}})}d\sigma({\rm{x}})\right)+\int_{\mathbb{S}^{2}}|T^{\circ}({\rm{x}})|^{2}d\sigma({\rm{x}})\\ &=\alpha^{H}Y\alpha-2\textrm{Re}(\alpha^{H}\alpha^{\circ})+c,\end{split} (2.1)

where YY is an infinite-dimensional matrix with (Y)l2+l+m+1,l2+l+m+1=ΓYl,m(x)Yl,m(x)¯𝑑σ(x),(Y)_{l^{2}+l+m+1,l^{\prime 2}+l^{\prime}+m^{\prime}+1}=\int_{\Gamma}Y_{l,m}({\rm{x}})\overline{Y_{l^{\prime},m^{\prime}}({\rm{x}})}d\sigma({\rm{x}})\in\mathbb{C}, α=((α0)T,,(αl)T,)T\alpha^{\circ}=((\alpha_{0\cdot}^{\circ})^{T},\ldots,(\alpha^{\circ}_{l\cdot})^{T},\ldots)^{T} with αl2l+1\alpha^{\circ}_{l\cdot}\in\mathbb{C}^{2l+1}, αl,m=ΓT(x)Yl,m(x)¯𝑑σ(x)\alpha^{\circ}_{l,m}=\int_{\Gamma}T^{\circ}({\rm{x}})\overline{Y_{l,m}({\rm{x}})}d\sigma({\rm{x}}) and c=𝕊2|T(x)|2𝑑σ(x).c=\int_{\mathbb{S}^{2}}|T^{\circ}({\rm{x}})|^{2}d\sigma({\rm{x}}).

The minimization problem (1.6) can be written as the following optimization problem

minαβpα2,pps.t.αHYα2Re(αHα)+cϱ.\begin{array}[]{cl}\min\limits_{\alpha\in\ell_{\beta}^{p}}&\|\alpha\|_{2,p}^{p}\vspace{1mm}\\ {\rm{s.t.}}&\alpha^{H}Y\alpha-2\textrm{Re}(\alpha^{H}\alpha^{\circ})+c\leq\varrho.\end{array} (2.2)

From the setting of ϱ\varrho in problem (1.6) and Theorem 8, the feasible set of (2.2) has an interior point and bounded. The following assumption follows from our assumption on problem (1.6).

Assumption 1.

The feasible set of problem (2.2) does not contain α=0\alpha=0.

Note that the objective function and constraint function in (2.2) are real-valued functions with complex variables. Thus, following [24], we apply the Wirtinger calculus (See Appendix B for more details) in this paper.

Since the objective function in problem (2.2) is not Lipschitz continuous at points containing zero groups. We shall consider first-order stationary points of (2.2) and extend the definition of scaled KKT points for finite-dimensional optimization problem with real variables in [8, 10, 29].

Definition 1.

We call αβp\alpha^{*}\in\ell_{\beta}^{p} a scaled KKT point of (2.2), if there exists ν\nu\in\mathbb{R} such that

pβlαlpαl+2ναl2(Ylααl)=0,l0,ν((α)HYα2Re((α)Hα)+cϱ)=0,(α)HYα2Re((α)Hα)+cϱ,ν0.\begin{split}p\beta_{l}\|\alpha^{*}_{l\cdot}\|^{p}\alpha^{*}_{l\cdot}+2\nu\|\alpha^{*}_{l\cdot}\|^{2}(Y_{l\cdot}\alpha^{*}-\alpha_{l\cdot}^{\circ})=0,&\quad\forall\,l\in\mathbb{N}_{0},\\ \nu((\alpha^{*})^{H}Y\alpha^{*}-2{\rm{Re}}((\alpha^{*})^{H}\alpha^{\circ})+c-\varrho)=0,&\\ (\alpha^{*})^{H}Y\alpha^{*}-2{\rm{Re}}((\alpha^{*})^{H}\alpha^{\circ})+c\leq\varrho,&\\ \nu\geq 0.&\end{split} (2.3)

From (2.3), for any nonzero vector αl2l+1\alpha_{l\cdot}^{*}\in\mathbb{C}^{2l+1}, we have

pβlαlp2αl+2ν(Ylααl)=0,p\beta_{l}\|\alpha^{*}_{l\cdot}\|^{p-2}\alpha^{*}_{l\cdot}+2\nu(Y_{l\cdot}\alpha^{*}-\alpha_{l\cdot}^{\circ})=0,

and ν>0\nu>0. Then,

pβlαlp1=2νYlααl2νYαα.p\beta_{l}\|\alpha^{*}_{l\cdot}\|^{p-1}=2\nu\|Y_{l\cdot}\alpha^{*}-\alpha_{l\cdot}^{\circ}\|\leq 2\nu\|Y\alpha^{*}-\alpha^{\circ}\|.

By definition of YY and feasibility of α\alpha^{*}, there exists c~>0\tilde{c}>0 such that Yααc~\|Y\alpha^{*}-\alpha^{\circ}\|\leq\tilde{c}. Hence for any nonzero vector αl2l+1\alpha_{l\cdot}^{*}\in\mathbb{C}^{2l+1}, we have

αl(pβl2νc~)11p.\|\alpha^{*}_{l\cdot}\|\geq\left(\frac{p\beta_{l}}{2\nu\tilde{c}}\right)^{\frac{1}{1-p}}. (2.4)

By definition of βl\beta_{l}, l0l\in\mathbb{N}_{0}, we obtain that

>α2,pp\displaystyle\infty>\|\alpha^{*}\|_{2,p}^{p} =\displaystyle= l=0βlαlp={l0:αl0}βlαlp(p2νc~)p1p{l0:αl0}(ηllp)11p,\displaystyle\sum\limits_{l=0}^{\infty}\beta_{l}\|\alpha^{*}_{l\cdot}\|^{p}\,=\sum\limits_{\{l\in\mathbb{N}_{0}:\,\|\alpha^{*}_{l\cdot}\|\neq 0\}}\beta_{l}\|\alpha^{*}_{l\cdot}\|^{p}\geq\left(\frac{p}{2\nu\tilde{c}}\right)^{\frac{p}{1-p}}\sum\limits_{\{l\in\mathbb{N}_{0}:\,\|\alpha^{*}_{l\cdot}\|\neq 0\}}(\eta^{l}l^{p})^{\frac{1}{1-p}},

which implies that {l0:αl0}\{l\in\mathbb{N}_{0}:\|\alpha^{*}_{l\cdot}\|\neq 0\} is a finite set. Thus, the number of nonzero vectors αl2l+1\alpha_{l\cdot}^{*}\in\mathbb{C}^{2l+1} of a scaled KKT point of problem (2.2) is finite and there exists an L0L\in\mathbb{N}_{0} such that L=max{l0:αl0}L=\max\{l\in\mathbb{N}_{0}:\|\alpha^{*}_{l\cdot}\|\neq 0\}. Therefore, the infinite-dimensional problem (2.2) can be truncated to a finite-dimensional problem and approximated in a finite-dimensional space.

For notational simplicity, we truncate αβp\alpha\in\ell_{\beta}^{p} to α=(α0,0,α1T,,αLT)Td\alpha=(\alpha_{0,0},\alpha_{1\cdot}^{T},\ldots,\alpha_{L\cdot}^{T})^{T}\in\mathbb{C}^{d}, where d:=(L+1)2d:=(L+1)^{2}. We use α^d\hat{\alpha}^{\circ}\in\mathbb{C}^{d} to denote the truncated vector whose elements are the first dd elements of α\alpha^{\circ} and Y^d×d\hat{Y}\in\mathbb{C}^{d\times d} to denote the leading principal submatrix of order dd of YY. Since Γ\Gamma has an open subset, we have zHY^z>0z^{H}\hat{Y}z>0 for any nonzero zdz\in\mathbb{C}^{d}. Thus, the matrix Y^\hat{Y} is positive definite.

The truncated finite-dimensional problem of problem (2.2) has the following version

minαdΦ(α):=l=0Lβlαlps.t.αHY^α2Re(αHα^)+cϱ.\begin{array}[]{cl}\min\limits_{\alpha\in\mathbb{C}^{d}}&\Phi(\alpha):=\sum_{l=0}^{L}\beta_{l}\|\alpha_{l\cdot}\|^{p}\vspace{1mm}\\ {\rm{s.t.}}&\alpha^{H}\hat{Y}\alpha-2\textrm{Re}(\alpha^{H}\hat{\alpha}^{\circ})+c\leq\varrho.\end{array} (2.5)

By the definition of ϱ\varrho, the feasible set of (2.5) is nonempty and has an interior point. The objective function Φ(α)\Phi(\alpha) is continuous, nonnegative with Φ(0)=0\Phi(0)=0, and differentiable except at points containing zero groups. The penalty formulation for (2.5) is

minαdFλ(α):=Φ(α)+λ(αHY^α2Re(αHα^)+cϱ)+,\min\limits_{\alpha\in\mathbb{C}^{d}}\quad F_{\lambda}(\alpha):=\Phi(\alpha)+\lambda(\alpha^{H}\hat{Y}\alpha-2\textrm{Re}(\alpha^{H}\hat{\alpha}^{\circ})+c-\varrho)_{+}, (2.6)

for some λ>0\lambda>0, where ()+:=max{,0}(\cdot)_{+}:=\max\{\cdot,0\}. For any nontrivial solution α\alpha^{*} of (2.6), we have

0<Fλ(α)=minαdFλ(α)<Fλ(0)=λ(cϱ)+=λ(cϱ)<λc.0<F_{\lambda}(\alpha^{*})=\min\limits_{\alpha\in\mathbb{C}^{d}}F_{\lambda}(\alpha)<F_{\lambda}(0)=\lambda(c-\varrho)_{+}=\lambda(c-\varrho)<\lambda c. (2.7)

In Sections 3-4, we focus on problems (2.5) and (2.6).

3 Exact penalization

In this section, we consider the relationship between problems (2.5) and (2.6). We first give some notations. For a closed set SnS\subset\mathbb{C}^{n}, dist(z,S)=infzSzz{\rm{dist}}(z,S)=\inf_{z^{\prime}\in S}\|z-z^{\prime}\| denotes the distance from a point znz\in\mathbb{C}^{n} to SS and 𝐁(b;r)={zn:zbr}\mathbf{B}(b;r)=\{z\in\mathbb{C}^{n}:{{\|z-b\|}}\leq r\} denotes a closed ball with radius r>0r>0 and center bnb\in\mathbb{C}^{n}. Let g:dg:\mathbb{C}^{d}\rightarrow\mathbb{R} be defined as

g(α):=αHY^α2Re(αHα^)+cϱ,g(\alpha):=\alpha^{H}\hat{Y}\alpha-2\textrm{Re}(\alpha^{H}\hat{\alpha}^{\circ})+c-\varrho,

S(α):={δ:g(α)δ}S(\alpha):=\{\delta\in\mathbb{R}:g(\alpha)\leq\delta\} for αd\alpha\in\mathbb{C}^{d}, and S1(δ):={αd:g(α)δ}S^{-1}(\delta):=\{\alpha\in\mathbb{C}^{d}:g(\alpha)\leq\delta\}. We denote the feasible set of problem (2.5) by e:={αd:g(α)0}.\mathcal{F}_{e}:=\{\alpha\in\mathbb{C}^{d}:g(\alpha)\leq 0\}. Let 𝕃:={0,1,,L}\mathbb{L}:=\{0,1,\ldots,L\}.

Since Y^\hat{Y} is positive definite, gg is strongly convex and has a unique global minimizer Y^1α^α^\hat{Y}^{-1}\hat{\alpha}^{\circ}\neq\hat{\alpha}^{\circ} which implies that g(α^)(infαdg(α),).g(\hat{\alpha}^{\circ})\in(\inf_{\alpha\in\mathbb{C}^{d}}g(\alpha),\infty). Since gg is strongly convex and quadratic, there is CC such that αα¯C|g(α)g(α¯)|\|\alpha-\bar{\alpha}\|\leq C|g(\alpha)-g(\bar{\alpha})| for α,α¯d\alpha,\bar{\alpha}\in\mathbb{C}^{d}. Choosing α¯\bar{\alpha} such that g(α¯)=0g(\bar{\alpha})=0, from [29, Theorem 9.48], we obtain the following lemma.

Lemma 1.

There exists a constant C>0C>0 such that for any αd\alpha\in\mathbb{C}^{d},

dist(α,e)=dist(α,S1(0))Cdist(0,S(α))=C(g(α))+.{\rm{dist}}(\alpha,\mathcal{F}_{e}){={\rm{dist}}(\alpha,S^{-1}(0))}\leq C{\rm{dist}}(0,S(\alpha))=C(g(\alpha))_{+}.
Theorem 1.

There exists a λ>0\lambda^{*}>0 such that a local minimizer αd\alpha^{*}\in\mathbb{C}^{d} of problem (2.5) is a local minimizer of problem (2.6) whenever λλ\lambda\geq\lambda^{*}.

Proof.

Let αd\alpha^{*}\in\mathbb{C}^{d} be a local minimizer of problem (2.5), that is there exists a neighborhood 𝒩\mathcal{N} of α\alpha^{*} such that Φ(α)Φ(α)\Phi(\alpha^{*})\leq\Phi(\alpha) for α𝒩e\alpha\in\mathcal{N}\cap\mathcal{F}_{e}. We denote the group support set of α\alpha^{*} by γ:={l𝕃:αl0}\gamma:=\{l\in\mathbb{L}:\|\alpha_{l\cdot}^{*}\|\neq 0\}, and the complement set of γ\gamma in 𝕃\mathbb{L} by τ:={l𝕃:αl=0}\tau:=\{l\in\mathbb{L}:\|\alpha_{l\cdot}^{*}\|=0\}. Let αγ\alpha_{\gamma} and ατ\alpha_{\tau} denote the restrictions of α\alpha onto γ\gamma and τ\tau, respectively. We obtain that αγ\alpha^{*}_{\gamma} is a local minimizer of the following problem

minαγlγβlαlps.t.αγHY^γαγ2Re(αγHα^γ)+cϱ.\begin{array}[]{cl}\min\limits_{\alpha_{\gamma}}&\sum_{l\in\gamma}\beta_{l}\|\alpha_{l\cdot}\|^{p}\\ {\rm{s.t.}}&\alpha_{\gamma}^{H}\hat{Y}_{\gamma}\alpha_{\gamma}-2\textrm{Re}(\alpha_{\gamma}^{H}\hat{\alpha}_{\gamma}^{\circ})+c\leq\varrho.\end{array} (3.1)

Let ϵ¯=12min{αl:lγ}>0\bar{\epsilon}=\tfrac{1}{2}\min\{\|\alpha^{*}_{l\cdot}\|:l\in\gamma\}>0. Then, there exists a small δ\delta^{*} such that αγ\alpha^{*}_{\gamma} is a local minimizer of (3.1) and min{αl:lγ}>ϵ¯\min\{\|\alpha_{l\cdot}\|:l\in\gamma\}>\bar{\epsilon} for all αγ𝐁(αγ;δ)\alpha_{\gamma}\in\mathbf{B}(\alpha^{*}_{\gamma};\delta^{*}). Let

gγ(αγ)=αγHY^γαγ2Re(αγHαγ)+cϱandΩ1:={αγ:gγ(αγ)0}.g_{\gamma}(\alpha_{\gamma})=\alpha_{\gamma}^{H}\hat{Y}_{\gamma}\alpha_{\gamma}-2{\textrm{Re}}(\alpha_{\gamma}^{H}\alpha_{\gamma}^{\circ})+c-\varrho\quad{\rm and}\quad\Omega_{1}:=\{\alpha_{\gamma}:g_{\gamma}(\alpha_{\gamma})\leq 0\}.

Let [αγ;0τ][\alpha_{\gamma};0_{\tau}] be the vector with ([αγ;0τ])l=αl,lγ([\alpha_{\gamma};0_{\tau}])_{l\cdot}=\alpha_{l\cdot},l\in\gamma and ([αγ;0τ])l=0,lτ.([\alpha_{\gamma};0_{\tau}])_{l\cdot}=0,l\in\tau. It is easy to see that g([αγ;0τ])=gγ(αγ)g([\alpha_{\gamma};0_{\tau}])=g_{\gamma}(\alpha_{\gamma}) and dist(αγ,Ω1)=dist([αγ;0τ],e).{\rm{dist}}(\alpha_{\gamma},\Omega_{1})={\rm{dist}}([\alpha_{\gamma};0_{\tau}],{\cal F}_{e}). By Lemma 1, dist(αγ,Ω1)C(gγ(αγ))+{\rm{dist}}(\alpha_{\gamma},\Omega_{1})\leq C(g_{\gamma}(\alpha_{\gamma}))_{+} for all αγ𝐁(αγ;δ)\alpha_{\gamma}\in\mathbf{B}(\alpha^{*}_{\gamma};\delta^{*}).

The objective function of (3.1) is Lipschitz continuous on 𝐁(αγ;δ)\mathbf{B}(\alpha^{*}_{\gamma};\delta^{*}). Then by [8, Lemma 3.1], there exists a λ>0\lambda^{*}>0 such that, for any λλ\lambda\geq\lambda^{*}, αγ\alpha^{*}_{\gamma} is a local minimizer of the following problem

minαγFλγ(αγ):=lγβlαlp+λ(gγ(αγ))+,\min\limits_{\alpha_{\gamma}}F_{\lambda}^{\gamma}(\alpha_{\gamma}):=\sum_{l\in\gamma}\beta_{l}\|\alpha_{l\cdot}\|^{p}+\lambda(g_{\gamma}(\alpha_{\gamma}))_{+},

that is, there exists a neighborhood UγU_{\gamma} of 0 with Uγ𝐁(0;δ)U_{\gamma}\subseteq\mathbf{B}(0;\delta^{*}) such that

Fλγ(αγ)Fλγ(αγ),αγαγ+Uγ.F_{\lambda}^{\gamma}(\alpha_{\gamma})\geq F_{\lambda}^{\gamma}(\alpha^{*}_{\gamma}),\quad\forall\,\alpha_{\gamma}\in\alpha_{\gamma}^{*}+U_{\gamma}.

We now show that α\alpha^{*} is a local minimizer of (2.6) with λλ\lambda\geq\lambda^{*}.

For fixed ϵ(0,ϵ¯)\epsilon\in(0,\bar{\epsilon}) and λλ\lambda\geq\lambda^{*}, we consider problem (2.6) in the neighborhood U:=Uγ×(ϵ,ϵ)dτU:=U_{\gamma}\times(-\epsilon,\epsilon)^{d_{\tau}}, where dτ=lτ(2l+1)d_{\tau}=\sum_{l\in\tau}(2l+1). Let κ~\tilde{\kappa} be a Lipschitz constant of the function λ(g(α))+\lambda(g(\alpha))_{+} over α+U\alpha^{*}+U. There exists an ϵ0(0,ϵ)\epsilon_{0}\in(0,\epsilon) such that whenever αl<ϵ0\|\alpha_{l\cdot}\|<\epsilon_{0}, lτl\in\tau,

Φ(α)=lγβlαlp+lτβlαlplγβlαlp+κ~lταl.\Phi(\alpha)=\sum\limits_{l\in\gamma}\beta_{l}\|\alpha_{l\cdot}\|^{p}+\sum\limits_{l\in\tau}\beta_{l}\|\alpha_{l\cdot}\|^{p}\geq\sum\limits_{l\in\gamma}\beta_{l}\|\alpha_{l\cdot}\|^{p}+\tilde{\kappa}\sum\limits_{l\in\tau}\|\alpha_{l\cdot}\|. (3.2)

Hence for any yUγ×(ϵ0,ϵ0)dτy\in U_{\gamma}\times(-\epsilon_{0},\epsilon_{0})^{d_{\tau}}, we have

Fλ(α+y)\displaystyle F_{\lambda}(\alpha^{*}+y) =\displaystyle= λ(g(α+y))++lγβlαl+ylp+lτβlylp\displaystyle\lambda(g(\alpha^{*}+y))_{+}+\sum\limits_{l\in\gamma}\beta_{l}\|\alpha^{*}_{l\cdot}+y_{l\cdot}\|^{p}+\sum\limits_{l\in\tau}\beta_{l}\|y_{l\cdot}\|^{p}
\displaystyle\geq λ(g([αγ+yγ;0τ]))+κ~yτ+lγβlαl+ylp+κ~yτ\displaystyle\lambda(g([\alpha_{\gamma}^{*}+y_{\gamma};0_{\tau}]))_{+}-\tilde{\kappa}\|y_{\tau}\|+\sum\limits_{l\in\gamma}\beta_{l}\|\alpha^{*}_{l\cdot}+y_{l\cdot}\|^{p}+\tilde{\kappa}\|y_{\tau}\|
\displaystyle\geq Fλγ(αγ)=Fλ(α),\displaystyle F^{\gamma}_{\lambda}(\alpha_{\gamma}^{*})=F_{\lambda}(\alpha^{*}),

where the first inequality follows from the Lipschitz continuity of gλg_{\lambda} with Lipschitz constant κ~\tilde{\kappa} and (3.2), and the last inequality follows from the local optimality of αγ\alpha_{\gamma}^{*}. Thus, α\alpha^{*} is a local minimizer of problem (2.6) with λλ\lambda\geq\lambda^{*}. This completes the proof. ∎

Theorem 2.

Let α~=Y^1α^\tilde{\alpha}=\hat{Y}^{-1}\hat{\alpha}^{\circ}, ε>0\varepsilon>0 and λCβ¯1p(ε(L+1)p21)1pΦ(α~)\lambda\geq C\bar{\beta}^{\frac{1}{p}}(\varepsilon(L+1)^{\frac{p}{2}-1})^{-\frac{1}{p}}\Phi(\tilde{\alpha}), where CC is defined in Lemma 1 and β¯=ηLLp\bar{\beta}=\eta^{L}L^{p}. Then for any global minimizer α\alpha^{*} of problem (2.6), the projection αε:=𝒫e(α)\alpha_{\varepsilon}:=\mathcal{P}_{\mathcal{F}_{e}}(\alpha^{*}) is an ε\varepsilon-minimizer of (2.5), that is, Φ(αε)min{Φ(α):αe}+ε\Phi(\alpha_{\varepsilon})\leq\min\{\Phi(\alpha):\alpha\in\mathcal{F}_{e}\}+\varepsilon.

Proof.

Note that gg is a strongly convex function and α~\tilde{\alpha} is a minimizer of gg. Since the feasible set e{\cal F}_{e} of (2.5) has an interior point, we have α~inte\tilde{\alpha}\in{\rm int}{\cal F}_{e}.

By the global optimality of α\alpha^{*}, we have Fλ(α)Fλ(α~)F_{\lambda}(\alpha^{*})\leq F_{\lambda}(\tilde{\alpha}) and

((α)HY^α2Re((α)Hα^)+cϱ)+1λFλ(α)1λFλ(α~)=1λΦ(α~).((\alpha^{*})^{H}\hat{Y}\alpha^{*}-2{\rm{Re}}((\alpha^{*})^{H}\hat{\alpha}^{\circ})+c-\varrho)_{+}\leq\tfrac{1}{\lambda}F_{\lambda}(\alpha^{*})\leq\tfrac{1}{\lambda}F_{\lambda}(\tilde{\alpha})=\tfrac{1}{\lambda}\Phi(\tilde{\alpha}). (3.3)

Hence for any αe\alpha\in\mathcal{F}_{e}, we obtain

Φ(αε)Φ(α)\displaystyle\Phi(\alpha_{\varepsilon})-\Phi(\alpha) \displaystyle\leq l=0Lβl((αε)lpαlp)l=0Lβl(αε)lαlpβ¯l=0L((αε)lαl2)p2\displaystyle\sum_{l=0}^{L}\beta_{l}(\|(\alpha_{\varepsilon})_{l\cdot}\|^{p}-\|\alpha^{*}_{l\cdot}\|^{p})\leq\sum\limits_{l=0}^{L}\beta_{l}\|(\alpha_{\varepsilon})_{l\cdot}-\alpha^{*}_{l\cdot}\|^{p}\leq\bar{\beta}\sum\limits_{l=0}^{L}\left(\|(\alpha_{\varepsilon})_{l\cdot}-\alpha^{*}_{l\cdot}\|^{2}\right)^{\frac{p}{2}}
\displaystyle\leq β¯(L+1)(1L+1l=0L(αε)lαl2)p2β¯(L+1)1p2(dist(α,e))p\displaystyle\bar{\beta}(L+1)\left(\tfrac{1}{L+1}\sum\limits_{l=0}^{L}\|(\alpha_{\varepsilon})_{l\cdot}-\alpha^{*}_{l\cdot}\|^{2}\right)^{\frac{p}{2}}\leq\bar{\beta}(L+1)^{1-\frac{p}{2}}({\rm{dist}}(\alpha^{*},\mathcal{F}_{e}))^{p}
\displaystyle\leq β¯(L+1)1p2(C((α)HY^α2Re((α)Hα^)+cϱ)+)p\displaystyle\bar{\beta}(L+1)^{1-\frac{p}{2}}(C((\alpha^{*})^{H}\hat{Y}\alpha^{*}-2{\rm{Re}}((\alpha^{*})^{H}\hat{\alpha}^{\circ})+c-\varrho)_{+})^{p}
\displaystyle\leq β¯(L+1)1p2(CλΦ(α~))pε,\displaystyle\bar{\beta}(L+1)^{1-\frac{p}{2}}\left(\tfrac{C}{\lambda}\Phi(\tilde{\alpha})\right)^{p}\leq\varepsilon,

where the first inequality is from the global optimality of α\alpha^{*}, the second inequality is from Lemma 2.4 in [8], the fifth inequality is from the concavity of function ttp2t\rightarrow t^{\frac{p}{2}} for t0t\geq 0, the sixth inequality is from Lemma 1, the seventh inequality is from (3.3) and the last inequality follows from the choice of λ\lambda. Thus, the projection 𝒫e(α)\mathcal{P}_{\mathcal{F}_{e}}(\alpha^{*}) is an ε\varepsilon-minimizer of (2.5). This completes the proof. ∎

4 Optimality conditions and a smoothing penalty algorithm

In this section, we first define first-order optimality conditions of (2.5) and (2.6), which are necessary conditions for local optimality. We also derive lower bounds for the 2\ell_{2} norm of nonzero groups of first-order stationary points of (2.5) and (2.6). Next we propose a smoothing penalty algorithm for solving (2.5) and prove its convergence to a first-order stationary point of (2.5).

4.1 First-order optimality conditions

We first present a first-order optimality condition of problem (2.6).

Definition 2.

We call αd\alpha^{*}\in\mathbb{C}^{d} a scaled first-order stationary point of problem (2.6) if

pβlαlpαl+2λξαl2(Y^lααl)=0,l𝕃,p\beta_{l}\|\alpha^{*}_{l\cdot}\|^{p}\alpha^{*}_{l\cdot}+2\lambda\xi\|\alpha^{*}_{l\cdot}\|^{2}(\hat{Y}_{l\cdot}\alpha^{*}-\alpha^{\circ}_{l\cdot})=0,\quad\forall\,l\in\mathbb{L}, (4.1)

for some ξ\xi satisfying ξ{=0if(α)HY^α2Re((α)Hα^)+c<ϱ[0,1]if(α)HY^α2Re((α)Hα^)+c=ϱ=1otherwise.\xi\left\{\begin{array}[]{ll}=0&{\rm{if}}\,(\alpha^{*})^{H}\hat{Y}\alpha^{*}-2{\rm{Re}}((\alpha^{*})^{H}\hat{\alpha}^{\circ})+c<\varrho\\ \in[0,1]&{\rm{if}}\,(\alpha^{*})^{H}\hat{Y}\alpha^{*}-2{\rm{Re}}((\alpha^{*})^{H}\hat{\alpha}^{\circ})+c=\varrho\\ =1&{\rm{otherwise}}.\end{array}\right.

Theorem 3.

Let αd\alpha^{*}\in\mathbb{C}^{d} be a local minimizer of (2.6). Then α\alpha^{*} is a scaled first-order stationary point of (2.6).

Proof.

Let αd\alpha^{*}\in\mathbb{C}^{d} be a local minimizer of (2.6). We obtain that αγ\alpha^{*}_{\gamma} is a local minimizer of

minαγlγβlαlp+λ(αγHY^γαγ2Re(αγHα^γ)+cϱ)+.\min\limits_{\alpha_{\gamma}}\quad\sum\limits_{l\in\gamma}\beta_{l}\|\alpha_{l\cdot}\|^{p}+\lambda(\alpha_{\gamma}^{H}\hat{Y}_{\gamma}\alpha_{\gamma}-2\textrm{Re}(\alpha_{\gamma}^{H}\hat{\alpha}_{\gamma}^{\circ})+c-\varrho)_{+}. (4.2)

Note that αl0\|\alpha^{*}_{l\cdot}\|\neq 0, lγl\in\gamma, the first term of the objective function of (4.2) is continuously differentiable at αγ\alpha^{*}_{\gamma}. The first-order necessary optimality condition for problem (4.2) holds at αγ\alpha^{*}_{\gamma}, that is,

pβlαlp2αl+2λξ((Y^γαγ)lαl)=0,lγ,p\beta_{l}\|\alpha^{*}_{l\cdot}\|^{p-2}\alpha^{*}_{l\cdot}+2\lambda\xi((\hat{Y}_{\gamma}\alpha_{\gamma}^{*})_{l\cdot}-\alpha^{\circ}_{l\cdot})=0,\quad\forall\,l\in\gamma, (4.3)

for some ξ\xi satisfying ξ{=0if(αγ)HY^γαγ2Re((αγ)Hα^γ)+c<ϱ[0,1]if(αγ)HY^γαγ2Re((αγ)Hα^γ)+c=ϱ=1otherwise.\xi\left\{\begin{array}[]{ll}=0&{\rm{if}}\,(\alpha_{\gamma}^{*})^{H}\hat{Y}_{\gamma}\alpha_{\gamma}^{*}-2\textrm{Re}((\alpha_{\gamma}^{*})^{H}\hat{\alpha}_{\gamma}^{\circ})+c<\varrho\\ \in[0,1]&{\rm{if}}\,(\alpha_{\gamma}^{*})^{H}\hat{Y}_{\gamma}\alpha_{\gamma}^{*}-2\textrm{Re}((\alpha_{\gamma}^{*})^{H}\hat{\alpha}_{\gamma}^{\circ})+c=\varrho\\ =1&{\rm{otherwise}}.\end{array}\right.

Multiplying αl2\|\alpha^{*}_{l\cdot}\|^{2} on both sides of (4.3), we obtain

pβlαlpαl+2λξαl2((Y^γαγ)lαl)=0,lγ.p\beta_{l}\|\alpha^{*}_{l\cdot}\|^{p}\alpha^{*}_{l\cdot}+2\lambda\xi\|\alpha^{*}_{l\cdot}\|^{2}((\hat{Y}_{\gamma}\alpha_{\gamma}^{*})_{l\cdot}-\alpha^{\circ}_{l\cdot})=0,\quad\forall\,l\in\gamma.

Since αl=0\|\alpha^{*}_{l\cdot}\|=0 for lτl\in\tau, we have

pβlαlpαl+2λξαl2(Y^lααl)=0,l𝕃.p\beta_{l}\|\alpha^{*}_{l\cdot}\|^{p}\alpha^{*}_{l\cdot}+2\lambda\xi\|\alpha^{*}_{l\cdot}\|^{2}(\hat{Y}_{l\cdot}\alpha^{*}-\alpha^{\circ}_{l\cdot})=0,\quad\forall\,l\in\mathbb{L}.

Hence (4.1) holds at α\alpha^{*}. ∎

The next theorem gives a lower bound for the l2l_{2} norm of nonzero groups of stationary points of (2.6).

Theorem 4.

Let αd\alpha^{*}\in\mathbb{C}^{d} be a scaled first-order stationary point of (2.6) and Y^αα^c~\|\hat{Y}\alpha^{*}-\hat{\alpha}^{\circ}\|\leq\tilde{c} for some c~>0\tilde{c}>0. Then,

αl(pβl2λc~)11p,lγ.\|\alpha^{*}_{l\cdot}\|\geq\left(\frac{p\beta_{l}}{2\lambda\tilde{c}}\right)^{\frac{1}{1-p}},\forall\,l\in\gamma.

The proof is similar with that of (2.4), and thus we omit it here. According to Theorems 1 and 3, Theorem 4 gives a lower bound for the 2\ell_{2} norm of nonzero groups of local minimizers of problem (2.5).

Next, we consider the first-order optimality condition of problem (2.5). Similar to Definition 1, we call α\alpha^{*} is a scaled first-order stationary point or a scaled KKT point of (2.5) if the following conditions hold,

pβlαlpαl+2ναl2(Y^lααl)=0,l𝕃,νg(α)=0,g(α)0,ν0.\begin{split}p\beta_{l}\|\alpha^{*}_{l\cdot}\|^{p}\alpha^{*}_{l\cdot}+2\nu\|\alpha^{*}_{l\cdot}\|^{2}(\hat{Y}_{l\cdot}\alpha^{*}-\alpha_{l\cdot}^{\circ})=0,&\,\,\,\forall\,l\in\mathbb{L},\\ \nu g(\alpha^{*})=0,\quad g(\alpha^{*})\leq 0,\quad\nu\geq 0.&\end{split} (4.4)

Since α=0\alpha=0 is not a feasible point, thus ν0\nu\neq 0. Hence, replacing λ\lambda by ν\nu in Theorem 4, we can obtain a lower bound for the 2\ell_{2} norm of nonzero groups of the scaled KKT point of problem (2.5).

Theorem 5.

Let α\alpha^{*} be a local minimizer of problem (2.5). If there exists l𝕃l\in\mathbb{L} such that αl(Ylααl)0\|\alpha_{l\cdot}^{*}\|(Y_{l\cdot}\alpha^{*}-\alpha_{l\cdot}^{\circ})\neq 0, then α\alpha^{*} is a scaled KKT point of (2.5).

Proof.

Let α\alpha^{*} be a local minimizer of problem (2.5), then αγ\alpha_{\gamma}^{*} is a local minimizer of problem (3.1). Recall gγ(αγ)=αγHY^γαγ2Re(αγHα^γ)+cϱg_{\gamma}(\alpha_{\gamma})=\alpha_{\gamma}^{H}\hat{Y}_{\gamma}\alpha_{\gamma}-2\textrm{Re}(\alpha_{\gamma}^{H}\hat{\alpha}_{\gamma}^{\circ})+c-\varrho, then gγ(αγ)=g(α)g_{\gamma}(\alpha^{*}_{\gamma})=g(\alpha^{*}).

The condition that there exists l𝕃l\in\mathbb{L} such that αl(Ylααl)0\|\alpha_{l\cdot}^{*}\|(Y_{l\cdot}\alpha^{*}-\alpha_{l\cdot}^{\circ})\neq 0 implies that Y^γαγα^γ0\hat{Y}_{\gamma}\alpha^{*}_{\gamma}-\hat{\alpha}_{\gamma}^{\circ}\neq 0 if gγ(αγ)=0g_{\gamma}(\alpha_{\gamma}^{*})=0. Thus the LICQ (linear independence constraint qualification) holds at αγ\alpha_{\gamma}^{*} for problem (3.1).

Note that, the objective function of problem (3.1) and gγg_{\gamma} are continuously differentiable at αγ\alpha_{\gamma}^{*} and α¯γgγ(αγ)=Y^γαγα^γ.\partial_{\bar{\alpha}_{\gamma}}g_{\gamma}(\alpha_{\gamma}^{*})=\hat{Y}_{\gamma}\alpha_{\gamma}^{*}-\hat{\alpha}_{\gamma}^{\circ}. Hence αγ\alpha_{\gamma}^{*} is a KKT point of (3.1), that is, there exists ν\nu such that

pβlαlp2αl+2ν(Y^γαγαγ)l=0,lγ,νgγ(αγ)=0,gγ(αγ)0,ν0.\begin{split}p\beta_{l}\|\alpha^{*}_{l\cdot}\|^{p-2}\alpha^{*}_{l\cdot}+2\nu(\hat{Y}_{\gamma}\alpha^{*}_{\gamma}-\alpha_{\gamma}^{\circ})_{l\cdot}=0,&\,\,\,\forall\,l\in\gamma,\\ \nu g_{\gamma}(\alpha_{\gamma}^{*})=0,\quad g_{\gamma}(\alpha_{\gamma}^{*})\leq 0,\quad\nu\geq 0.&\end{split} (4.5)

Multiplying αl2\|\alpha^{*}_{l\cdot}\|^{2} on both sides of the first equality in (4.5), we obtain

pβlαlpαl+2ναl2(Y^γαγαγ)l=0,lγ.p\beta_{l}\|\alpha^{*}_{l\cdot}\|^{p}\alpha^{*}_{l\cdot}+2\nu\|\alpha^{*}_{l\cdot}\|^{2}(\hat{Y}_{\gamma}\alpha^{*}_{\gamma}-\alpha_{\gamma}^{\circ})_{l\cdot}=0,\quad\forall\,l\in\gamma.

Since αl=0\alpha^{*}_{l\cdot}=0 for lτl\in\tau, we obtain

pβlαlpαl+2ναl2(Y^lααl)=0,l𝕃.p\beta_{l}\|\alpha^{*}_{l\cdot}\|^{p}\alpha^{*}_{l\cdot}+2\nu\|\alpha^{*}_{l\cdot}\|^{2}(\hat{Y}_{l\cdot}\alpha^{*}-\alpha_{l\cdot}^{\circ})=0,\quad\forall\,l\in\mathbb{L}.

Combining this with (4.5) and g(α)=gγ(αγ)g(\alpha^{*})=g_{\gamma}(\alpha_{\gamma}^{*}), we find that α\alpha^{*} is a scaled KKT point of (2.5). The proof is completed. ∎

From the definitions we can see that for some λ>0\lambda>0, any scaled KKT point of problem (2.5) is a scaled first order stationary point of (2.6). Moreover, any scaled first order stationary point of problem (2.6) which belongs to the feasible set e\mathcal{F}_{e} is a scaled KKT point of problem (2.5).

Following the proof of Theorem 5, we can show the existence of a KKT point of problem (2.2) by replacing 𝕃\mathbb{L} by 0\mathbb{N}_{0}.

Corollary 1.

Let α~\tilde{\alpha} be a local minimizer of problem (2.2). If there exists l0l\in\mathbb{\mathbb{N}}_{0} such that α~l(Ylα~αl)0\|\tilde{\alpha}_{l\cdot}\|(Y_{l\cdot}\tilde{\alpha}-\alpha_{l\cdot}^{\circ})\neq 0, then α~\tilde{\alpha} is a scaled KKT point of (2.2).

4.2 A smoothing penalty algorithm for problem (2.5)

We define a smoothing function of the nonsmooth function λ(g(α))+\lambda(g(\alpha))_{+} as follows

fλ,μ(α)=ψλ,μ(g(α))f_{\lambda,\mu}(\alpha)=\psi_{\lambda,\mu}(g(\alpha))

with ψλ,μ(s):=λmax0t1{stμ2t2}\psi_{\lambda,\mu}(s):=\lambda\max\limits_{0\leq t\leq 1}\{st-\frac{\mu}{2}t^{2}\} and μ>0\mu>0 is a smoothing parameter. It is easy to verify that ψλ,μ(s)=λmin{max{sμ,0},1}0\psi^{\prime}_{\lambda,\mu}(s)=\lambda\min\{\max\{\frac{s}{\mu},0\},1\}\geq 0 and |ψλ,μ(s1)ψλ,μ(s2)|λμ|s1s2||\psi^{\prime}_{\lambda,\mu}(s_{1})-\psi^{\prime}_{\lambda,\mu}(s_{2})|\leq\frac{\lambda}{\mu}|s_{1}-s_{2}|, s1,s2.\forall\,s_{1},s_{2}\in\mathbb{R}. It is not hard to show that

fλ,μ(α)={0ifg(α)0λ2μ(g(α))2if 0g(α)μλg(α)λμ2ifg(α)μf_{\lambda,\mu}(\alpha)=\left\{\begin{array}[]{ll}0&{\rm{if}}\,g(\alpha)\leq 0\\ \frac{\lambda}{2\mu}(g(\alpha))^{2}&{\rm{if}}\,0\leq g(\alpha)\leq\mu\\ \lambda g(\alpha)-\frac{\lambda\mu}{2}&{\rm{if}}\,g(\alpha)\geq\mu\end{array}\right.

and

0λ(g(α))+fλ,μ(α)λμ2.0\leq\lambda(g(\alpha))_{+}-f_{\lambda,\mu}(\alpha)\leq\tfrac{\lambda\mu}{2}. (4.6)

By Wirtinger calculus, we obtain fλ,μ(α)=[αfλ,μ(α)α¯fλ,μ(α)],\nabla f_{\lambda,\mu}(\alpha)=\small\left[\begin{array}[]{c}\partial_{\alpha}f_{\lambda,\mu}(\alpha)\\ \partial_{\bar{\alpha}}f_{\lambda,\mu}(\alpha)\end{array}\right], where α¯fλ,μ(α)=αfλ,μ(α)¯\partial_{\bar{\alpha}}f_{\lambda,\mu}(\alpha)=\overline{\partial_{\alpha}f_{\lambda,\mu}(\alpha)} and

α¯fλ,μ(α)={0ifg(α)0λμg(α)(Y^αα^)if 0g(α)μλ(Y^αα^)ifg(α)μ.\partial_{\bar{\alpha}}f_{\lambda,\mu}(\alpha)=\left\{\begin{array}[]{ll}0&{\rm{if}}\,g(\alpha)\leq 0\\ \frac{\lambda}{\mu}g(\alpha)(\hat{Y}\alpha-\hat{\alpha}^{\circ})&{\rm{if}}\,0\leq g(\alpha)\leq\mu\\ \lambda(\hat{Y}\alpha-\hat{\alpha}^{\circ})&{\rm{if}}\,g(\alpha)\geq\mu.\end{array}\right.

More details about the smoothing function can be found in [7] and references therein.

We consider the following optimization problem

minαd\displaystyle\min\limits_{\alpha\in\mathbb{C}^{d}} Fλ,μ(α):=Φ(α)+fλ,μ(α).\displaystyle F_{\lambda,\mu}(\alpha):=\Phi(\alpha)+f_{\lambda,\mu}(\alpha). (4.7)

For fixed positive parameters λ\lambda and μ\mu, Fλ,μF_{\lambda,\mu} is continuous and level-bounded since Φ\Phi is level-bounded and fλ,μf_{\lambda,\mu} is nonnegative. Moreover, the gradient of fλ,μf_{\lambda,\mu} is Lipschitz continuous.

Now, we propose a smoothing penalty algorithm for solving problem (2.5).

A smoothing penalty algorithm for problem (2.5)

   Choose λ0>0\lambda^{0}>0, μ0>0\mu^{0}>0, ε0>0\varepsilon^{0}>0, ς1>1\varsigma_{1}>1, and 0<ς2<10<\varsigma_{2}<1. Set k=0k=0 and α0=α~:=Y^1α\alpha^{0}=\tilde{\alpha}:=\hat{Y}^{-1}\alpha^{{\circ}}.
  (1) If Fλk,μk(αk)>Fλk,μk(α~)F_{\lambda^{k},\mu^{k}}(\alpha^{k})>F_{\lambda^{k},\mu^{k}}(\tilde{\alpha}), set αk=α~\alpha^{k}=\tilde{\alpha}; otherwise αk=αk\alpha^{k}=\alpha^{k}.
  (2) Solve problem (4.7) with initial point αk\alpha^{k}, λ=λk\lambda=\lambda^{k}, μ=μk\mu=\mu^{k}, and find an αk+1\alpha^{k+1} satisfying
pβlαlk+1pαlk+1+2αlk+12(α¯fλk,μk(αk+1))lεk,l𝕃.\|p\beta_{l}\|\alpha^{k+1}_{l\cdot}\|^{p}\alpha^{k+1}_{l\cdot}+2\|\alpha^{k+1}_{l\cdot}\|^{2}(\partial_{\bar{\alpha}}f_{\lambda^{k},\mu^{k}}(\alpha^{k+1}))_{l\cdot}\|\leq\varepsilon^{k},\quad\forall\,l\in\mathbb{L}. (4.8)
  (3) Set λk+1=ς1λk\lambda^{k+1}=\varsigma_{1}\lambda^{k}, μk+1=ς2μk+1\mu^{k+1}=\varsigma_{2}\mu^{k+1}, εk+1=ς2εk\varepsilon^{k+1}=\varsigma_{2}\varepsilon^{k}.
  (4) Set k=k+1k=k+1 and go to (1).

We give the convergence of Algorithm 4.2 in the following theorem.

Theorem 6.

Let {αk}\{\alpha^{k}\} be generated by Algorithm 4.2. Then, the following statements hold.

  1. (i)

    {αk}\{\alpha^{k}\} is bounded.

  2. (ii)

    Any accumulation point α\alpha^{*} of {αk}\{\alpha^{k}\} is a scaled KKT point of problem (2.5).

Proof.

(i) We can see that

Φ(αk+1)Fλk,μk(αk+1)Fλk,μk(α~)=Φ(α~),\Phi(\alpha^{k+1})\leq F_{\lambda^{k},\mu^{k}}(\alpha^{k+1})\leq F_{\lambda^{k},\mu^{k}}(\tilde{\alpha})=\Phi(\tilde{\alpha}),

where the first inequality follows from that fλk,μk(αk+1)0f_{\lambda^{k},\mu^{k}}(\alpha^{k+1})\geq 0, the second inequality follows from step (1) of Algorithm 4.2, and the equality is from α~=Y^1αe\tilde{\alpha}=\hat{Y}^{-1}\alpha^{{\circ}}\in\mathcal{F}_{e} and fλk,μk(α~)=0f_{\lambda^{k},\mu^{k}}(\tilde{\alpha})=0. Since Φ\Phi is level-bounded, {αk}\{\alpha^{k}\} is bounded.

(ii) Let α\alpha^{*} be an accumulation point of {αk}\{\alpha^{k}\} and {αk}k𝒦\{\alpha^{k}\}_{k\in\mathcal{K}} be a subsequence of {αk}\{\alpha^{k}\} such that {αk}α\{\alpha^{k}\}\rightarrow\alpha^{*} as kk\rightarrow\infty, k𝒦k\in\mathcal{K}. Note that

λk1(g(αk))+λk1μk12fλk1,μk1(αk)Fλk1,μk1(αk)Fλk1,μk1(α~)Φ(α~),\lambda^{k-1}(g(\alpha^{k}))_{+}-\tfrac{\lambda^{k-1}\mu^{k-1}}{2}\leq f_{\lambda^{k-1},\mu^{k-1}}(\alpha^{k})\leq F_{\lambda^{k-1},\mu^{k-1}}(\alpha^{k})\leq F_{\lambda^{k-1},\mu^{k-1}}(\tilde{\alpha})\leq\Phi(\tilde{\alpha}),

where the first inequality follows from (4.6). Then, we have

(g(αk))+Φ(α~)λk1+μk12.(g(\alpha^{k}))_{+}\leq\frac{\Phi(\tilde{\alpha})}{\lambda^{k-1}}+\frac{\mu^{k-1}}{2}. (4.9)

From step (3) in Algorithm 4.2, λk1\lambda^{k-1}\rightarrow\infty and μk10\mu^{k-1}\rightarrow 0, as kk\rightarrow\infty, k𝒦k\in\mathcal{K}. Taking limits in (4.9) as kk\rightarrow\infty, k𝒦k\in\mathcal{K}, we obtain that (g(α))+0(g(\alpha^{*}))_{+}\leq 0. Hence, αe\alpha^{*}\in\mathcal{F}_{e}.

From (4.8), we have

pβlαlkpαlk+2αlk2(α¯fλk,μk(αk))lεk1,l𝕃.\|p\beta_{l}\|\alpha^{k}_{l\cdot}\|^{p}\alpha^{k}_{l\cdot}+2\|\alpha^{k}_{l\cdot}\|^{2}(\partial_{\bar{\alpha}}f_{\lambda^{{k}},\mu^{{k}}}(\alpha^{k}))_{l\cdot}\|\leq\varepsilon^{{k-1}},\quad\forall\,l\in\mathbb{L}. (4.10)

We first assume that g(α)<0g(\alpha^{*})<0. For all sufficiently large kk, we obtain g(αk)<0g(\alpha^{k})<0 and (4.10) becomes

pβlαlkp+1εk1,l𝕃.p\beta_{l}\|\alpha^{k}_{l\cdot}\|^{p+1}\leq\varepsilon^{{k-1}},\quad\forall\,l\in\mathbb{L}.

Taking limits on both sides of the above relation, we obtain α=0\alpha^{*}=0 which contradicts to Assumption 1. Thus, g(α)=0g(\alpha^{*})=0.

Let tk:=ψλk,ϵk(g(αk))t^{k}:=\psi^{\prime}_{\lambda^{k},\epsilon^{k}}(g(\alpha^{k})) for notational simplicity, we have tk0t^{k}\geq 0. Then (4.10) reduces to

pβlαlkpαlk+2tkαlk2(Y^lαkαl)εk1,l𝕃.\|p\beta_{l}\|\alpha^{k}_{l\cdot}\|^{p}\alpha^{k}_{l\cdot}+2t^{k}\|\alpha^{k}_{l\cdot}\|^{2}(\hat{Y}_{l\cdot}\alpha^{k}-\alpha_{l\cdot}^{\circ})\|\leq\varepsilon^{{k-1}},\quad\forall\,l\in\mathbb{L}. (4.11)

Now, we prove that {tk}𝒦\{t^{k}\}_{\mathcal{K}} is bounded. On the contrary, we assume {tk}𝒦\{t^{k}\}_{\mathcal{K}} is unbounded and {tk}𝒦\{t^{k}\}_{\mathcal{K}}\rightarrow\infty, then,

pβltkαlkpαlk+2αlk2(Y^lαkαl)εk1tk,l𝕃.\left\|\tfrac{p\beta_{l}}{t^{k}}\|\alpha^{k}_{l\cdot}\|^{p}\alpha^{k}_{l\cdot}+2\|\alpha^{k}_{l\cdot}\|^{2}(\hat{Y}_{l\cdot}\alpha^{k}-\alpha_{l\cdot}^{\circ})\right\|\leq\tfrac{\varepsilon^{{k-1}}}{t^{k}},\quad\forall\,l\in\mathbb{L}.

Passing to the limit in the above relation gives

αl2(Y^lααl)=0,l𝕃.\|\alpha^{*}_{l\cdot}\|^{2}(\hat{Y}_{l\cdot}\alpha^{*}-\alpha_{l\cdot}^{\circ})=0,\quad\forall\,l\in\mathbb{L}.

Since g(α)=0g(\alpha^{*})=0 implies α\alpha^{*} is in e{\cal F}_{e}, but is not a minimizer of gg, we have α0\alpha^{*}\neq 0 and Y^αα^0\hat{Y}\alpha^{*}-\hat{\alpha}^{\circ}\neq 0. Moreover, g(α)=gγ([αγ;0τ])=0g(\alpha^{*})=g_{\gamma}([\alpha^{*}_{\gamma};0_{\tau}])=0 implies that αl0,\alpha_{l\cdot}^{*}\neq 0, lγ{\forall}l\in\gamma and (Y^αα^)γ0(\hat{Y}\alpha^{*}-\hat{\alpha}^{\circ})_{\gamma}\neq 0. Thus, {tk}𝒦\{t^{k}\}_{\mathcal{K}} is bounded. Let {tk}𝒦t\{t^{k}\}_{\mathcal{K}}\rightarrow t^{*}. Taking limits on both sides of (4.11) gives

pβlαlpαl+2tαl2(Y^lααl)=0,l𝕃.\|p\beta_{l}\|\alpha^{*}_{l\cdot}\|^{p}\alpha^{*}_{l\cdot}+2t^{*}\|\alpha^{*}_{l\cdot}\|^{2}(\hat{Y}_{l\cdot}\alpha^{*}-\alpha_{l\cdot}^{\circ})\|=0,\quad\forall\,l\in\mathbb{L}.

Therefore, α\alpha^{*} is a scaled KKT point of (2.5). ∎

5 Approximation error

In [14], the authors gave the approximation error for random field using regularized 2,1\ell_{2,1} model based on the observed random field T(ω,x)L2(Ω×𝕊2)T^{\circ}(\omega,{\rm{x}})\in L_{2}(\Omega\times\mathbb{S}^{2}). In this section, we estimate the approximation error of the inpainted random field in the space L2(Ω×𝕊2)L_{2}(\Omega\times\mathbb{S}^{2}) based on the observed random field T(ω,x)L2(Ω×𝕊2)T^{\circ}(\omega,{\rm{x}})\in L_{2}(\Omega\times\mathbb{S}^{2}).

For any fixed ωΩ\omega\in\Omega, let α^(ω):=(α0,0(ω),,αLω,Lω(ω))T(Lω+1)2\hat{\alpha}^{*}(\omega):=(\alpha^{*}_{0,0}(\omega),\ldots,\alpha^{*}_{L_{\omega},L_{\omega}}(\omega))^{T}\in\mathbb{C}^{(L_{\omega}+1)^{2}} with the group support set γω\gamma_{\omega} be a scaled KKT point of problem (2.5). By our results in the previous sections, LωL_{\omega} is a finite number. Moreover, α(ω):=((α^(ω))T,0,)T\alpha^{*}(\omega):=((\hat{\alpha}^{*}(\omega))^{T},0,\ldots)^{T} is a scaled KKT point of problem (2.2) in the infinite-dimensional space βp\ell^{p}_{\beta}.

Let the random field defined by a scaled KKT point αβp\alpha^{*}\in\ell^{p}_{\beta} of problem (2.2) be

T(ω,x)=l=0m=llαl,m(ω)Yl,m(x),T^{*}(\omega,{\rm{x}})=\sum\limits_{l=0}^{\infty}\sum_{m=-l}^{l}\alpha_{l,m}^{*}(\omega)Y_{l,m}(\rm{x}), (5.1)

where αl,m(ω)=0\alpha_{l,m}^{*}(\omega)=0, l=Lω+1,l=L_{\omega}+1,\ldots, m=l,,lm=-l,\ldots,l.

Lemma 2.

If the random variable ωΩ\omega\in\Omega has finite second order moment that is 𝔼[ω2]<\mathbb{E}[\|\omega\|^{2}]<\infty and there is κ\kappa such that T(ω1,x)T(ω2,x)L2(𝕊2)κω1ω2\|T^{*}(\omega_{1},\mathrm{x})-T^{*}(\omega_{2},\mathrm{x})\|_{L_{2}(\mathbb{S}^{2})}\leq\kappa\|\omega_{1}-\omega_{2}\|, ω1,ω2Ω\forall\omega_{1},\omega_{2}\in\Omega, then T(ω,x)L2(Ω×𝕊2).T^{*}(\omega,{\rm{x}})\in L_{2}(\Omega\times\mathbb{S}^{2}).

Proof.

Let ω~Ω\tilde{\omega}\in\Omega be fixed. Since for any ωΩ\omega\in\Omega,

T(ω,x)L2(𝕊2)T(ω~,x)L2(𝕊2)T(ω,x)T(ω~,x)L2(𝕊2)κωω~,\|T^{*}(\omega,\mathrm{x})\|_{L_{2}(\mathbb{S}^{2})}-\|T^{*}(\tilde{\omega},\mathrm{x})\|_{L_{2}(\mathbb{S}^{2})}\leq\|T^{*}(\omega,\mathrm{x})-T^{*}(\tilde{\omega},\mathrm{x})\|_{L_{2}(\mathbb{S}^{2})}\leq\kappa\|\omega-\tilde{\omega}\|,

we have

T(ω,x)L2(𝕊2)2(κωω~+T(ω~,x)L2(𝕊2))22κ2ωω~2+2T(ω~,x)L2(𝕊2)2.\|T^{*}(\omega,\mathrm{x})\|^{2}_{L_{2}(\mathbb{S}^{2})}\leq\left(\kappa\|\omega-\tilde{\omega}\|+\|T^{*}(\tilde{\omega},\mathrm{x})\|_{L_{2}(\mathbb{S}^{2})}\right)^{2}\leq 2\kappa^{2}\|\omega-\tilde{\omega}\|^{2}+2\|T^{*}(\tilde{\omega},\mathrm{x})\|^{2}_{L_{2}(\mathbb{S}^{2})}.

Hence, we obtain

𝔼[T(ω,x)L2(𝕊2)2]2T(ω~,x)L2(𝕊2)2+2κ2𝔼[ωω~2]<,\mathbb{E}[\|T^{*}(\omega,\mathrm{x})\|^{2}_{L_{2}(\mathbb{S}^{2})}]\leq 2\|T^{*}(\tilde{\omega},\mathrm{x})\|^{2}_{L_{2}(\mathbb{S}^{2})}+2\kappa^{2}\mathbb{E}[\|\omega-\tilde{\omega}\|^{2}]<\infty,

where the last inequality follows from 𝔼[ω2]<\mathbb{E}[\|\omega\|^{2}]<\infty. Thus, T(ω,x)L2(Ω×𝕊2).T^{*}(\omega,{\rm{x}})\in L_{2}(\Omega\times\mathbb{S}^{2}).

Theorem 7.

Let T(ω,x)L2(Ω×𝕊2)T^{\circ}(\omega,{\rm{x}})\in L_{2}(\Omega\times\mathbb{S}^{2}) be the observed random field. Then for any ϵ>0\epsilon>0 there exists LL such that

0𝒜(TL(ω,x))T(ω,x)L2(Ω×𝕊2)2ϱ<ϵ,0\leq\|\mathcal{A}(T_{L}^{*}(\omega,{\rm{x}}))-T^{\circ}(\omega,{\rm{x}})\|^{2}_{L_{2}(\Omega\times\mathbb{S}^{2})}-{\varrho}<\epsilon,

where TL(ω,x)=l=0Lm=llαl,m(ω)Yl,m(x)T_{L}^{*}(\omega,{\rm{x}})=\sum_{l=0}^{L}\sum_{m=-l}^{l}\alpha^{*}_{l,m}(\omega)Y_{l,m}({\rm{x}}).

Proof.

Since T(ω,x)L2(Ω×𝕊2)T^{\circ}(\omega,{\rm{x}})\in L_{2}(\Omega\times\mathbb{S}^{2}), by Fubini’s theorem, for ωΩ\omega\in\Omega, T(ω,x)L2(𝕊2)T^{\circ}(\omega,{\rm{x}})\in L_{2}(\mathbb{S}^{2}), PP-a.s., in which case T(ω,x)T^{\circ}(\omega,{\rm{x}}) admits an expansion in terms of spherical harmonics, PP-a.s., that is, T(ω,x)=l=0m=llαl,mobs(ω)Yl,m(x),T^{\circ}(\omega,{\rm{x}})=\sum_{l=0}^{\infty}\sum_{m=-l}^{l}\alpha_{l,m}^{\rm{obs}}(\omega)Y_{l,m}({\rm{x}}), PP-a.s., where αobs(ω)=(α0,0obs(ω),α1,1obs(ω),)T\alpha^{\rm{obs}}(\omega)=(\alpha_{0,0}^{\rm{obs}}(\omega),\alpha_{1,-1}^{\rm{obs}}(\omega),\ldots)^{T} is the Fourier coefficient vector of T(ω,x)T^{\circ}(\omega,{\rm{x}}).

By Definition 1 for any ωΩ\omega\in\Omega, α(ω)0\alpha^{*}(\omega)\neq 0, we have νω>0\nu_{\omega}>0. Now, we prove that there exists a positive scalar ν¯\bar{\nu} such that νων¯\nu_{\omega}\geq\bar{\nu} for any ωΩ\omega\in\Omega. On the contrary, if there exists a ωΩ\omega\in\Omega such that νω0\nu_{\omega}\rightarrow 0, then from

α(ω)2,pp={l0:αl(ω)0}βlαl(ω)p\displaystyle\|\alpha^{*}(\omega)\|_{2,p}^{p}=\sum\limits_{\{l\in\mathbb{N}_{0}:\,\|\alpha^{*}_{l\cdot}(\omega)\|\neq 0\}}\beta_{l}\|\alpha^{*}_{l\cdot}(\omega)\|^{p} \displaystyle\geq (p2νωc~)p1p{l0:αl(ω)0}(ηllp)11p,\displaystyle\left(\frac{p}{2\nu_{\omega}\tilde{c}}\right)^{\frac{p}{1-p}}\sum\limits_{\{l\in\mathbb{N}_{0}:\,\|\alpha^{*}_{l\cdot}(\omega)\|\neq 0\}}(\eta^{l}l^{p})^{\frac{1}{1-p}},

we have α(ω)2,pp\|\alpha^{*}(\omega)\|_{2,p}^{p}\rightarrow\infty which is a contradiction with α(ω)βp\alpha^{*}(\omega)\in\ell_{\beta}^{p}. Thus, from α(ω)βp\alpha^{*}(\omega)\in\ell_{\beta}^{p} and (2.4), there exists a positive scalar ν¯\bar{\nu} such that νων¯\nu_{\omega}\geq\bar{\nu} for any ωΩ\omega\in\Omega. Thus, for any ϵ1>0\epsilon_{1}>0 there exists L1L_{1} such that pν¯l=L1+1βlαl(ω)p<ϵ12\frac{p}{\bar{\nu}}\sum_{l=L_{1}+1}^{\infty}\beta_{l}\|\alpha_{l\cdot}^{*}(\omega)\|^{p}<\frac{\epsilon_{1}}{2} for any ωΩ\omega\in\Omega, which implies that

𝔼[pνωl=L1+1βlαl(ω)p]<pν¯𝔼[l=L1+1βlαl(ω)p]<ϵ12.\mathbb{E}\left[\frac{p}{\nu_{\omega}}\sum\limits_{l=L_{1}+1}^{\infty}\beta_{l}\|\alpha_{l\cdot}^{*}(\omega)\|^{p}\right]<\frac{p}{\bar{\nu}}\mathbb{E}\left[\sum\limits_{l=L_{1}+1}^{\infty}\beta_{l}\|\alpha_{l\cdot}^{*}(\omega)\|^{p}\right]<\frac{\epsilon_{1}}{2}.

By Lemma 2, for any ϵ2>0\epsilon_{2}>0 there exists L2L_{2} such that l=L2+1𝔼[αl(ω)2]<ϵ22\sum_{l=L_{2}+1}^{\infty}\mathbb{E}[\|\alpha_{l\cdot}^{*}(\omega)\|^{2}]<\frac{\epsilon_{2}}{2}. Let ϵ=max{ϵ1,ϵ2}\epsilon=\max\{\epsilon_{1},\epsilon_{2}\}, L=max{L1,L2}L=\max\{L_{1},L_{2}\} and d=(L+1)2d=(L+1)^{2}.

For notational simplicity, let α(ω)=((α^(ω))T,(α~(ω))T)Tβp\alpha^{*}(\omega)=((\hat{\alpha}^{*}(\omega))^{T},(\tilde{\alpha}^{*}(\omega))^{T})^{T}\in\ell_{\beta}^{p}, where α^(ω)d\hat{\alpha}^{*}(\omega)\in\mathbb{C}^{d}, ωΩ\omega\in\Omega and Y=[Y^XXHY~]Y=\scriptsize\left[\begin{array}[]{cc}\hat{Y}&X\\ X^{H}&\tilde{Y}\\ \end{array}\right], where Y^d×d\hat{Y}\in\mathbb{C}^{d\times d}. Let T~L(ω,x)=l=L+1m=llαl,m(ω)Yl,m(x)\tilde{T}_{L}^{*}(\omega,{\rm{x}})=\sum_{l=L+1}^{\infty}\sum_{m=-l}^{l}\alpha_{l,m}^{*}(\omega)Y_{l,m}({\rm{x}}), we have

ϱ=𝒜(T(ω,x))T(ω,x)L2(Ω×𝕊2)2=𝒜(TL(ω,x)+T~L(ω,x))T(ω,x)L2(Ω×𝕊2)2=𝒜(TL(ω,x))T(ω,x)L2(Ω×𝕊2)2+𝔼[𝕊2|𝒜(T~L(ω,x))|2𝑑σ(x)+2𝕊2(𝒜(TL(ω,x))T(ω,x))𝒜(T~L(ω,x))𝑑σ(x)]=𝒜(TL(ω,x))T(ω,x)L2(Ω×𝕊2)2+𝔼[Γ|T~L(ω,x)|2𝑑σ(x)+2Γ(TL(ω,x)T(ω,x))T~L(ω,x)𝑑σ(x)]=𝒜(TL(ω,x))T(ω,x)L2(Ω×𝕊2)2+𝔼[(α~(ω))HY~α~(ω)]+𝔼[2(α~(ω))HXHα^(ω)2(α~(ω))H[XHY~]αobs(ω)],\displaystyle\begin{split}\varrho&=\|\mathcal{A}(T^{*}(\omega,{\rm{x}}))-T^{\circ}(\omega,{\rm{x}})\|^{2}_{L_{2}(\Omega\times\mathbb{S}^{2})}=\|\mathcal{A}(T_{L}^{*}(\omega,{\rm{x}})+\tilde{T}_{L}^{*}(\omega,{\rm{x}}))-T^{\circ}(\omega,{\rm{x}})\|^{2}_{L_{2}(\Omega\times\mathbb{S}^{2})}\\ &=\|\mathcal{A}(T_{L}^{*}(\omega,{\rm{x}}))-T^{\circ}(\omega,{\rm{x}})\|^{2}_{L_{2}(\Omega\times\mathbb{S}^{2})}\\ &\quad+\mathbb{E}\left[\int_{\mathbb{S}^{2}}|\mathcal{A}(\tilde{T}_{L}^{*}(\omega,{\rm{x}}))|^{2}d\sigma({\rm{x}})+2\int_{\mathbb{S}^{2}}(\mathcal{A}(T_{L}^{*}(\omega,{\rm{x}}))-T^{\circ}(\omega,{\rm{x}}))\mathcal{A}(\tilde{T}_{L}^{*}(\omega,{\rm{x}}))d\sigma({\rm{x}})\right]\\ &=\|\mathcal{A}(T_{L}^{*}(\omega,{\rm{x}}))-T^{\circ}(\omega,{\rm{x}})\|^{2}_{L_{2}(\Omega\times\mathbb{S}^{2})}\\ &\quad+\mathbb{E}\left[\int_{\Gamma}|\tilde{T}_{L}^{*}(\omega,{\rm{x}})|^{2}d\sigma({\rm{x}})+2\int_{\Gamma}(T_{L}^{*}(\omega,{\rm{x}})-T^{\circ}(\omega,{\rm{x}}))\tilde{T}_{L}^{*}(\omega,{\rm{x}})d\sigma({\rm{x}})\right]\\ &=\|\mathcal{A}(T_{L}^{*}(\omega,{\rm{x}}))-T^{\circ}(\omega,{\rm{x}})\|^{2}_{L_{2}(\Omega\times\mathbb{S}^{2})}+\mathbb{E}[(\tilde{\alpha}^{*}(\omega))^{H}\tilde{Y}\tilde{\alpha}^{*}(\omega)]\\ &\quad+\mathbb{E}[2(\tilde{\alpha}^{*}(\omega))^{H}X^{H}\hat{\alpha}^{*}(\omega)-2(\tilde{\alpha}^{*}(\omega))^{H}[\begin{array}[]{cc}X^{H}&\tilde{Y}\\ \end{array}]\alpha^{\rm{obs}}(\omega)],\end{split} (5.2)

where the first equality follows from the second equality in (2.3) with νω>0\nu_{\omega}>0 for any ωΩ\omega\in\Omega. For any fixed ωΩ\omega\in\Omega, by Definition 1 for a scaled KKT point αβp\alpha^{*}\in\ell_{\beta}^{p} of problem (2.2), there is νω>0\nu_{\omega}>0 such that

pβlαl(ω)p2αl(ω)+2νω(Ylα(ω)αl(ω))=0,lγω.p\beta_{l}\|\alpha^{*}_{l\cdot}(\omega)\|^{p-2}\alpha^{*}_{l\cdot}(\omega)+2\nu_{\omega}(Y_{l\cdot}\alpha^{*}(\omega)-\alpha_{l\cdot}^{\circ}(\omega))=0,\quad l\in\gamma_{\omega}.

From (2.1), for ωΩ\omega\in\Omega, α(ω)=Yαobs(ω)\alpha^{\circ}(\omega)=Y\alpha^{\rm{obs}}(\omega). Thus, for any fixed ωΩ\omega\in\Omega,

pβlαl(ω)p+2νω(αl(ω))H(Ylα(ω)Ylαobs(ω))=0,lγω.p\beta_{l}\|\alpha^{*}_{l\cdot}(\omega)\|^{p}+2\nu_{\omega}(\alpha^{*}_{l\cdot}(\omega))^{H}(Y_{l\cdot}\alpha^{*}(\omega)-Y_{l\cdot}\alpha^{\rm{obs}}(\omega))=0,\quad l\in\gamma_{\omega}. (5.3)

We obtain

l=L+1pβlαl(ω)p+2νω(α~(ω))H[XHY~](α(ω)αobs(ω))=0,ωΩ.\sum\limits_{l=L+1}^{\infty}p\beta_{l}\|\alpha^{*}_{l\cdot}(\omega)\|^{p}+2\nu_{\omega}(\tilde{\alpha}^{*}(\omega))^{H}[\begin{array}[]{cc}X^{H}&\tilde{Y}\\ \end{array}](\alpha^{*}(\omega)-\alpha^{\rm{obs}}(\omega))=0,\,\omega\in\Omega. (5.4)

Thus,

𝔼[(α~(ω))HY~α~(ω)]+𝔼[2(α~(ω))HXHα^(ω)2(α~(ω))H[XHY~]αobs(ω)]=𝔼[(α~(ω))HY~α~(ω)+2(α~(ω))HXHα^(ω)2(α~(ω))H[XHY~]α(ω)]𝔼[pνωl=L+1βlαl(ω)p]=𝔼[(α~(ω))HY~α~(ω)pνωl=L+1βlαl(ω)p]𝔼[l=L+1αl(ω)2pνωl=L+1βlαl(ω)p]>ϵ,\displaystyle\begin{split}&\mathbb{E}[(\tilde{\alpha}^{*}(\omega))^{H}\tilde{Y}\tilde{\alpha}^{*}(\omega)]+\mathbb{E}[2(\tilde{\alpha}^{*}(\omega))^{H}X^{H}\hat{\alpha}^{*}(\omega)-2(\tilde{\alpha}^{*}(\omega))^{H}[\begin{array}[]{cc}X^{H}&\tilde{Y}\\ \end{array}]\alpha^{\rm{obs}}(\omega)]\\ &=\mathbb{E}\left[(\tilde{\alpha}^{*}(\omega))^{H}\tilde{Y}\tilde{\alpha}^{*}(\omega)+2(\tilde{\alpha}^{*}(\omega))^{H}X^{H}\hat{\alpha}^{*}(\omega)-2(\tilde{\alpha}^{*}(\omega))^{H}[\begin{array}[]{cc}X^{H}&\tilde{Y}\end{array}]\alpha^{*}(\omega)\right]\\ &\quad-\mathbb{E}\left[\frac{p}{\nu_{\omega}}\sum\limits_{l=L+1}^{\infty}\beta_{l}\|\alpha^{*}_{l\cdot}(\omega)\|^{p}\right]\\ &=\mathbb{E}\left[-(\tilde{\alpha}^{*}(\omega))^{H}\tilde{Y}\tilde{\alpha}^{*}(\omega)-\frac{p}{\nu_{\omega}}\sum\limits_{l=L+1}^{\infty}\beta_{l}\|\alpha^{*}_{l\cdot}(\omega)\|^{p}\right]\\ &\geq\mathbb{E}\left[-\sum\limits_{l=L+1}^{\infty}\|{\alpha}_{l\cdot}^{*}(\omega)\|^{2}-\frac{p}{\nu_{\omega}}\sum\limits_{l=L+1}^{\infty}\beta_{l}\|\alpha^{*}_{l\cdot}(\omega)\|^{p}\right]>-\epsilon,\end{split} (5.5)

where the first equality follows from (5.4) and the first inequality follows from that Y~1\|\tilde{Y}\|\leq 1. Combining (5.2) and (5.5), we obtain

0𝒜(TL(ω,x))T(ω,x)L2(Ω×𝕊2)2ϱ<ϵ.0\leq\|\mathcal{A}(T_{L}^{*}(\omega,{\rm{x}}))-T^{\circ}(\omega,{\rm{x}})\|^{2}_{L_{2}(\Omega\times\mathbb{S}^{2})}-\varrho<\epsilon.

The proof is completed. ∎

6 Numerical experiments

In this section, we conduct numerical experiments to compare the p\ell_{p}-2\ell_{2} optimization model (2.5) with the 1\ell_{1} optimization model (31) in [32] on the inpainting of banded-limited random fields and images from Earth topography data to show the efficiency of problem (2.5) and Algorithm 4.2.

Following [8], we adapt the nonmonotone proximal gradient (NPG) method to solve subproblem (4.7) in Algorithm 4.2. For completeness, we present the NPG method as follows.

NPG method for problem (4.7)

  Given α0e\alpha^{0}\in\mathcal{F}_{e}. Choose MmaxMmin>0M_{{\rm{max}}}\geq M_{{\rm{min}}}>0, η~>1\tilde{\eta}>1, b>0b>0 and an integer N0N\geq 0. Set n=0n=0.
  (1) Choose Mn0[Mmin,Mmax]M_{n}^{0}\in[M_{{\rm{min}}},M_{{\rm{max}}}]. Set Mn=Mn0M_{n}=M_{n}^{0}. (a) Solve the subproblem
yargminαd{Φ(α)+2Reα¯fλ,μ(αn),ααn+Mnααn2}.y\in\arg\min\limits_{\alpha\in\mathbb{C}^{d}}\left\{\Phi(\alpha)+2{\rm{Re}}\langle\partial_{\bar{\alpha}}f_{\lambda,\mu}(\alpha^{n}),\alpha-\alpha^{n}\rangle+M_{n}\|\alpha-\alpha^{n}\|^{2}\right\}.\vspace{-2mm}
 (b) If Fλ,μ(y)max[nN]+jnFλ,μ(αj)byαn2F_{\lambda,\mu}(y)\leq\max\limits_{[n-N]_{+}\leq j\leq n}F_{\lambda,\mu}(\alpha^{j})-b\|y-\alpha^{n}\|^{2} is satisfied, go to (2). Otherwise set Mn=η~MnM_{n}=\tilde{\eta}M_{n}, and go to step (a).
  (2) Set αn+1=y\alpha^{n+1}=y, n=n+1n=n+1 and go to (1).

For the NPG method to solve (4.7) in Algorithm 4.2 at λ=λk\lambda=\lambda_{k} μ=μk\mu=\mu_{k}, we set Mmin=1M_{{\rm{min}}}=1, Mmax=106M_{{\rm{max}}}=10^{6}, η~=2\tilde{\eta}=2, b=104b=10^{-4}, N=4N=4, M00=1M_{0}^{0}=1, and for any n1n\geq 1,

Mn0=min{max{|(αnαn1)H(α¯Fλ,μ(αn)α¯Fλ,μ(αn1))|αnαn12,1},106}.M_{n}^{0}=\min\left\{\max\left\{\frac{|(\alpha^{n}-\alpha^{n-1})^{H}(\partial_{\bar{\alpha}}F_{\lambda,\mu}(\alpha^{n})-\partial_{\bar{\alpha}}F_{\lambda,\mu}(\alpha^{n-1}))|}{\|\alpha^{n}-\alpha^{n-1}\|^{2}},1\right\},10^{6}\right\}.

Algorithm 6 is terminated when

αnαn1εkand|Fλ,μ(αn)Fλ,μ(αn1)|max{1,|Fλ,μ(αn)|}min{(εk)2.2,104}.\|\alpha^{n}-\alpha^{n-1}\|_{\infty}\leq\sqrt{\varepsilon^{k}}\quad{\rm{and}}\quad\frac{|F_{\lambda,\mu}(\alpha^{n})-F_{\lambda,\mu}(\alpha^{n-1})|}{\max\{1,|F_{\lambda,\mu}(\alpha^{n})|\}}\leq\min\{(\varepsilon^{k})^{2.2},10^{-4}\}.

In Algorithm 4.2, we set λ0=20\lambda^{0}=20, μ0=ε0=1\mu^{0}=\varepsilon^{0}=1, ς1=2\varsigma_{1}=2, ς2=12\varsigma_{2}=\frac{1}{2}, α0=Y^1α^\alpha^{0}=\hat{Y}^{-1}\hat{\alpha}^{\circ}. The smoothing penalty algorithm is terminated when max{g(αk)+,0.01εk}106,\max\{g(\alpha^{k})_{+},0.01\varepsilon^{k}\}\leq 10^{-6}, where εk\varepsilon^{k} is updated by εk+1=max{ς2εk,106}\varepsilon^{k+1}=\max\{\varsigma_{2}\varepsilon^{k},10^{-6}\} instead of ς2εk\varsigma_{2}\varepsilon^{k} in the experiments. All codes were written in MATLAB and the realizations were implemented in Python.

6.1 Random data

In the following experiments, we consider randomly generated instances which are generated as follows. First, we choose a subset D{0,1,,L1}D\subset\{0,1,\ldots,L-1\} and generate a group sparse coefficient vector αLtrue(L+1)2×(L+1)2\alpha_{L}^{{\rm{true}}}\in\mathbb{C}^{(L+1)^{2}\times(L+1)^{2}} such that αltrue=0\alpha^{{\rm{true}}}_{l\cdot}=0 if lDl\in D and αltrue=αlcmb/αLcmb1.5\alpha^{{\rm{true}}}_{l\cdot}={\alpha_{l\cdot}^{{\rm{cmb}}}}/{\|\alpha_{L}^{{\rm{cmb}}}\|^{1.5}} if lDcl\in D^{c}, where αLcmb\alpha_{L}^{{\rm{cmb}}} is the coefficient vector with maximum degree LL of the CMB 2018 map computed by the HEALPy package [15].

We assume that the noise Δ\Delta also has the K-L expansion and set ϱ=ΔL2(𝕊2)2\varrho=\|\Delta\|_{L_{2}(\mathbb{S}^{2})}^{2}. We generate the data for the noise on the HEALPix points with Nside=2048N_{\rm{side}}=2048 by the MATLAB command: δrandn(Npix,1)\delta{\rm{randn}}(N_{\rm{pix}},1), where Npix=12×Nside2N_{\rm{pix}}=12\times N_{\rm{side}}^{2} and δ>0\delta>0 is a scaling parameter. Then we use the Python HEALPy package to compute the coefficients of the noise. The masks denoted by Γc=𝕊2Γ\Gamma^{c}=\mathbb{S}^{2}{{\setminus}}\Gamma are shown in Figure 1.

Refer to caption
(a) Γ1c\Gamma^{c}_{1}
Refer to caption
(b) Γ2c\Gamma^{c}_{2}
Refer to caption
(c) Γ3c\Gamma^{c}_{3}
Refer to caption
(d) Γ4c\Gamma^{c}_{4}
Figure 1: Masks (grey part).

In the experiments, we set p=0.5p=0.5, βl=(1+104)llp\beta_{l}=(1+10^{-4})^{l}l^{p} for l1l\geq 1 and β0=1\beta_{0}=1. We compare the p\ell_{p}-2\ell_{2} optimization model (2.5) by using Algorithm 4.2 with the 1\ell_{1} optimization model (31) in [32] using the YALL1 method [33] under the KKM sampling scheme [21]. We also compare our method with the group SPGl1 method (https://friedlander.io/spgl1/). We present the results in Tables 1 and 2. Following [9], nnz:=αL&αLtrue2,0{\rm{nnz}}:=\|\alpha_{L}^{*}\,\verb"&"\,\alpha^{{\rm{true}}}_{L}\|_{2,0} denotes the number of nonzero groups that αL\alpha_{L}^{*} and αLtrue\alpha_{L}^{{\rm{true}}} have in common and false:=αL2,0αL&αLtrue2,0{\rm{false}}:=\|\alpha_{L}^{*}\|_{2,0}-\|\alpha_{L}^{*}\,\verb"&"\,\alpha^{{\rm{true}}}_{L}\|_{2,0} denotes the number of “false positives” where αltrue\alpha^{{\rm{true}}}_{l\cdot} is a zero vector, but αl\alpha_{l\cdot}^{*} is a nonzero vector. The signal-to-noise ratio (SNR) and relative error are defined by

SNR=20×log10xtruextruex,RelErr:=αLαLtrueαLtrue,{\rm{SNR}}=20\times\log_{10}\tfrac{\|x^{\rm{true}}\|}{\|x^{\rm{true}}-x^{*}\|},\quad{\rm{RelErr}}:=\tfrac{\|\alpha_{L}^{*}-\alpha_{L}^{{\rm{true}}}\|}{\|\alpha_{L}^{{\rm{true}}}\|},

respectively, where xtrue=(Ttrue(x1),,Ttrue(xn))Tx^{\rm{true}}=(T^{\rm{true}}(\mathrm{x}_{1}),\ldots,T^{\rm{true}}(\mathrm{x}_{n}))^{T} and x=(T(x1),,T(xn))Tx^{*}=(T^{*}(\mathrm{x}_{1}),\ldots,T^{*}(\mathrm{x}_{n}))^{T} are estimated on the HEALPix points with n=12×20482n=12\times 2048^{2} and αL\alpha_{L}^{*} is the terminating solution,.

From Tables 1 and 2, we can see that our optimization model (2.5) using Algorithm 4.2 achieves smaller relative errors and higher SNR values than the 1\ell_{1} optimization model (31) in [32]. Although the group SPGl1 method archives small relative errors for some experiments, the SNR values is smaller than ours. Moreover, compared with the 1\ell_{1} optimization model and group SPGl1, our optimization model (2.5) can exactly recover the number and positions of nonzero groups of αL\alpha_{L}^{*}.

We select some numerical results from Table 1 with L=50L=50 and αLtrue2,0=26\|\alpha_{L}^{{\rm{true}}}\|_{2,0}=26 to show the inpainting quality in Figure 2. We observe that our model using Algorithm 4.2 achieves smaller pointwise errors than the 1\ell_{1} optimization model (31) in [32] and group SPGl1 method.

Table 1: Numerical results for the inpainting of band-limited random fields with δ=1\delta=1.
Algorithm 4.2 YALL1 Group SPGl1
Mask αLtrue2,0\|\alpha^{{\rm{true}}}_{L}\|_{2,0} RelErr αL2,0\|\alpha^{*}_{L}\|_{2,0} nnz{\rm{nnz}} false{\rm{false}} SNR RelErr αL2,0\|\alpha^{*}_{L}\|_{2,0} nnz{\rm{nnz}} false{\rm{false}} SNR RelErr αL2,0\|\alpha^{*}_{L}\|_{2,0} nnz{\rm{nnz}} false{\rm{false}} SNR
L=35L=35
Γ1c\Gamma_{1}^{c} 7 0.0017 7 7 0 55.54 0.511 14 6 8 5.82 0.0023 12 7 5 52.65
11 0.0012 11 11 0 58.28 0.575 13 9 4 4.80 0.0788 23 11 12 56.19
19 0.0013 19 19 0 57.73 0.374 25 17 8 8.55 0.3965 29 19 10 46.25
Γ2c\Gamma_{2}^{c} 7 0.0023 7 7 0 52.47 0.668 19 7 12 3.08 0.0043 17 7 10 47.29
11 0.0022 11 11 0 53.19 0.651 25 11 14 2.67 0.0788 32 11 21 22.07
19 0.0059 19 19 0 44.54 0.683 26 19 7 1.87 0.3965 33 19 14 8.03
Γ3c\Gamma^{c}_{3} 7 0.0019 7 7 0 54.60 0.378 21 7 14 8.50 0.0035 16 7 9 49.01
11 0.0014 11 11 0 57.30 0.357 13 11 2 8.89 0.0024 26 11 15 52.57
19 0.0015 19 19 0 56.28 0.321 25 19 6 9.70 0.1195 34 19 15 18.45
Γ4c\Gamma^{c}_{4} 7 0.0017 7 7 0 55.33 0.351 16 6 10 8.91 0.0022 12 7 5 52.65
11 0.0013 11 11 0 57.72 0.321 11 9 2 10.78 0.0019 23 11 12 54.29
19 0.0014 19 19 0 56.84 0.273 22 17 5 11.25 0.0073 32 19 13 42.76
L=50L=50
Γ1c\Gamma^{c}_{1} 6 0.0027 6 6 0 51.43 0.551 28 5 23 5.18 0.0030 13 6 7 50.40
16 0.0024 16 16 0 52.43 0.591 33 13 20 4.57 0.0024 33 16 17 52.29
26 0.0031 26 26 0 50.21 0.511 35 23 12 5.84 0.0038 37 26 11 48.50
Γ2c\Gamma_{2}^{c} 6 0.0032 6 6 0 49.84 0.676 39 6 33 2.83 0.0044 15 6 9 47.04
16 0.0044 16 16 0 47.15 0.643 43 16 27 2.68 0.0729 45 16 29 22.74
26 0.0090 26 26 0 40.93 0.664 42 26 16 2.73 0.3549 47 26 21 8.99
Γ3c\Gamma_{3}^{c} 6 0.0027 6 6 0 51.43 0.212 41 6 35 13.21 0.0036 14 6 8 48.89
16 0.0024 16 16 0 52.43 0.191 37 16 21 14.61 0.0041 38 16 22 47.72
26 0.0031 26 26 0 50.21 0.298 38 26 12 10.50 0.1393 48 26 22 17.11
Γ4c\Gamma_{4}^{c} 6 0.0027 6 6 0 51.25 0.197 37 6 31 16.42 0.0031 14 6 8 50.14
16 0.0023 16 16 0 52.47 0.193 37 16 21 14.21 0.0030 32 16 16 50.51
26 0.0030 26 26 0 50.55 0.239 37 25 12 13.05 0.0078 44 26 18 42.21
Table 2: Numerical results for the inpainting of band-limited random fields with δ=0.1\delta=0.1.
Algorithm 4.2 YALL1 Group SPGl1
Mask αLtrue2,0\|\alpha^{{\rm{true}}}_{L}\|_{2,0} RelErr αL2,0\|\alpha^{*}_{L}\|_{2,0} nnz{\rm{nnz}} false{\rm{false}} SNR RelErr αL2,0\|\alpha^{*}_{L}\|_{2,0} nnz{\rm{nnz}} false{\rm{false}} SNR RelErr αL2,0\|\alpha^{*}_{L}\|_{2,0} nnz{\rm{nnz}} false{\rm{false}} SNR
L=35L=35
Γ1c\Gamma^{c}_{1} 7 1.90e-4 7 7 0 74.42 0.551 6 6 0 5.83 2.88e-4 11 7 4 70.78
11 1.27e-4 11 11 0 77.89 0.591 12 9 3 4.80 1.61e-4 21 11 10 75.82
19 1.39e-4 19 19 0 77.13 0.511 25 17 8 8.55 0.0035 33 19 14 49.18
Γ2c\Gamma^{c}_{2} 7 2.58e-4 7 7 0 71.73 0.676 19 19 0 3.08 5.24e-4 17 7 10 65.60
11 2.35e-4 11 11 0 72.57 0.643 21 11 10 2.67 0.0859 34 11 23 21.32
19 6.89e-4 19 19 0 63.22 0.664 10 7 3 1.87 0.3949 33 19 14 8.06
Γ3c\Gamma^{c}_{3} 7 2.08e-4 7 7 0 73.60 0.212 16 7 9 8.50 4.05e-4 15 7 8 67.84
11 1.40e-4 11 11 0 77.05 0.191 14 11 3 8.89 2.54e-4 25 11 14 71.89
19 1.70e-4 19 19 0 75.37 0.298 21 19 2 9.70 0.1172 35 19 16 18.62
Γ4c\Gamma^{c}_{4} 7 1.97e-4 7 7 0 74.09 0.197 11 6 5 8.91 2.65e-4 11 7 4 71.50
11 1.28e-4 11 11 0 77.83 0.193 9 9 0 10.78 1.93e-4 22 11 11 74.25
19 1.56e-4 19 19 0 76.13 0.239 20 17 3 11.26 0.0046 35 19 16 46.66
L=50L=50
Γ1c\Gamma^{c}_{1} 6 2.56e-4 6 6 0 71.81 0.551 13 5 8 5.18 3.10e-4 11 6 5 70.17
16 2.15e-4 16 16 0 73.32 0.591 18 13 5 4.57 2.67e-4 27 16 11 71.44
26 2.46e-4 26 26 0 72.17 0.511 35 22 12 5.84 3.94e-4 35 26 9 68.08
Γ2c\Gamma^{c}_{2} 6 3.18e-4 6 6 0 69.92 0.676 16 6 10 2.83 4.48e-4 16 6 10 66.96
16 5.09e-4 16 16 0 65.85 0.643 36 16 20 2.68 0.0829 45 16 29 21.63
26 9.97e-4 26 26 0 60.01 0.664 32 26 6 2.73 0.3562 47 26 21 8.98
Γ3c\Gamma^{c}_{3} 6 2.97e-4 6 6 0 70.52 0.212 30 6 24 13.21 3.84e-4 13 6 7 68.31
16 2.70e-4 16 16 0 71.34 0.191 24 16 8 14.61 4.59e-4 34 16 18 66.75
26 3.28e-4 26 26 0 69.65 0.298 31 26 5 10.50 0.1387 50 26 24 17.15
Γ4c\Gamma^{c}_{4} 6 2.86e-4 6 6 0 70.85 0.197 23 6 17 16.42 3.43e-4 11 6 5 69.29
16 2.56e-4 16 16 0 71.82 0.193 24 16 8 14.21 3.27e-4 28 16 12 69.69
26 3.09e-4 26 26 0 70.19 0.239 29 25 4 13.05 8.06e-4 43 26 17 61.86
Refer to caption
(a) Observed fields with L=50L=50, αLtrue2,0=6\|\alpha^{\rm{true}}_{L}\|_{2,0}=6 and δ=1\delta=1 (masked by Γ1c\Gamma^{c}_{1}, Γ2c\Gamma^{c}_{2}, Γ3c\Gamma^{c}_{3} and Γ4c\Gamma^{c}_{4} from left to right)
Refer to caption
(b) Inpainted fields by Algorithm 4.2
Refer to caption
(c) Pointwise error of inpainted fields in (b) by Algorithm 4.2
Refer to caption
(d) Inpainted fields by YALL1
Refer to caption
(e) Pointwise error of inpainted fields in (d) by YALL1
Refer to caption
(f) Inpainted fields by group SPGl1
Refer to caption
(g) Pointwise error of inpainted fields in (f) by group SPGl1
Figure 2: Inpainting results.

To give some insight for the choice of LL of the truncated space, we conduct experiments on a random field with different LL under the noiseless case. We choose the mask Γ4\Gamma_{4} and the observed field TT^{\circ} with degree 5050. In Figure 3(a) we show the approximation error 𝒜(TL)TL2(𝕊2)2\|\mathcal{A}(T_{L}^{*})-T^{\circ}\|^{2}_{L_{2}(\mathbb{S}^{2})} on a logarithmic scale. We can see that the error decreases as LL increases and the errors are same for L=48,49,50L=48,49,50. Moreover, α49\alpha^{*}_{49\cdot} and α50\alpha^{*}_{50\cdot} are zero groups of αL\alpha_{L}^{*} for L=50L=50. From our discussion in Section 2 and Theorem 5.2, we can guess that the value of LL for (a) is 4848. We plot the error T50TLL2(𝕊2)2\|T_{50}^{*}-T_{L}^{*}\|^{2}_{L_{2}(\mathbb{S}^{2})} for different LL in Figure 3(b). We can observe that the error decreases as LL increases and the errors are zero for L=48,49,50L=48,49,50 due to α49=0\alpha^{*}_{49\cdot}=0 and α50=0\alpha^{*}_{50\cdot}=0. In Figure 3 (c) and (d), we plot the values of αl\|\alpha_{l\cdot}^{*}\| of TLT^{*}_{L} with different LL to give some insight of the sparsity pattern of α\alpha^{*}.

Refer to caption
(a) 𝒜(TL)TL2(𝕊2)2\|\mathcal{A}(T_{L}^{*})-T^{\circ}\|^{2}_{L_{2}(\mathbb{S}^{2})}
Refer to caption
(b) T50TLL2(𝕊2)2\|T_{50}^{*}-T^{*}_{L}\|^{2}_{L_{2}(\mathbb{S}^{2})}
Refer to caption
(c) αl\|\alpha_{l\cdot}^{*}\|
Refer to caption
(d) αl\|\alpha_{l\cdot}^{*}\|
Figure 3: Approximation errors and sparsity pattern.

6.2 Real image

We consider the inpainting of images from Earth topographic data, taken from Earth Gravitational Model (EGM2008) publicly released by the U.S. National Geospatial-Intelligence Agency (NGA) EGM Development Team444http://geoweb.princeton.edu/people/simons/DOTM/Earth.mat. We generate four resolution test images from the Earth topography data which are shown in Figures 5(a), 6(a), 7(a) and 8(a). For each resolution test image we consider a mask which is shown in Figures 5(d), 6(d), 7(d), and 8(d) respectively. For all the images we set δ=1\delta=1. The recovered images and pointwise errors are presented in Figures 5, 6, 7 and 8. We can see that model (2.5) using Algorithm 4.2 yields higher quality images with smaller pointwise errors and higher SNR values than the 1\ell_{1} optimization model (31) in [32] and the group SPGl1 method under different resolutions.

Refer to caption
(a) True
Refer to caption
(b) Algorithm 4.2
Refer to caption
(c) YALL1
Refer to caption
(d) group SPGl1
Refer to caption
(e) Observed
Refer to caption
(f) Error of (b)
Refer to caption
(g) Error of (c)
Refer to caption
(h) Error of (d)
Figure 4: Test images of Earth topographic data constructed to be band-limited at L=35L=35. The SNR of (b) is 97.30 and the SNR of (c) is 1.81 and the SNR of (d) is 13.04.
Refer to caption
(a) True
Refer to caption
(b) Algorithm 4.2
Refer to caption
(c) YALL1
Refer to caption
(d) group SPGl1
Refer to caption
(e) Observed
Refer to caption
(f) Error of (b)
Refer to caption
(g) Error of (c)
Refer to caption
(h) Error of (d)
Figure 5: Test images of Earth topographic data constructed to be band-limited at L=35L=35. The SNR of (b) is 97.30 and the SNR of (c) is 1.81 and the SNR of (d) is 13.04.
Refer to caption
(a) Ture
Refer to caption
(b) Algorithm 4.2
Refer to caption
(c) YALL1
Refer to caption
(d) group SPGl1
Refer to caption
(e) Observed
Refer to caption
(f) Error of (b)
Refer to caption
(g) Error of (c)
Refer to caption
(h) Error of (d)
Figure 6: Test images of Earth topographic data constructed to be band-limited at L=50L=50. The SNR of (b) is 90.47, the SNR of (c) is 6.44 and the SNR of (d) is 8.07.
Refer to caption
(a) True
Refer to caption
(b) Algorithm 4.2
Refer to caption
(c) YALL1
Refer to caption
(d) group SPGl1
Refer to caption
(e) Observed
Refer to caption
(f) Error of (b)
Refer to caption
(g) Error of (c)
Refer to caption
(h) Error of (d)
Figure 7: Test images of Earth topographic data constructed to be band-limited at L=80L=80. The SNR of (b) is 82.48, the SNR of (c) is 14.70 and the SNR of (d) is 18.04.
Refer to caption
(a) True
Refer to caption
(b) Algorithm 4.2
Refer to caption
(c) YALL1
Refer to caption
(d) group SPGl1
Refer to caption
(e) Observed
Refer to caption
(f) Error of (b)
Refer to caption
(g) Error of (c)
Refer to caption
(h) Error of (d)
Figure 8: Test images of Earth topographic data constructed to be band-limited at L=128L=128. The SNR of (b) is 73.18, the SNR of (c) is 21.60 and the SNR of (d) is 1.65.

7 Conclusion

In this paper, we propose a constrained group sparse optimization model (1.6) for the inpainting of random fields on the unit sphere with unique continuation property. Based on the K-L expansion, we rewrite the constraint in (1.6) in a discrete form and derive an equivalence formulation (2.2) of problem (1.6). For problem (2.2), we derive a lower bound (2.4) for the 2\ell_{2} norm of nonzero groups of its scaled KKT points. We show that problem (2.2) is equivalent to the finite-dimensional problem (2.5). For this finite-dimensional problem (2.5) and its penalty problem (2.6), we prove the exact penalization in terms of local minimizers and ε\varepsilon-minimizers. Moreover, we propose a smoothing penalty algorithm for solving problem (2.5) and prove its convergence. We also present the approximation error of the inpainted random field represented by the scaled KKT point in the space L2(Ω×𝕊2)L_{2}(\Omega\times\mathbb{S}^{2}). Finally, we present numerical results on band-limited random fields on the sphere and the images from Earth topographic data to show the promising performance of our model by using the smoothing penalty algorithm.

References

  • [1] P. Abrial, Y. Moudden, J.L. Starck, B. Afeyan, J. Bobin, J. Fadili, and M.K. Nguyen. Morphological component analysis and inpainting on the sphere: application in physics and astrophysics. J. Fourier Anal. Appl., 13(6):729–748, 2007.
  • [2] S. Axler, P. Bourdon, and R. Wade. Harmonic Function Theory, volume 137. Springer Science & Business Media, 2013.
  • [3] Amir Beck and Nadav Hallak. Optimization problems involving group sparsity terms. Math. Program., 178(1):39–67, 2019.
  • [4] D.H. Brandwood. A complex gradient operator and its application in adaptive array theory. IEE Proceedings H - Microwaves, Optics and Antennas, 130(1):11–16, 1983.
  • [5] P. Cabella and D. Marinucci. Statistical challenges in the analysis of cosmic microwave background radiation. Ann. Appl. Stat., 3:61–95, 2009.
  • [6] V. Cammarota and D. Marinucci. The stochastic properties of l1l_{1}-regularized spherical Gaussian fields. Appl. Comput. Harmon. Anal., 38(2):262–283, 2015.
  • [7] X. Chen. Smoothing methods for nonsmooth, nonconvex minimization. Math. Program., 134(1):71–99, 2012.
  • [8] X. Chen, Z. Lu, and T.K. Pong. Penalty methods for a class of non-Lipschitz optimization problems. SIAM J. Optim., 26(3):1465–1492, 2016.
  • [9] X. Chen and R.S. Womersley. Spherical designs and nonconvex minimization for recovery of sparse signals on the sphere. SIAM J. Imaging Sci., 11(2):1390–1415, 2018.
  • [10] X. Chen, F. Xu, and Y. Ye. Lower bound theory of nonzero entries in solutions of l2l_{2}-lpl_{p} minimization. SIAM J. Sci. Comput., 32(5):2832–2852, 2010.
  • [11] Xiaojun Chen and Ph L Toint. High-order evaluation complexity for convexly-constrained optimization with non-lipschitzian group sparsity terms. Math. Program., 187(1):47–78, 2021.
  • [12] P.E. Creasey and A. Lang. Fast generation of isotropic Gaussian random fields on the sphere. Monte Carlo Methods Appl., 24(1):1–11, 2018.
  • [13] S.M. Feeney, D. Marinucci, J.D. McEwen, H.V. Peiris, B.D. Wandelt, and V. Cammarota. Sparse inpainting and isotropy. J. Cosmol. Astropart. Phys., 2014(1):050, 2014.
  • [14] Q.T. Le Gia, I.H. Sloan, R.S. Womersley, and Y.G. Wang. Sparse isotropic regularization for spherical harmonic representations of random fields on the sphere. Appl. Comput. Harmon. Anal., 49:257–278, 2019.
  • [15] K.M. Górski, E. Hivon, A.J. Banday, B.D. Wandelt, F.K. Hansen, M. Reinecke, and M. Bartelmann. Healpix: a framework for high-resolution discretization and fast analysis of data distributed on the sphere. Astrophys. J., 622(2):759–771, 2005.
  • [16] E. Hille. Analytic Function Theory, volume 2. American Mathematical Soc., 2005.
  • [17] Jian Huang, Shuange Ma, Huiliang Xie, and Cun-Hui Zhang. A group bridge approach for variable selection. Biometrika, 96(2):339–355, 2009.
  • [18] Junzhou Huang and Tong Zhang. The benefit of group sparsity. Ann. Statist., 38(4):1978–2004, 2010.
  • [19] V. Isakov. Inverse Problems for Partial Differential Equations, volume 127. Springer, 2006.
  • [20] J. Jeong, M. Jun, and M. Genton. Spherical process models for global spatial statistics. Stat. Sci., 32(4):501–513, 2017.
  • [21] Z. Khalid, R.A. Kennedy, and J.D. McEwen. An optimal-dimensionality sampling scheme on the sphere with fast spherical harmonic transforms. IEEE Trans. Signal Process., 62(17):4597–4610, 2014.
  • [22] K. Kreutz-Delgado. The complex gradient operator and the CR-calculus. arXiv:0906.4835, 2009.
  • [23] A. Lang and C. Schwab. Isotropic Gaussian random fields on the sphere: regularity, fast simulation and stochastic partial differential equations. Ann. Appl. Probab., 25(6):3047–3094, 2015.
  • [24] C. Li and X. Chen. Isotropic non-Lipschitz regularization for sparse representations of random fields on the sphere. Math. Comp., 91:219–243, 2022.
  • [25] D. Marinucci and G. Peccati. Random Fields on the Sphere: Representations, Limit Theorems and Cosmological Applications. Cambridge University Press, Cambridge, 2011.
  • [26] H.S. Oh and T.H. Li. Estimation of global temperature fields from scattered observations by a spherical-wavelet-based spatially adaptive method. J. R. Statist. Soc. B, 66(1):221–238, 2004.
  • [27] Lili Pan and Xiaojun Chen. Group sparse optimization for images recovery using capped folded concave functions. SIAM J. Imaging Sci., 14(1):1–25, 2021.
  • [28] E. Porcu, A. Alegria, and R. Furrer. Modeling temporally evolving and spatially globally dependent data. Int. Stat. Rev., 86(2):344–377, 2018.
  • [29] R.T. Rockafellar and R.J-B Wets. Variational Analysis. Springer Science & Business Media, 2009.
  • [30] J.L. Starck, M. Fadili, and A. Rassat. Low-ll CMB analysis and inpainting. Astron. Astrophys., 550:A15, 2013.
  • [31] M.L. Stein. Spatial variation of total column ozone on a global scale. Ann. Appl. Stat., 1(1):191–210, 2007.
  • [32] C.G.R. Wallis, Y. Wiaux, and J.D. McEwen. Sparse image reconstruction on the sphere: analysis and synthesis. IEEE Trans. Image Process., 26(11):5176–5187, 2017.
  • [33] J. Yang and Y. Zhang. Alternating direction algorithms for 1\ell_{1}-problems in compressive sensing. SIAM J. Sci. Comput., 33(1):250–278, 2011.
  • [34] Ming Yuan and Yi Lin. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B, 68(1):49–67, 2006.

Appendix A. Unique continuation property

In this Appendix, we give details about the unique continuation property of a class of fields with spherical harmonic representations.

Let r^(1,η)\hat{r}\in(1,\eta) be a positive constant for some η>1\eta>1, B(0;r^):={(r,θ,ϕ):r[0,r^),θ[0,π],ϕ[0,2π]}3B(0;\hat{r}):=\{(r,\theta,\phi):r\in[0,\hat{r}),\theta\in[0,\pi],\phi\in[0,2\pi]\}\subset\mathbb{R}^{3} be an open ball centered at 0 of radius r^\hat{r}. Let

𝒯={T(θ,ϕ)L2(𝕊2):T(θ,ϕ)=l=0m=llαl,mYl,m(θ,ϕ)andαβp}\mathcal{T}=\left\{T(\theta,\phi)\in L_{2}(\mathbb{S}^{2}):T(\theta,\phi)=\sum_{l=0}^{\infty}\sum_{m=-l}^{l}\alpha_{l,m}Y_{l,m}(\theta,\phi)\ {\rm{and}}\ \alpha\in\ell_{\beta}^{p}\right\} (.1)

be a class of fields with spherical harmonic representations and coefficients belong to βp\ell_{\beta}^{p}. Then we show that any T𝒯T\in\mathcal{T} has the unique continuation property. Let H:B(0;r^)H:B(0;\hat{r})\rightarrow\mathbb{R} be given by

H(r,θ,ϕ)=l=0rlm=llαl,mYl,m(θ,ϕ),H(r,\theta,\phi)=\sum\limits_{l=0}^{\infty}r^{l}\sum_{m=-l}^{l}\alpha_{l,m}Y_{l,m}(\theta,\phi),

where αβp\alpha\in\ell_{\beta}^{p}.

Lemma 3.

If αβp\alpha\in\ell_{\beta}^{p}, then l=0ηllαl<\sum_{l=0}^{\infty}\eta^{l}l\|\alpha_{l\cdot}\|<\infty.

Proof.

Since αβp\alpha\in\ell_{\beta}^{p} and l=0ηl=\sum_{l=0}^{\infty}\eta^{l}=\infty, there exists an integer NN such that for any l>Nl>N, ηllpαlp<ηl.\eta^{l}l^{p}\|\alpha_{l\cdot}\|^{p}<\eta^{l}. Then, we obtain, for any l>Nl>N, lαllpαlp<1l\|\alpha_{l\cdot}\|\leq l^{p}\|\alpha_{l\cdot}\|^{p}<1 which implies that ηllαlηllpαlp\eta^{l}l\|\alpha_{l\cdot}\|\leq\eta^{l}l^{p}\|\alpha_{l\cdot}\|^{p}. Thus, l=0ηllαl<.\sum_{l=0}^{\infty}\eta^{l}l\|\alpha_{l\cdot}\|<\infty. This completes the proof. ∎

Lemma 4.

HH is real analytic in B(0;r^)B(0;\hat{r}).

Proof.

Let HL(r,θ,ϕ)=l=0LHl(r,θ,ϕ)H_{L}(r,\theta,\phi)=\sum_{l=0}^{L}H_{l}(r,\theta,\phi), where Hl(r,θ,ϕ)=rlm=llαl,mYl,m(θ,ϕ)H_{l}(r,\theta,\phi)=r^{l}\sum_{m=-l}^{l}\alpha_{l,m}Y_{l,m}(\theta,\phi). By definition of harmonic functions [2], rlYl,m(θ,ϕ)r^{l}Y_{l,m}(\theta,\phi), l0l\in\mathbb{N}_{0}, m=l,,lm=-l,\ldots,l are harmonic functions on B(0;r^)B(0;\hat{r}). Then we obtain that HLH_{L}, L0L\in\mathbb{N}_{0} are harmonics functions on B(0;r^)B(0;\hat{r}). Moreover, for any (r,θ,ϕ)B(0;r^)(r,\theta,\phi)\in B(0;\hat{r}) and l>0l>0,

|Hl(r,θ,ϕ)|\displaystyle|H_{l}(r,\theta,\phi)| =\displaystyle= |m=llrlαl,mYl,m(θ,ϕ)||(m=ll(rlαl,m)2)12(m=ll(Yl,m(θ,ϕ))2)12|\displaystyle\left|\sum\limits_{m=-l}^{l}r^{l}\alpha_{l,m}Y_{l,m}(\theta,\phi)\right|\leq\left|\left(\sum\limits_{m=-l}^{l}(r^{l}\alpha_{l,m})^{2}\right)^{\frac{1}{2}}\left(\sum\limits_{m=-l}^{l}(Y_{l,m}(\theta,\phi))^{2}\right)^{\frac{1}{2}}\right|
=\displaystyle= rlαl2l+14π<ηllαl,\displaystyle r^{l}\|\alpha_{l\cdot}\|\sqrt{\tfrac{2l+1}{4\pi}}<\eta^{l}l\|\alpha_{l\cdot}\|,

where the first inequality follows from Cauchy-Schwarz inequality, the second equality follows from addition theorem of spherical harmonics and the last inequality follows from r[0,r^)r\in[0,\hat{r}) and r^<η\hat{r}<\eta.

By Lemma 3, l=0ηllαl<\sum_{l=0}^{\infty}\eta^{l}l\|\alpha_{l\cdot}\|<\infty, then for any ϵ>0\epsilon>0, there exists NN such that for any L>L>NL^{\prime}>L>N, we have that for any (r,θ,ϕ)B(0;r^)(r,\theta,\phi)\in B(0;\hat{r}),

|HL(r,θ,ϕ)HL(r,θ,ϕ)|\displaystyle|H_{L^{\prime}}(r,\theta,\phi)-H_{L}(r,\theta,\phi)| =\displaystyle= |l=L+1LHl(r,θ,ϕ)|l=L+1L|Hl(r,θ,ϕ)|<l=L+1Lηllαl<ϵ.\displaystyle\left|\sum_{l=L+1}^{L^{\prime}}H_{l}(r,\theta,\phi)\right|\leq\sum_{l=L+1}^{L^{\prime}}\left|H_{l}(r,\theta,\phi)\right|<\sum_{l=L+1}^{L^{\prime}}\eta^{l}l\|\alpha_{l\cdot}\|<\epsilon.

Thus, the sequence of harmonic functions {HL}\{H_{L}\} converges uniformly to HH on B(0;r^)B(0;\hat{r}).

By [2, Theorem 1.23], HH is harmonic on B(0;r^)B(0;\hat{r}). Moreover, by [2, Theorem 1.28], HH is real analytic in B(0;r^)B(0;\hat{r}). This completes the proof. ∎

Lemma 5.

For any T𝒯T\in\mathcal{T}, if there exists a sequence of distinct points {(θn,ϕn)}𝕊2\{(\theta_{n},\phi_{n})\}\subseteq\mathbb{S}^{2}, with at least one limit point in 𝕊2\mathbb{S}^{2}, and if T(θn,ϕn)=0T(\theta_{n},\phi_{n})=0, n=1,2,n=1,2,\ldots, then T0T\equiv 0 on 𝕊2\mathbb{S}^{2}.

Proof.

It is obvious that H(1,θ,ϕ)=T(θ,ϕ)H(1,\theta,\phi)=T(\theta,\phi), then T(θn,ϕn)=0T(\theta_{n},\phi_{n})=0, n=1,2,n=1,2,\ldots implies that H(1,θn,ϕn)=0H(1,\theta_{n},\phi_{n})=0, n=1,2,n=1,2,\ldots By Lemma 4, HH is real analytic in B(0;r^)B(0;\hat{r}). Then by [16, Theorem 8.1.3], we obtain H0H\equiv 0 in B(0;r^)B(0;\hat{r}) which implies that T0T\equiv 0 on 𝕊2\mathbb{S}^{2}. This completes the proof. ∎

Now we present the unique continuation property of any T𝒯T\in\mathcal{T}.

Theorem 8.

For any T𝒯T\in\mathcal{T}, if T=0T=0 on Γ\Gamma, then T0T\equiv 0 on 𝕊2\mathbb{S}^{2}.

Proof.

Since the coefficients αβp\alpha\in\ell_{\beta}^{p} and Γ\Gamma has an open subset, we know from Lemma 5 that if T=0T=0 on Γ\Gamma then T0T\equiv 0 on 𝕊2\mathbb{S}^{2}. ∎

By Parseval’s theorem, for any TL2(𝕊2)T\in L_{2}(\mathbb{S}^{2}), we have TL2(𝕊2)2=α2\|T\|_{L_{2}(\mathbb{S}^{2})}^{2}=\|\alpha\|^{2}, which implies that T0T\equiv 0 if and only if α=0\alpha=0. Hence, we can claim that for any T𝒯T\in\mathcal{T}, 𝒜(T)0\mathcal{A}(T)\equiv 0 if and only if α=0\alpha=0.

Appendix B. Wirtinger gradient

In this appendix, we briefly introduce the Wirtinger gradient of real-valued functions with complex variables. For more details about Wirtinger calculus, we refer to [4, 22, 24].

Let f:nf:\mathbb{C}^{n}\rightarrow\mathbb{R} be a real-valued function of a vector of complex variables znz\in\mathbb{C}^{n}, which can be written as f1(z,z¯)f_{1}(z,{{\bar{z}}}), that is f(z)=f1(z,z¯)f(z)=f_{1}(z,{{\bar{z}}}). Let z=x+iyz=x+iy, where x,ynx,y\in\mathbb{R}^{n}. We say f(z)=f2(x+iy)f(z)=f_{{2}}(x+iy) is Wirtinger differentiable, if f2f_{{{2}}} is differentiable with respect to xx and yy, respectively. Following [4, 22], under the conjugate coordinates (zT,z¯T)Tn×n(z^{T},\bar{z}^{T})^{T}\in\mathbb{C}^{n}\times\mathbb{C}^{n}, the Wirtinger gradient of ff at znz\in\mathbb{C}^{n} is

f(z)=(zf(z)z¯f(z)),\nabla f(z)=\left(\begin{array}[]{c}\partial_{z}f(z)\vspace{1mm}\\ \partial_{\bar{z}}f(z)\\ \end{array}\right),

where zf(z):=f1(z,z¯)z|z¯=const\partial_{z}f(z):=\frac{\partial f_{{1}}(z,\bar{z})}{\partial z}\big{|}_{\bar{z}={\rm{const}}}, z¯f(z):=f1(z,z¯)z¯|z=const.\partial_{\bar{z}}f(z):=\frac{\partial f_{{1}}(z,\bar{z})}{\partial\bar{z}}\big{|}_{z={\rm{const}}}. Since ff is real-valued, zf¯=z¯f\overline{\partial_{z}f}=\partial_{\bar{z}}f. Thus,

f(z)=0  z¯f(z)=0.\nabla f(z)=0\textrm{ }\quad\Leftrightarrow\textrm{ }\quad\partial_{\bar{z}}f(z)=0.

We say znz^{*}\in\mathbb{C}^{n} is a stationary point of ff if it satisfies

z¯f(z)=0.\partial_{\bar{z}}f(z^{*})=0.

It is easy to verify that the Wirtinger gradient of f(z)=z2f(z)=\|z\|^{2}, znz\in\mathbb{C}^{n}, is f(z)=(z¯T,zT)T,\nabla f(z)=(\bar{z}^{T},z^{T})^{T}, and znz^{*}\in\mathbb{C}^{n} is a stationary point of ff if z=0z^{*}=0.