
Robust time-of-arrival localization via ADMM

Wenxin Xiong [email protected] · Christian Schindelhauer [email protected] · Hing Cheung So (EURASIP Member) [email protected]
Department of Computer Science, University of Freiburg, Freiburg 79110, Germany; Department of Electrical Engineering, City University of Hong Kong, Hong Kong, China
Abstract

This article considers the problem of source localization (SL) using possibly unreliable time-of-arrival (TOA) based range measurements. Adopting the strategy of statistical robustification, we formulate TOA SL as minimization of a versatile loss that possesses resistance against the occurrence of outliers. We then present an alternating direction method of multipliers (ADMM) to tackle the nonconvex optimization problem in a computationally attractive iterative manner. Moreover, we prove that the solution obtained by the proposed ADMM will correspond to a Karush-Kuhn-Tucker point of the formulation when the algorithm converges, and discuss reasonable assumptions about the robust loss function under which the approach can be theoretically guaranteed to be convergent. Numerical investigations demonstrate the superiority of our method over many existing TOA SL schemes in terms of positioning accuracy and computational simplicity. In particular, the proposed ADMM achieves estimation results with mean square error performance closer to the Cramér-Rao lower bound than its competitors in our simulations of impulsive noise environments.

Keywords: robust localization, time-of-arrival, alternating direction method of multipliers
Journal: Journal of the Franklin Institute

1 Introduction

Source localization (SL) refers to estimating the position of a signal-emitting target from measurements collected using multiple spatially separated sensors [1]. Owing to its great significance in many location-based applications, such as emergency assistance [2], asset tracking [3], Internet of Things [4], and radar [5], the SL problem has received much attention in the literature over the past decades [6]. Depending on the measurements being used, methods for SL can be roughly divided into two groups: range- and direction-based. The category of range-based SL techniques, as the term range suggests, involves the use of a variety of distance-related location data, including time-of-arrival (TOA) [7, 8, 9, 10, 11, 12, 13, 14, 15, 16], time-difference-of-arrival [9, 12, 14, 17, 18, 19, 20], time-sum-of-arrival [12, 21, 22, 23, 24, 25, 26, 27], and received signal strength (RSS) [15, 16, 28], to determine the source position. Among these schemes, the time-based ones exploit the signal propagation time, while their RSS counterparts harness the signal power strength to acquire distance-type sensor observations. This dichotomy allows time-based SL solutions to excel in precision at the cost of more stringent synchronization requirements, while RSS-based approaches prioritize simplicity and inexpensiveness in implementation. In contrast, direction-based SL techniques utilize the source's bearings relative to the sensors to fulfill the positioning task [17, 29]. They, like methods built upon the RSS measurement model, can dispense with clock synchronization. However, direction-based positioning mandates the installation of an antenna array at each sensor for angle-of-arrival estimation, resulting in a somewhat higher hardware cost compared to its range-based rivals.

TOA-based SL takes center stage in this contribution, given its stature as one of the most widely adopted and intensively examined distance-based positioning schemes. Such a choice is further justified by the widespread availability of smartphones as well as other mobile devices that possess communication and information processing capabilities, most of which can obtain TOAs without encountering significant technical challenges [6]. Under the Gaussian noise assumption, least squares (LS) has undoubtedly been the criterion of choice for delivering statistically meaningful estimation results. In this vein, a wealth of LS techniques have been proposed in the literature, including algebraic explicit and exact solutions [7, 8, 9], convex programming approaches [10, 11], and local optimization algorithms [12, 13, 14], to name but a few.

Statistical model mismatch will appear in real-world scenarios where the Gaussian noise assumption may be violated due to the presence of unreliable sensor data [30], greatly deteriorating the performance of the LS methodology [31]. To reduce the negative impact of non-Gaussian measurement errors on positioning accuracy, researchers focusing on TOA-based SL have explored the strategies of joint estimation of location coordinates and a balancing parameter [26, 27, 32, 33], worst-case LS criterion [33, 34, 35], and robust statistics [16, 28, 36, 37, 38, 39, 40, 41]. Here, we are particularly interested in the statistical robustification of the TOA-based LS position estimator. Countermeasures of this type have been attracting increasing attention in recent years, mainly because of their relatively low prior knowledge requirements and low complexity of implementation.

Consider a single-source localization system deployed in $H$-dimensional space. It consists of an emitting source that is to be located, with unknown position $\bm{x}\in\mathbb{R}^{H}$, and $L$ sensors with signal receiving capabilities and known positions $\{\bm{x}_{i}\in\mathbb{R}^{H}\,|\,i=1,\ldots,L\}$. Synchronization is guaranteed not only among the sensors themselves but also between the source and each sensor, so that the TOAs of the emitted signal at all $L$ sensors can be estimated. Specifically, the TOA-based range measurements are modeled as [1]

r_{i} = \|\bm{x}-\bm{x}_{i}\|_{2} + e_{i}, \quad i=1,\ldots,L, \qquad (1)

where $\|\cdot\|_{2}$ denotes the $\ell_2$-norm and $e_{i}$ accounts for the observation uncertainty associated with the $i$th source-sensor path. Under the assumption that $\{e_i\}$ are uncorrelated zero-mean Gaussian processes with variances $\{\sigma_i^2\}$, and in a maximum likelihood (ML) sense [42], the TOA-based SL problem is mathematically stated as [12]

\min_{\bm{x}} \sum_{i=1}^{L} \frac{(r_{i}-\|\bm{x}-\bm{x}_{i}\|_{2})^{2}}{\sigma_{i}^{2}}. \qquad (2)
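To make the setup concrete, the following minimal Python sketch simulates the measurement model (1) and evaluates the ML objective (2); all positions, noise levels, and the seed are illustrative assumptions rather than values used in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)

H, L = 2, 8                                  # dimension and number of sensors
x_true = np.array([2.0, 3.0])                # hypothetical source position
sensors = rng.uniform(-10, 10, size=(L, H))  # hypothetical sensor positions
sigma = 0.1 * np.ones(L)                     # per-link noise standard deviations

# TOA-based range measurements following model (1)
d_true = np.linalg.norm(x_true - sensors, axis=1)
r = d_true + sigma * rng.standard_normal(L)

def ml_cost(x):
    """Weighted LS objective (2), i.e., the ML cost under Gaussian noise."""
    res = r - np.linalg.norm(x - sensors, axis=1)
    return np.sum(res ** 2 / sigma ** 2)

print(ml_cost(x_true))  # small value, since x_true generated the data
```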

As mentioned earlier, the $\ell_2$-norm based estimator (2) is vulnerable to outlying (unreliable) data, which manifest themselves as range measurements immersed in non-Gaussian, potentially bias-like errors. Common causes of them in the localization context include non-line-of-sight and multipath propagation of signals, interference, sensor malfunction, and malicious attacks [5, 30, 31, 43]. To achieve resistance against the occurrence of outliers in such adverse situations, (2) was statistically robustified in [16, 28, 38, 39, 40, 41] by substituting the $\ell_2$ loss $(\cdot)^2$ with a certain fitting error measure less sensitive to biased $\{r_i\}$, i.e.,

\min_{\bm{x}} \sum_{i=1}^{L} f(r_{i}-\|\bm{x}-\bm{x}_{i}\|_{2}), \qquad (3)

where frequently utilized options for $f(\cdot)$ encompass the $\ell_1$ loss $|\cdot|$ [16, 39], the Huber loss:

f_{R_{i}}(\cdot)=\begin{cases} (\cdot)^{2}, & \text{for } |\cdot|\leq R_{i},\\ 2R_{i}|\cdot|-R_{i}^{2}, & \text{for } |\cdot|>R_{i}, \end{cases} \qquad (4)

with radius $R_i>0$ [16, 38], the Welsch loss $f_{\textup{W}}(\cdot)=1-\exp\left(-\frac{(\cdot)^{2}}{2\sigma_{\textup{W}}^{2}}\right)$ with parameter $\sigma_{\textup{W}}$ [40], and the $\ell_p$ loss $|\cdot|^{p}$ for $1\leq p<2$ [28, 41]. Below, we provide a concise overview of several well-established optimization solutions available in the literature for addressing (3); they serve as competitors of our proposal, as discussed in Sections 3 and 4. By forming the associated epigraph problem [66]:

\min_{\bm{x},\,\bm{\upsilon}=[\upsilon_{1},\ldots,\upsilon_{L}]^{T}\in\mathbb{R}^{L}} \sum_{i=1}^{L}\upsilon_{i}, \quad \text{s.t. } r_{i}-\upsilon_{i}-\|\bm{x}-\bm{x}_{i}\|_{2}\leq 0, \quad \|\bm{x}-\bm{x}_{i}\|_{2}-r_{i}-\upsilon_{i}\leq 0, \quad i=1,\ldots,L, \qquad (5)

in which each constraint function is the difference of an affine term and a convex term,

one may directly apply second-order cone programming (SOCP), a classic convex approximation technique, or difference-of-convex programming (DCP) [16] to address the $\ell_1$-minimization version of (3):

\min_{\bm{x}} \sum_{i=1}^{L} |r_{i}-\|\bm{x}-\bm{x}_{i}\|_{2}|. \qquad (6)

The epigraph form (5) plays a crucial role as the basis for deriving (29) and (30), which represent the SOCP-based and DCP-based solutions for (6), respectively [16]. Smoothed approximations through the composition of natural logarithm and hyperbolic cosine functions [39], on the other hand, render the Lagrange-type neurodynamic optimization framework of the Lagrange programming neural network (LPNN) [67] applicable to

\min_{\bm{x}} \sum_{i=1}^{L} f_{1}(r_{i}-\|\bm{x}-\bm{x}_{i}\|_{2}), \qquad (7)

where

f_{1}(z)=\frac{\log\left(\left(\exp(\nu z)+\exp(-\nu z)\right)/2\right)}{\nu} \qquad (8)

with parameter $\nu>0$ is also known as (a.k.a.) the smoothed $\ell_1$ loss [68]. The idea of turning instead to the Huber convex underestimator for (5), based on the composite loss $f_{R_i}([-(\cdot)]^{+})$, was conceptualized in [38], where $[\cdot]^{+}=\max(\cdot,0)$ is the ramp function. Nonetheless, the emphasis of [38] is more on the extension of

\min_{\bm{x}} \sum_{i=1}^{L} f_{R_{i}}([\|\bm{x}-\bm{x}_{i}\|_{2}-r_{i}]^{+}) \qquad (9)

to sensor network node localization, rather than single-source positioning. In [40], the challenging range-based Welsch loss minimization problem:

\min_{\bm{x}} \sum_{i=1}^{L} f_{\textup{W}}\big(r_{i}-\|\bm{x}-\bm{x}_{i}\|_{2}\big) \qquad (10)

was approximated by

\min_{\bm{x}} \sum_{i=1}^{L} f_{\textup{W}}\big(r_{i}^{2}-\|\bm{x}-\bm{x}_{i}\|_{2}^{2}\big) \qquad (11)

using the squared ranges, after which a half-quadratic (HQ) optimization algorithm was devised. The authors of [41] converted the $\ell_p$-minimization problem:

\min_{\bm{x}} \sum_{i=1}^{L} |r_{i}-\|\bm{x}-\bm{x}_{i}\|_{2}|^{p} \qquad (12)

into an iteratively reweighted LS (IRLS) framework and applied sum-product message passing (MP) to solve the 𝒙\bm{x}-update subproblem within each IRLS iteration in closed form.
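To give a feel for the loss functions surveyed above, the short sketch below evaluates the $\ell_1$, $\ell_p$, smoothed $\ell_1$ (8), Huber (4), and Welsch losses on a grid of residuals; the parameter values ($p$, $\nu$, $R$, $\sigma_{\textup{W}}$) are arbitrary illustrative choices.

```python
import numpy as np

def l1(z):
    return np.abs(z)

def lp(z, p=1.3):
    return np.abs(z) ** p

def smoothed_l1(z, nu=0.1):
    # f_1 in (8): log((exp(nu*z) + exp(-nu*z)) / 2) / nu, written stably
    return (np.logaddexp(nu * z, -nu * z) - np.log(2.0)) / nu

def huber(z, R=1.0):
    # piecewise definition (4)
    return np.where(np.abs(z) <= R, z ** 2, 2 * R * np.abs(z) - R ** 2)

def welsch(z, sw=1.0):
    return 1.0 - np.exp(-z ** 2 / (2 * sw ** 2))

z = np.linspace(-5.0, 5.0, 11)
for f in (l1, lp, smoothed_l1, huber, welsch):
    print(f"{f.__name__:>12}:", np.round(f(z), 3))
```

Note how all of these grow much more slowly than $(\cdot)^2$ for large residuals, which is precisely what confers outlier resistance.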

While poor scaling with the problem dimension is a well-known limitation of convex programming solutions implemented via interior-point methods, numerical realization of the neurodynamic approach in [39] typically takes several thousand iterations to reach equilibrium [69]. Despite the low-complexity advantage of the HQ algorithm in [40], the squared-range formulation (11) it adopts inherently suffers from impaired statistical efficiency. The MP-assisted IRLS [41] is computationally attractive, yet the lack of a theoretical foundation supporting its convergence may affect the soundness of the technique. With these concerns in mind, we are motivated to explore new avenues for coping with (3), especially in a way that is less computationally demanding yet more statistically efficient and theoretically complete than the existing solutions.

In this contribution, we employ the alternating direction method of multipliers (ADMM) as our algorithmic strategy. ADMM is an iterative optimization solver which works by breaking down the original problem into several more easily solvable subproblems and, technically, it can be seen as an effort to combine the advantages of dual ascent and augmented Lagrangian methods [44]. Having originated in the 1970s with roots dating back to the 1950s [44, 45], this simple but powerful algorithm has witnessed extensive use in the past few decades across various statistical and machine learning problems, including face recognition [46], subspace segmentation [47], portfolio optimization [48], image classification [49], and low-rank matrix recovery [25, 50, 51]. The key reasoning behind the adoption of ADMM in diverse applications of recent interest is that, while initially crafted for convex optimization, numerous extensions of it to the nonconvex setting still allow for lightweight yet reliable implementation and comprehensive theoretical analysis.

Our main contribution in this work is to develop an ADMM approach for tackling the generalized robust TOA positioning problem (3). As the versatility of the cost function $f(\cdot)$ is embodied only in one of the subproblems, which amounts to computing its proximal operator, we are able to adapt $f(\cdot)$ to specific noise environments with ease. In an analytical manner, we prove that if the presented ADMM is convergent, the limit point to which it converges satisfies the Karush-Kuhn-Tucker (KKT) conditions for the equivalent constrained reformulation of (3). We also show the possibility of ensuring theoretical convergence of the ADMM under several additional assumptions about $f(\cdot)$. Numerical examples are included to corroborate the superiority of our solution over its competitors in terms of positioning accuracy and computational simplicity. It should be pointed out that, compared with the ADMM in [12], our scheme holds independent significance in two aspects. First, the algorithm in [12] deals with the non-outlier-resistant formulation (2), whereas we aim at solving (3) with a robustness-conferring fitting error measure. Second, we provide analytical convergence results for the presented ADMM. The authors of [12], in contrast, incorrectly assumed the nonconvex constrained optimization reformulation of (2) to be convex, consequently omitting to probe into the convergence of their method under nonconvexity.

The rest of this contribution is organized as follows. The equivalent constrained reformulation of (3) and the ADMM solution to it are described in Section 2. The complexity and convergence properties of the proposed ADMM are discussed in detail in Section 3. Performance evaluations are conducted in Section 4. Ultimately, Section 5 concludes the article.

2 Algorithm development

Let us begin by converting (3) into

\min_{\bm{x},\bm{d}} \sum_{i=1}^{L} f(r_{i}-d_{i}), \quad \text{s.t. } \|\bm{x}-\bm{x}_{i}\|_{2}=d_{i}, \quad i=1,\ldots,L, \qquad (13)

where $\bm{d}=[d_{1},\ldots,d_{L}]^{T}\in\mathbb{R}^{L}$ is a dummy vector of decision variables for the source-sensor distances, independent of $\bm{x}$.

We further rewrite (13) as

\min_{\bm{x},\bm{d},\{\bm{\beta}_{i}|\,\|\bm{\beta}_{i}\|_{2}^{2}=1\}} \sum_{i=1}^{L} f(r_{i}-d_{i}),
\text{s.t. } \bm{x}-\bm{x}_{i}=\bm{\beta}_{i}\cdot d_{i}, \quad i=1,\ldots,L, \qquad (14a)
d_{i}\geq 0, \quad i=1,\ldots,L, \qquad (14b)

by introducing an auxiliary vector $\bm{\beta}=\big[\bm{\beta}_{1}^{T},\ldots,\bm{\beta}_{L}^{T}\big]^{T}\in\mathbb{R}^{HL}$ into the characterization of range information [12]. Here, $\bm{\beta}_{i}\in\mathbb{R}^{H}$ is a unit vector that indicates the source's direction with respect to (w.r.t.) the $i$th sensor, a.k.a. the direction vector of arrival [52].

The transformation from (13) to (14) bears some resemblance to the conversion from (13) to a formulation in which the Euclidean distances are re-expressed in quadratic form. This conversion, in fact, is a widely adopted preprocessing step in the domain of range-based localization [10, 23], aimed at eliminating the cumbersome Euclidean norm terms in the initial formulation and simplifying the associated constrained optimization problem. In order to achieve an equivalent reformulation during this simplification process, it is crucial to consider the quadratic form of the Euclidean distance constraints along with the nonnegativity constraints on the distance-representing variables [10, 23]. In other words, it would be mathematically less rigorous to casually disregard these nonnegativity constraints without the backing of appropriate theoretical justifications, as demonstrated by the Appendix in [10] and Theorem 1 in [23]. We therefore follow this tradition in our transformation from (13) to (14).

For (14), we construct the following augmented Lagrangian with constraints:

\mathcal{L}_{\rho}(\bm{x},\bm{d},\bm{\beta},\bm{\lambda}) = \sum_{i=1}^{L} f(r_{i}-d_{i}) + \sum_{i=1}^{L}\bm{\lambda}_{i}^{T}(\bm{x}-\bm{x}_{i}-\bm{\beta}_{i}\cdot d_{i}) + \frac{\rho}{2}\sum_{i=1}^{L}\big\|\bm{x}-\bm{x}_{i}-\bm{\beta}_{i}\cdot d_{i}\big\|_{2}^{2},
\text{s.t. } \|\bm{\beta}_{i}\|_{2}^{2}=1, \quad i=1,\ldots,L, \quad \text{(14b)}, \qquad (15)

where $\bm{\lambda}=\big[\bm{\lambda}_{1}^{T},\ldots,\bm{\lambda}_{L}^{T}\big]^{T}\in\mathbb{R}^{HL}$ is a vector containing the Lagrange multipliers for (14a) and $\rho>0$ is the augmented Lagrangian parameter. Splitting the primal variables into two parts, the ADMM for solving (14) consists of the following iterative steps:

\bm{x}^{(k+1)} = \arg\min_{\bm{x}} \mathcal{L}_{\rho}\big(\bm{x},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big), \qquad (16a)
\big(\bm{d}^{(k+1)},\bm{\beta}^{(k+1)}\big) = \arg\min_{\bm{d},\bm{\beta}} \mathcal{L}_{\rho}\big(\bm{x}^{(k+1)},\bm{d},\bm{\beta},\bm{\lambda}^{(k)}\big), \quad \text{s.t. (14b)}, \; \|\bm{\beta}_{i}\|_{2}^{2}=1, \; i=1,\ldots,L, \qquad (16b)
\bm{\lambda}_{i}^{(k+1)} = \rho\big(\bm{x}^{(k+1)}-\bm{x}_{i}-\bm{\beta}_{i}^{(k+1)}\cdot d_{i}^{(k+1)}\big)+\bm{\lambda}_{i}^{(k)}, \quad i=1,\ldots,L, \qquad (16c)

where the iteration index is indicated by $(\cdot)^{(k)}$. To be specific, (16a) and (16b) sequentially minimize the augmented Lagrangian (15) w.r.t. the primal variables, after which (16c) updates the dual variables via gradient ascent with step size $\rho$. We note that there are only two primal blocks (namely, $\bm{x}$ and $(\bm{d},\bm{\beta})$) rather than three in the ADMM governed by (16). By comparison, a natural extension of the basic two-block ADMM to multi-block cases may result in divergence [53].

More detailed explanations for (16a) and (16b) are given as follows.

By ignoring the constant terms independent of $\bm{x}$, the subproblem (16a) can be simplified into the LS form

\min_{\bm{x}} \frac{\rho}{2}\sum_{i=1}^{L}\big\|\bm{x}-\bm{w}_{i}^{(k)}\big\|_{2}^{2}, \qquad (17)

where

\bm{w}_{i}^{(k)} = \bm{x}_{i}+\bm{\beta}_{i}^{(k)}\cdot d_{i}^{(k)}-\frac{1}{\rho}\bm{\lambda}_{i}^{(k)}, \quad i=1,\ldots,L. \qquad (18)

It has the following closed-form solution:

\bm{x}^{(k+1)} = \frac{1}{L}\sum_{i=1}^{L}\bm{w}_{i}^{(k)}. \qquad (19)

Similarly, the other subproblem (16b) is re-expressed as

\min_{\bm{d},\bm{\beta}} \sum_{i=1}^{L} f(r_{i}-d_{i}) + \sum_{i=1}^{L}\frac{\rho}{2}\big\|\bm{v}_{i}^{(k+1)}-\bm{\beta}_{i}\cdot d_{i}\big\|_{2}^{2}, \quad \text{s.t. } \|\bm{\beta}_{i}\|_{2}^{2}=1, \; i=1,\ldots,L, \; \text{(14b)}, \qquad (20)

where

\bm{v}_{i}^{(k+1)} = \bm{x}^{(k+1)}-\bm{x}_{i}+\frac{1}{\rho}\bm{\lambda}_{i}^{(k)}. \qquad (21)

Exploiting the particular structure of (20), the optimal $\bm{\beta}$ can be obtained first as

\bm{\beta}_{i}^{(k+1)} = \bm{v}_{i}^{(k+1)}\big/\big\|\bm{v}_{i}^{(k+1)}\big\|_{2}, \quad i=1,\ldots,L. \qquad (22)

Thus, (20) is reduced to

\min_{\bm{d}} \sum_{i=1}^{L} f(r_{i}-d_{i}) + \sum_{i=1}^{L}\frac{\rho}{2}\big(\big\|\bm{v}_{i}^{(k+1)}\big\|_{2}-d_{i}\big)^{2}, \quad \text{s.t. (14b)}, \qquad (23)

which is separable w.r.t. the partition of $\bm{d}$ into its $L$ elements and leads to the following $L$ subproblems:

\min_{d_{i}\geq 0} f(r_{i}-d_{i}) + \frac{\rho}{2}\big(\big\|\bm{v}_{i}^{(k+1)}\big\|_{2}-d_{i}\big)^{2}, \quad i=1,\ldots,L. \qquad (24)

With the help of the following proposition, we can simplify (24) to a certain degree to facilitate the calculations.

Proposition 1. Under the mild assumptions that the robust loss $f(\cdot)$ is an even function strictly increasing on the nonnegative semi-axis and that the range observations $\{r_i\}$ are always nonnegative, coping with (24) is equivalent to handling

\min_{d_{i}} f(r_{i}-d_{i}) + \frac{\rho}{2}\big(\big\|\bm{v}_{i}^{(k+1)}\big\|_{2}-d_{i}\big)^{2}, \quad i=1,\ldots,L. \qquad (25)

Proof. See Appendix A.

It is worth noting that the assumptions made in Proposition 1 are actually rather common (see our justification below). In such cases (where Proposition 1 applies), solving (24) boils down to computing the proximal operator of $f(\cdot)$, namely,

d_{i}^{(k+1)} = r_{i} - \mathcal{P}_{(1/\rho)f(\cdot)}\big(r_{i}-\big\|\bm{v}_{i}^{(k+1)}\big\|_{2}\big), \qquad (26)

complying with the definition of proximal mapping below.

Definition 1. The proximal mapping of a function $f:\mathbb{R}\rightarrow\mathbb{R}$ with parameter $\tau>0$ for any $b\in\mathbb{R}$ is [54]

\mathcal{P}_{\tau f(\cdot)}(b) = \arg\min_{a\in\mathbb{R}} f(a)+\frac{1}{2\tau}(a-b)^{2}. \qquad (27)
Table 1: Proximal computations for $\ell_p$ ($1\leq p\leq 2$) and Huber loss functions
Loss type | Ref. | $f(\cdot)$ | $\mathcal{P}_{\tau f(\cdot)}(b)$
$\ell_1$ | [54] | $|\cdot|$ | $[b-\tau]^{+}-[-b-\tau]^{+}$
$\ell_p$ with $1<p<2$ | [55] | $|\cdot|^{p}$ | $\arg\min_{a\in\{0,a^{+}\}} |a|^{p}+\frac{1}{2\tau}(a-b)^{2}$ if $b\geq 0$; $\arg\min_{a\in\{a^{-},0\}} |a|^{p}+\frac{1}{2\tau}(a-b)^{2}$ if $b<0$
$\ell_2$ | [56] | $(\cdot)^{2}$ | $\frac{b}{1+2\tau}$
Huber | [56] | $f_{R_{i}}(\cdot)$ | $b-\frac{2\tau R_{i}b}{\max(|b|,\,R_{i}+2\tau R_{i})}$

Obviously, the computational simplicity of the proximal mapping procedure is crucial to the efficient update of $\bm{d}$ in each iteration of the proposed ADMM. In fact, the calculation of (27) can be done in a relatively unencumbered manner, or even in closed form, for many choices of $f(\cdot)$ exhibiting outlier-resistance [54, 55, 56, 57]. As summarized in Table 1 (for completeness, we also include the non-robust case of $p=2$, in which the $\ell_p$-minimization estimator coincides with (2), providing optimum estimation performance as long as the disturbances $\{e_i\}$ are independent and identically distributed (i.i.d.) Gaussian processes), this article restricts the scope of discussion to two typical options: (i) the $\ell_p$ loss $|\cdot|^{p}$ for $1\leq p<2$ and (ii) the Huber function $f_{R_{i}}(\cdot)$. The two corresponding instantiations of (3) are then (12) and

\min_{\bm{x}} \sum_{i=1}^{L} f_{R_{i}}(r_{i}-\|\bm{x}-\bm{x}_{i}\|_{2}), \qquad (28)

respectively. Our justification for these choices is as follows. The $\ell_p$-minimization criterion with $1\leq p<2$ is known to show a considerable degree of robustness against outliers in a wide range of adverse environments [78, 75, 41, 55]. Closely resembling its $\ell_1$ (resp. $\ell_2$) counterpart for large (resp. small) fitting errors, the Huber loss minimization scheme offers a degree of controllability between the two [58]. Most importantly, all these cost functions are compatible with our assumptions in Proposition 1.

We note that $a^{+}$ and $a^{-}$ in Table 1 are the unique positive root of $\frac{a-b}{\tau}+pa^{p-1}=0$ for $a\in[0,b]$ (when $b\geq 0$) and the unique negative root of $\frac{a-b}{\tau}-p(-a)^{p-1}=0$ for $a\in[b,0]$ (when $b<0$), respectively, both of which can be conveniently found using simple root-finding algorithms (e.g., bisection or secant, among others). While these root-finding algorithms typically have only a constant time complexity of $\mathcal{O}(1)$ [55], the computational efficiency of the overall methodology may not always be optimal (e.g., they lead to nested iterations in our $\ell_p$-norm-based ADMM use case). This discrepancy is evident in our CPU time comparisons, where the $\ell_p$-norm-based ADMM consistently required more runtime than the Huber-loss-based one. However, this increase in computational complexity does not significantly hinder the practicality of the corresponding ADMM algorithm. Even with the increased complexity, ADMM adopting the $\ell_p$ loss for $1<p<2$ remains far more computationally efficient than several existing convex approximation approaches. Furthermore, in most cases the Huber variant of our ADMM algorithm, which involves no iterative root-finding steps thanks to the closed-form proximal operator of the Huber loss, already delivers decent performance. This means that there is usually limited need to invoke the $\ell_p$-norm version of the proposed ADMM technique (with its iterative root-finding requirements) in typical TOA SL scenarios.
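To illustrate how the $\bm{d}$-update (26) can be realized for the losses in Table 1, here is a minimal Python sketch of the corresponding proximal operators; the bisection routine for the $\ell_p$ case implements the stationarity conditions stated above, and its iteration count is an arbitrary choice.

```python
import numpy as np

def prox_l1(b, tau):
    """Soft thresholding: prox of tau * |.| (Table 1, first row)."""
    return max(b - tau, 0.0) - max(-b - tau, 0.0)

def prox_huber(b, tau, R=1.0):
    """Closed-form prox of tau * f_R(.) (Table 1, last row)."""
    return b - 2.0 * tau * R * b / max(abs(b), R + 2.0 * tau * R)

def prox_lp(b, tau, p=1.3, iters=50):
    """Prox of tau * |.|^p for 1 < p < 2: bisection for the root a+ (resp. a-)
    of (a - b)/tau + p*sign(a)*|a|^(p-1) = 0, then comparison against a = 0."""
    if b == 0.0:
        return 0.0
    s, m = np.sign(b), abs(b)
    g = lambda a: (a - m) / tau + p * a ** (p - 1.0)  # stationarity on [0, |b|]
    lo, hi = 0.0, m
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if g(mid) < 0.0 else (lo, mid)
    a = s * 0.5 * (lo + hi)
    obj = lambda a: abs(a) ** p + (a - b) ** 2 / (2.0 * tau)
    return a if obj(a) < obj(0.0) else 0.0
```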

Input: $\{\bm{x}_i\}$, $\{r_i\}$, $\rho$, and $\delta$.
Initialize: $\bm{x}^{(0)}$, $\bm{d}^{(0)}$, $\bm{\beta}^{(0)}$, and $\bm{\lambda}^{(0)}$ randomly and feasibly.
for $k=0,1,\cdots$ do
    for $i=1,\cdots,L$ do
        $\bm{w}_{i}^{(k)} \leftarrow \bm{x}_{i}+\bm{\beta}_{i}^{(k)}\cdot d_{i}^{(k)}-\frac{1}{\rho}\bm{\lambda}_{i}^{(k)}$;
    end for
    $\bm{x}^{(k+1)} \leftarrow \sum_{i=1}^{L}\bm{w}_{i}^{(k)}/L$;
    for $i=1,\cdots,L$ do
        $\bm{v}_{i}^{(k+1)} \leftarrow \bm{x}^{(k+1)}-\bm{x}_{i}+\frac{1}{\rho}\bm{\lambda}_{i}^{(k)}$;
        $\bm{\beta}_{i}^{(k+1)} \leftarrow \bm{v}_{i}^{(k+1)}\big/\big\|\bm{v}_{i}^{(k+1)}\big\|_{2}$;
        $d_{i}^{(k+1)} \leftarrow r_{i}-\mathcal{P}_{(1/\rho)f(\cdot)}\big(r_{i}-\big\|\bm{v}_{i}^{(k+1)}\big\|_{2}\big)$;
    end for
    for $i=1,\cdots,L$ do
        $\bm{\lambda}_{i}^{(k+1)} \leftarrow \rho\big(\bm{x}^{(k+1)}-\bm{x}_{i}-\bm{\beta}_{i}^{(k+1)}\cdot d_{i}^{(k+1)}\big)+\bm{\lambda}_{i}^{(k)}$;
    end for
    Stop if $\sum_{i=1}^{L}\big\|\bm{x}^{(k)}-\bm{x}_{i}-\bm{\beta}_{i}^{(k)}\cdot d_{i}^{(k)}\big\|_{2}<\delta$
end for
$\tilde{\bm{x}} \leftarrow \bm{x}^{(k+1)}$;
Output: Estimate of source location $\tilde{\bm{x}}$.
Algorithm 1: ADMM for Robust TOA Positioning.

The steps of the ADMM for dealing with the robust TOA SL problem are summarized in Algorithm 1, where $\delta$ is a user-defined tolerance to terminate the iterations.
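For concreteness, the following Python sketch mirrors Algorithm 1 for a generic proximal operator prox(b, tau) (e.g., prox_huber from the preceding snippet); the initialization scheme, iteration cap, and default parameters are illustrative assumptions.

```python
import numpy as np

def admm_toa(sensors, r, prox, rho=5.0, delta=1e-5, max_iter=2000, seed=0):
    """Sketch of Algorithm 1; `prox(b, tau)` evaluates P_{tau f(.)}(b)."""
    rng = np.random.default_rng(seed)
    L, H = sensors.shape
    x = rng.standard_normal(H)                           # random initialization
    d = np.abs(rng.standard_normal(L))                   # feasible: d >= 0
    beta = rng.standard_normal((L, H))
    beta /= np.linalg.norm(beta, axis=1, keepdims=True)  # unit-norm beta_i
    lam = np.zeros((L, H))
    for _ in range(max_iter):
        # x-update (19): average of the w_i in (18)
        w = sensors + beta * d[:, None] - lam / rho
        x = w.mean(axis=0)
        # (d, beta)-update via (22) and (26)
        v = x - sensors + lam / rho
        vnorm = np.linalg.norm(v, axis=1)
        beta = v / vnorm[:, None]
        d = r - np.array([prox(r[i] - vnorm[i], 1.0 / rho) for i in range(L)])
        # dual ascent (16c) and stopping test on the primal residual
        residual = x - sensors - beta * d[:, None]
        lam += rho * residual
        if np.linalg.norm(residual, axis=1).sum() < delta:
            break
    return x

# e.g., Huber-ADMM with radius R_i = 1, matching the setting of Section 4:
# x_hat = admm_toa(sensors, r, lambda b, t: prox_huber(b, t, R=1.0))
```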

3 Complexity and convergence properties

The complexity and convergence properties of the proposed ADMM are discussed in detail in this section.

From Algorithm 1, it is not difficult to see that the complexity of the ADMM is dominated by that of the $\bm{d}$-update steps, in which the proximal mapping calculations are involved. Therefore, the total complexity of Algorithm 1 is $\mathcal{O}(N_{\textup{ADMM}}L)$ if the loss function taken from Table 1 admits closed-form $\bm{d}$-updates and $\mathcal{O}(N_{\textup{ADMM}}KL)$ otherwise, where $N_{\textup{ADMM}}$ represents the iteration number of the ADMM and $K$ the number of searches involved in the root-finding process at each ADMM iteration. Empirically, a few steps of the bisection method in each $\bm{d}$-update and a few tens to hundreds of ADMM iterations already suffice to yield an accurate source location estimate.

By comparison, based on interior-point methods, solving the SOCP problem derived from (5) [16]:

\min_{\bm{x},\bm{\upsilon},\bm{d}} \sum_{i=1}^{L}\upsilon_{i}, \quad \text{s.t. } |r_{i}-d_{i}|\leq\upsilon_{i}, \quad \|\bm{x}-\bm{x}_{i}\|_{2}\leq d_{i}, \quad i=1,\ldots,L, \qquad (29)

implementing $N_{\textup{CCCP}}$ concave-convex procedure (CCCP) iterations in tackling the DCP problem derived from (5) [16]:

\big(\bm{x}^{(k+1)},\bm{\upsilon}^{(k+1)}\big)=\arg\min_{\bm{x},\bm{\upsilon}} \sum_{i=1}^{L}\upsilon_{i}, \quad \text{s.t.}
r_{i}-\upsilon_{i}-\|\bm{x}^{(k)}-\bm{x}_{i}\|_{2}-\big[\bm{\nabla}_{[\bm{x}^{T},\bm{\upsilon}^{T}]^{T}}\big(\|\bm{x}^{(k)}-\bm{x}_{i}\|_{2}\big)\big]^{T}\big(\big[\bm{x}^{T},\bm{\upsilon}^{T}\big]^{T}-\big[\big(\bm{x}^{(k)}\big)^{T},\big(\bm{\upsilon}^{(k)}\big)^{T}\big]^{T}\big)\leq 0, \quad i=1,\ldots,L,
\|\bm{x}-\bm{x}_{i}\|_{2}-r_{i}-\upsilon_{i}^{(k)}-\big[\bm{\nabla}_{[\bm{x}^{T},\bm{\upsilon}^{T}]^{T}}\big(\upsilon_{i}^{(k)}\big)\big]^{T}\big(\big[\bm{x}^{T},\bm{\upsilon}^{T}\big]^{T}-\big[\big(\bm{x}^{(k)}\big)^{T},\big(\bm{\upsilon}^{(k)}\big)^{T}\big]^{T}\big)\leq 0, \quad i=1,\ldots,L,
k=0,\ldots,N_{\textup{CCCP}}-1, \qquad (30)

and realizing the ramp function based Huber convex underestimator (9) [38] result in $\mathcal{O}(L^{3.5})$, $\mathcal{O}(N_{\textup{CCCP}}L^{3.5})$, and $\mathcal{O}(L^{3.5})$ complexity, respectively [19]. The LPNN [39], the HQ technique [40], and the MP-assisted IRLS algorithm [41, 28] are three other representative TOA positioning schemes from the literature that also adopt the strategy of statistical robustification. They lead to $\mathcal{O}(N_{\textup{LPNN}}L)$, $\mathcal{O}(N_{\textup{HQ}}KL)$, and $\mathcal{O}(N_{\textup{IRLS}}L)$ complexity, respectively, where $N_{\textup{LPNN}}$ is the number of iterations in the numerical implementation of the LPNN, $N_{\textup{HQ}}$ the number of steps needed for the HQ algorithm to converge, and $N_{\textup{IRLS}}$ the number of IRLS iterations. $N_{\textup{HQ}}$ typically takes a value of several tens, as do $K$ and $N_{\textup{IRLS}}$, while $N_{\textup{LPNN}}$ is generally in the range of several thousands [40, 41, 69]. It turns out that the proposed ADMM with loss functions permitting closed-form proximal computations, the HQ algorithm in [40], and the IRLS in [41] are the most computationally efficient.

With an appropriately selected value of $\rho$, Algorithm 1 is observed to be always convergent in our simulations. However, many existing theoretical analyses of ADMM convergence in nonconvex scenarios, such as [59], may not be directly applicable to our context of robust TOA positioning, for two primary reasons. First, the optimization paradigms (for which convergence analyses were carried out) in many of these established analytical works differ from the constrained reformulation of the robust TOA positioning problem addressed in our research. For instance, in [59], any constraints on the optimization variables that are not linear are treated as indicator functions and incorporated into the objective function of the optimization framework. In our problem (14), nevertheless, we explicitly impose unit-length constraints on the direction-indicating variables and nonnegativity constraints on the distance-representing variables, and even the equality constraint functions involved with (14a) are not affine as per their definitions. Such discrepancies, in turn, result in distinct variable-update strategies between the ADMM algorithms considered in the relevant existing studies and ours. We note that similar dilemmas have been faced by many practitioners, despite empirically sound ADMM use cases in their respective contributions [12, 37, 55, 60]. In what follows, we present our own analytic proofs for the convergence of Algorithm 1 in lieu of pinning all hopes on the generally applicable results from the literature.

Establishing the equivalence between (14) and

\min_{\bm{x},\bm{d},\{\bm{\beta}_{i}|\,\|\bm{\beta}_{i}\|_{2}^{2}=1\}} \sum_{i=1}^{L} f(r_{i}-d_{i}), \quad \text{s.t. } \bm{x}-\bm{x}_{i}=\bm{\beta}_{i}\cdot d_{i}, \quad i=1,\ldots,L, \qquad (31)

the following proposition points out that (14b) can be disregarded and we may shift our focus to the simplification (31) instead in the convergence analysis:

Proposition 2. Under the same assumptions as in Proposition 1, the formulations (14) and (31) are equivalent to one another.

Proof. The proof of Proposition 2 is similar to that of Proposition 1; see Appendix B for the details.

Next, we derive the following theorem for the optimality of tuples produced by Algorithm 1:

Theorem 1. Let $\big\{\bm{x}^{(k)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big\}_{k=1,\ldots}$ be the tuples of primal and dual variables generated by Algorithm 1. If

\lim_{k\rightarrow\infty}\big\{\bm{x}^{(k)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big\} = (\bm{x}^{\star},\bm{d}^{\star},\bm{\beta}^{\star},\bm{\lambda}^{\star}), \qquad (32)

then the limit $(\bm{x}^{\star},\bm{d}^{\star},\bm{\beta}^{\star},\bm{\lambda}^{\star})$ will satisfy the KKT conditions (a.k.a. the first-order necessary conditions [61]) for (31). (In cases where $f(\cdot)$ is non-differentiable at the origin, the condition (33b) should be substituted with the inclusion $\bm{0}_{L}\in\hat{\partial}_{\bm{d}}(\bm{x}^{\star},\bm{d}^{\star},\bm{\beta}^{\star},\bm{\lambda}^{\star})$ [62], where $\hat{\partial}_{\bm{d}}(\cdot)$ denotes the generalized gradient of a function at $\bm{d}$ that takes into consideration the subdifferential calculus [63]. For simplicity's sake, we confine our analyses here to the special case of a differentiable $f(\cdot)$; nevertheless, with slight modifications, the results readily extend to scenarios involving non-differentiability.) The conditions are:

\bm{\nabla}_{\bm{x}}\hat{\mathcal{L}}(\bm{x}^{\star},\bm{d}^{\star},\bm{\beta}^{\star},\bm{\lambda}^{\star})=\bm{0}_{H}, \qquad (33a)
\bm{\nabla}_{\bm{d}}\hat{\mathcal{L}}(\bm{x}^{\star},\bm{d}^{\star},\bm{\beta}^{\star},\bm{\lambda}^{\star})=\bm{0}_{L}, \qquad (33b)
\bm{\nabla}_{\bm{\beta}}\hat{\mathcal{L}}(\bm{x}^{\star},\bm{d}^{\star},\bm{\beta}^{\star},\bm{\lambda}^{\star})=\bm{0}_{HL}, \qquad (33c)
\bm{x}^{\star}-\bm{x}_{i}=\bm{\beta}_{i}^{\star}\cdot d_{i}^{\star}, \quad i=1,\ldots,L, \qquad (33d)
\|\bm{\beta}_{i}^{\star}\|_{2}^{2}=1, \quad i=1,\ldots,L, \qquad (33e)

where

\hat{\mathcal{L}}(\bm{x},\bm{d},\bm{\beta},\bm{\lambda}) = \sum_{i=1}^{L} f(r_{i}-d_{i}) + \sum_{i=1}^{L}\bm{\lambda}_{i}^{T}(\bm{x}-\bm{x}_{i}-\bm{\beta}_{i}\cdot d_{i}) \qquad (34)

is the associated Lagrangian of (31).

Proof. See Appendix C.

Remark 1. Theorem 1 does not guarantee the convergence of the proposed ADMM; it only reveals the optimality of the solution when the algorithm converges. To further improve the analytical completeness, one may borrow the idea from [60] to relax the formulation (31) somewhat by introducing additional auxiliary variables and an extra quadratic penalty, whereby rigorously proving the convergence of the generated sequence becomes possible. This will, however, degrade the estimator's performance [60] (i.e., the incompleteness of analytical convergence results can be seen as an acceptable trade-off for higher estimation accuracy). To avoid such performance-impairing approximations to the source localization formulation, one has to limit the scope of choice of $f(\cdot)$, as elaborated in the following statements.

Theorem 2. If $f(r_{i}-d_{i})$ is convex w.r.t. $d_{i}$, $\forall i\in\{1,\ldots,L\}$, and for some constant $\bar{M}$ the gradient of $f(r_{i}-d_{i})$ satisfies $\big|\,\big|\nabla_{d_{i}}f(r_{i}-d_{i})|_{d_{i}=d_{i,1}}\big|+\big|\nabla_{d_{i}}f(r_{i}-d_{i})|_{d_{i}=d_{i,2}}\big|\,\big|\leq\bar{M}|d_{i,1}-d_{i,2}|$, then the sequence:

\big\{\mathcal{L}_{\rho}\big(\bm{x}^{(k)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big)\big\}_{k=1,\ldots} \qquad (35)

will be monotonically nonincreasing as the iterations (16a)–(16c) proceed, for a sufficiently large $\rho$.

Proof. See Appendix D.

Theorem 3. With the assumptions made about $f(\cdot)$ in Proposition 1 and Theorem 2, and the additional assumption that the gradient of $f(\cdot)$ at $d_{i}$ is Lipschitz continuous with a constant $M_{1}$, $\mathcal{L}_{\rho}\big(\bm{x}^{(k)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big)$ is bounded from below for sufficiently large $\rho$.

Proof. See Appendix E.

Corollary 1. $\big\{\mathcal{L}_{\rho}\big(\bm{x}^{(k)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big)\big\}_{k=1,\ldots}$ is convergent under the above assumptions about $f(\cdot)$ and $\rho$.

Proof. This corollary is established via Theorems 2 and 3. \square

Equipped with these tools, we can finally present the following theorem, which asserts that Algorithm 1 is theoretically guaranteed to converge under certain reasonable assumptions.

Theorem 4. The sequence $\big\{\bm{x}^{(k)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big\}_{k=1,\ldots}$ is convergent, namely, (32) holds, under the same circumstances as in Corollary 1.

Proof. See Appendix F.

Corollary 2. Let us adopt the setting of Corollary 1. The solution obtained by Algorithm 1 corresponds to a KKT point of the nonconvex constrained optimization problem (31).

Proof. This corollary is established via Theorems 1 and 4. \square

4 Simulation results

Table 2: Robust TOA positioning algorithms considered in comparisons
Description | Ref. | Problem | Abbr.
SOCP for $\ell_1$-minimization | [16] | (29) | $\ell_1$-SOCP
CCCP-based DCP for $\ell_1$-minimization | [16] | (30) | $\ell_1$-DCP
Huber convex underestimator | [38] | (9) | Huber-CUE
LPNN for smoothed $\ell_1$-minimization | [39] | (7) | $\ell_1$-LPNN
HQ algorithm for Welsch loss minimization | [40] | (11) | Welsch-HQ
MP-assisted IRLS for $\ell_p$-minimization ($1\leq p<2$) | [41] | (12) | $\ell_p$-IRLS
Proposed ADMM for $\ell_p$-minimization ($1\leq p\leq 2$) | | (12) | $\ell_p$-ADMM
Proposed ADMM for Huber loss minimization | | (28) | Huber-ADMM

In this section, we carry out computer simulations to evaluate the performance of Algorithm 1. Comparisons are made between the proposed ADMM and several existing TOA-based location estimators that are also built upon robust statistics. Table 2 gives a summary of these methods, and we note that the convex approximation techniques are all realized using the MATLAB CVX package [71]. To implement the LPNN [39], we invoke the MATLAB routine ode15s, a variable-step, variable-order solver relying on the formulas for numerical differentiation of orders 1 to 5 [72].

An SL system with $H=2$ and $L=8$ is considered. Unless otherwise mentioned, the locations of the source and sensors are randomly generated inside an origin-centered 20 m × 20 m square area in each of the Monte Carlo (MC) runs. The positioning accuracy metric is the root-mean-square error (RMSE), defined as

\textup{RMSE} = \sqrt{\sum_{j=1}^{N_{\textup{MC}}}\frac{\|\tilde{\bm{x}}^{\{j\}}-\bm{x}^{\{j\}}\|_{2}^{2}}{N_{\textup{MC}}}}, \qquad (36)

where $N_{\textup{MC}}$ denotes the total number of MC runs, fixed at 3000 here, and $\tilde{\bm{x}}^{\{j\}}$ represents the estimate of the source position $\bm{x}^{\{j\}}$ in the $j$th MC run. User-specified parameters of the ADMM are set as $\delta=10^{-5}$ and $\rho=5$. (It is essential to emphasize that the optimal selection of $\rho$ depends on both the problem scale and the specific setup of the loss function, i.e., careful consideration is required to ensure its optimality. For instance, when all variables in (14) are scaled up by a factor of 10, the last term of the augmented Lagrangian (15) increases by a factor of 100, while the first two terms increase by a factor of only 10 when using, e.g., the $\ell_1$ loss. To maintain a balance between the two parts of the augmented Lagrangian, $\rho$ then needs to be decreased to some extent to optimize the numerical performance of the ADMM. Nonetheless, for the sake of simplicity, our simulations exclusively employ the fixed setting $\rho=5$, though this value may not be optimal in every scenario. We leave a thorough exploration of the impacts of $\rho$ as a topic for future research, as explained in Section 5.) The tunable parameter $\sigma_{\textup{W}}$ of the Welsch loss is adaptively chosen according to Silverman's heuristic [73], in the same way as in [40]. The smoothing parameter $\nu$ is set to 0.1 to ensure feasibility of the ode15s solver.
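As a side note, the MC approximation of the RMSE in (36) amounts to a few lines; below is a sketch under the assumption that the per-run estimates and ground truths are stacked row-wise.

```python
import numpy as np

def rmse(estimates, truths):
    """RMSE per (36); rows of `estimates`/`truths` are per-run positions."""
    estimates, truths = np.atleast_2d(estimates), np.atleast_2d(truths)
    return np.sqrt(np.mean(np.sum((estimates - truths) ** 2, axis=1)))
```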

Figure 1: PDF plots for stable distributions with $\gamma=1$ and $\mu=0$.

Taking into account the presence of outliers, we follow [41, 75] in using the class of $\alpha$-stable distributions to model $\{e_i\}$ in (1). Stable processes are well known for their suitability for characterizing skewness and heavy-tailedness. Except for a few special instances, members of the stable distribution family do not have an explicit expression for their probability density function (PDF). Instead, the PDF $p(z)$ is implicitly described through the inverse Fourier transform of the characteristic function $\Phi(t;\alpha,\zeta,\gamma,\mu)$:

p(z) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\Phi(t)\exp(-jzt)\,\textup{d}t, \qquad (37)

where the detailed analytical parameterization of $\Phi(t)$ can be found in [76]. As one may see, four parameters define the family. The stability parameter $0<\alpha\leq 2$ controls the tails of the distribution; generally speaking, the smaller the value of $\alpha$, the heavier the tails and the more impulsive the random variable being modeled. $\mu$ determines the location. The skewness parameter $-1\leq\zeta\leq 1$ is a measure of asymmetry. In the simplest case where $\zeta=0$, the distribution becomes symmetric about its mean (which is $\mu$ when $\alpha>1$) and degenerates into the so-called symmetric $\alpha$-stable ($S\alpha S$) distribution. By contrast, the distribution is said to be right-skewed (resp. left-skewed) for $\zeta>0$ (resp. $\zeta<0$). The scale parameter $\gamma>0$ measures to what extent the distribution spreads out (similar to the variance of the normal distribution). For illustration purposes, Fig. 1 plots the PDFs for several representative choices of $\alpha$ and $\zeta$. In our case, the stable-distributed range measurement noise $\{e_i\}$ is assumed to be i.i.d. with stability, skewness, scale, and location parameters $\alpha$, $\zeta=0$, $\gamma$, and $\mu=0$, respectively.

Because the variance of the stable distribution is undefined for $\alpha<2$, we introduce the concept of the generalized signal-to-noise ratio (GSNR) from [41] to quantify the relative noise level:

\textup{GSNR} = 10\log_{10}\left(\frac{\sum_{i=1}^{L}\|\bm{x}-\bm{x}_{i}\|_{2}^{2}}{L\gamma^{\alpha}}\right). \qquad (38)
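For reference, the sketch below draws i.i.d. $S\alpha S$ range errors and computes the GSNR in (38) using SciPy's levy_stable sampler; the numerical values are illustrative assumptions, and SciPy's default stable-law parameterization may differ slightly from the one in [76].

```python
import numpy as np
from scipy.stats import levy_stable

rng = np.random.default_rng(1)
H, L = 2, 8
x_true = np.array([2.0, 3.0])
sensors = rng.uniform(-10.0, 10.0, size=(L, H))
alpha, zeta, gam, mu = 1.5, 0.0, 0.3, 0.0  # stability, skewness, scale, location

# i.i.d. symmetric alpha-stable errors (SciPy names the skewness `beta`)
e = levy_stable.rvs(alpha, zeta, loc=mu, scale=gam, size=L, random_state=rng)
r = np.linalg.norm(x_true - sensors, axis=1) + e

# GSNR in dB, per (38)
gsnr = 10.0 * np.log10(
    np.sum(np.linalg.norm(x_true - sensors, axis=1) ** 2) / (L * gam ** alpha))
print(f"GSNR = {gsnr:.2f} dB")
```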

Furthermore, the square root of the trace of the MC-approximated Cramér-Rao lower bound matrix (termed RCRLB) [77] is included in the comparison results to offer a benchmark for the accuracy of the different robust position estimators in i.i.d. non-Gaussian noise. As it is in general difficult to work out a consistent schedule for adjusting the Huber radius under stably modeled non-Gaussianity, we simply follow [74] and assign a fixed value of 1 to $R_i$.

Figure 2: Objective function value versus iteration number.
Figure 3: Position estimate versus iteration number.

To start, Figs. 2 and 3 show the convergence behavior of $\ell_p$-ADMM ($p$ taking different values from 1 to 2) and Huber-ADMM in a single MC run for $\alpha=1.5$ and GSNR $=20$ dB. We note that, for reproducibility, in this test the $L=8$ sensors are evenly placed on the perimeter of the afore-defined square region and the source is deterministically deployed at $\bm{x}=[2,3]^{T}$ m. Both $\ell_p$-ADMM and Huber-ADMM are observed to rapidly decrease the objective function and converge to a point close to the true source position within the first few tens of iterations. In the following simulation studies, the value of $p$ required by $\ell_p$-ADMM and $\ell_p$-IRLS is set to the optimal one hinging on $\alpha$ [79], in a way analogous to [41, 75].

Figure 4: RMSE versus GSNR.
Figure 5: RMSE versus $L$.
Figure 6: RMSE versus $\alpha$.

Fig. 4 depicts the RMSE versus GSNR $\in[17,25]$ dB with $\alpha$ fixed at 1.5. It is seen that the two proposed ADMM approaches deliver the lowest level of positioning errors for all GSNRs. In particular, the RMSE performance of $\ell_p$-ADMM is the closest to the RCRLB benchmark (with a small gap of about half a meter). Among the six competitors, $\ell_p$-IRLS produces the best results. This reconfirms the findings reported in [41] that $\ell_p$-IRLS can be fairly statistically efficient in impulsive noise if $p$ is suitably chosen. The direct nonlinear optimization based $\ell_1$-minimization techniques, $\ell_1$-DCP and $\ell_1$-LPNN, are slightly inferior to $\ell_p$-IRLS due to model mismatch, but are still capable of outperforming the remaining schemes, whose estimation performance is relatively poorer. $\ell_1$-LPNN ceases to be comparable to $\ell_1$-DCP as the GSNR decreases from 25 dB, which is attributed to the smoothed approximations introduced in (7). Nonetheless, the former does not have the tightness issues from which the traditional convex relaxation methods $\ell_1$-SOCP and Huber-CUE suffer, nor is it as statistically inefficient as Welsch-HQ.

Setting the GSNR to 20 dB and the impulsiveness-controlling parameter to $\alpha=1.5$, we plot the RMSE versus the number of sensors used for localization, $L\in[5,9]$, in Fig. 5. Again, the performance of the two ADMM solutions is the best. Although none of the estimators attain the RCRLB, our algorithms remain the closest to it, particularly in scenarios with a large number of sensors. Fig. 6 further plots the RMSE as a function of $\alpha\in[1.1,1.9]$ for $L=8$ and GSNR $=20$ dB. In this scenario, once more, $\ell_p$-ADMM stands out as superior to the other approaches.

Table 3: Average CPU time for simulations conducted on a laptop with 16 GB of memory and a 4.7 GHz CPU
Algorithm | Time (s)
$\ell_1$-SOCP | 0.677
$\ell_1$-DCP | 3.261
Huber-CUE | 1.669
$\ell_1$-LPNN | 0.158
Welsch-HQ | 0.026
$\ell_p$-IRLS | 0.003
$\ell_p$-ADMM | 0.483
Huber-ADMM | 0.010

In Table 3, we list the per-sample average CPU time of the eight location estimators in the above simulations. It is seen that the running time can vary by several orders of magnitude. Huber-ADMM, which allows for closed-form proximal mapping of $f(\cdot)$, and Welsch-HQ are the second and third fastest, respectively, taking only slightly more time than the state-of-the-art method $\ell_p$-IRLS. Because of the extra bisection procedure incorporated into the ADMM algorithm, $\ell_p$-ADMM is not as computationally efficient as Huber-ADMM in actual CPU time comparisons. Nonetheless, its problem-solving process still takes less time than those of the traditional convex approximation techniques. In general, the simulation results agree with our complexity analysis in Section 3.

5 Conclusion and future research directions

This article focused on the problem of outlier-resistant TOA SL. Based on the ADMM, we presented an iterative algorithm to handle the statistically robustified positioning formulation. Each iteration of our ADMM consists of two primal variable minimization steps and a dual variable update, all of which can be effortlessly implemented provided that the adopted loss function allows for convenient proximal computations. What is more, we proved that the ADMM converges to a KKT point of the nonconvex constrained optimization problem under certain conditions, and verified that the LICQ holds at that point for nontrivial SL configurations. The superiority of the devised scheme over a number of robust statistical TOA SL methods in terms of positioning accuracy in impulsive noise and computational simplicity was demonstrated through simulations.

Future research endeavors could involve (i) deriving analytical convergence results under a milder setting of ρ\rho, (ii) embarking on a more detailed exploration of the influences of ρ\rho, and (iii) extending ADMM to the novel use case of robust direction-based SL with outliers.

Appendix A Proof of Proposition 1

Denote the globally optimal solution to (25) by $\bm{d}^{*}$. (We stipulate that the asterisk and star superscripts in this paper apply to each element of the corresponding vectors by default.) Straightforwardly, the proposition holds if we are able to show that $d_{i}^{*}\geq 0$ holds $\forall i\in\{1,\ldots,L\}$. Based on the assumptions made and the reverse triangle inequality $||a|-|b||\leq|a-b|$, we have

f(r_{i}-d_{i}^{*})+\frac{\rho}{2}\big(\big\|\bm{v}_{i}^{(k+1)}\big\|_{2}-d_{i}^{*}\big)^{2} \qquad (39a)
= f(|r_{i}-d_{i}^{*}|)+\frac{\rho}{2}\big(\big|\big\|\bm{v}_{i}^{(k+1)}\big\|_{2}-d_{i}^{*}\big|\big)^{2} \qquad (39b)
\geq f(||r_{i}|-|d_{i}^{*}||)+\frac{\rho}{2}\big(\big|\big|\big\|\bm{v}_{i}^{(k+1)}\big\|_{2}\big|-|d_{i}^{*}|\big|\big)^{2} \qquad (39c)
= f(|r_{i}|-|d_{i}^{*}|)+\frac{\rho}{2}\big(\big|\big\|\bm{v}_{i}^{(k+1)}\big\|_{2}\big|-|d_{i}^{*}|\big)^{2} \qquad (39d)
= f(r_{i}-|d_{i}^{*}|)+\frac{\rho}{2}\big(\big\|\bm{v}_{i}^{(k+1)}\big\|_{2}-|d_{i}^{*}|\big)^{2}, \qquad (39e)

which means that the objective function value associated with the global optimum $\bm{d}=\bm{d}^{*}$ is greater than or equal to its counterpart associated with $\bm{d}=\textup{abs}(\bm{d}^{*})$, where $\textup{abs}(\cdot)$ stands for the element-wise absolute value function. Namely, $\bm{d}^{*}$ would no longer be the globally optimal solution if the inequality in (39c) held strictly. It then follows that the $\geq$ in (39c) must degenerate into $=$, which holds if and only if $d_{i}^{*}=|d_{i}^{*}|$, $\forall i\in\{1,\ldots,L\}$, since the sum of two strictly monotonic functions with the same kind of monotonicity is still strictly monotonic. Therefore, $d_{i}^{*}\geq 0$ is tacitly satisfied $\forall i\in\{1,\ldots,L\}$. $\square$

Appendix B Proof of Proposition 2

Let $(\bm{x}^{*},\bm{d}^{*},\bm{\beta}^{*})$ be the point corresponding to the globally optimal solution to (31). To prove the proposition, it suffices to show that $d_{i}^{*}\geq 0$ holds $\forall i\in\{1,\ldots,L\}$. Based on the assumptions made and the reverse triangle inequality, we have

\sum_{i=1}^{L} f(r_{i}-d_{i}^{*}) = \sum_{i=1}^{L} f(|r_{i}-d_{i}^{*}|) \qquad (40a)
\geq \sum_{i=1}^{L} f(||r_{i}|-|d_{i}^{*}||) \qquad (40b)
= \sum_{i=1}^{L} f(|r_{i}|-|d_{i}^{*}|) = \sum_{i=1}^{L} f(r_{i}-|d_{i}^{*}|). \qquad (40c)

This implies that the objective function value associated with the global optimum $(\bm{x}^{*},\bm{d}^{*},\bm{\beta}^{*})$ is greater than or equal to that achieved by the feasible solution point $\big(\bm{x}^{*},\textup{abs}(\bm{d}^{*}),\hat{\bm{\beta}}^{*}\big)$, where $\hat{\bm{\beta}}=\big[\hat{\bm{\beta}}_{1}^{T},\ldots,\hat{\bm{\beta}}_{L}^{T}\big]^{T}\in\mathbb{R}^{HL}$ and $\hat{\bm{\beta}}_{i}^{*}=\bm{\beta}_{i}^{*}\cdot\textup{sgn}(d_{i}^{*})$ for $i=1,\ldots,L$. In other words, $(\bm{x}^{*},\bm{d}^{*},\bm{\beta}^{*})$ would no longer be the globally optimal solution point if the inequality in (40b) held strictly. It then follows that the symbol $\geq$ in (40b) should be replaced by $=$, which holds if and only if $d_{i}^{*}=|d_{i}^{*}|$, $\forall i\in\{1,\ldots,L\}$. Therefore, $d_{i}^{*}\geq 0$ is tacitly satisfied $\forall i\in\{1,\ldots,L\}$. $\square$

Appendix C Proof of Theorem 1

Since (22) renders the solution to the $\bm{\beta}$-subproblem in each of the ADMM iterations certainly feasible, the condition (33e) is satisfied. On the other hand, based on the dual variable update (16c) and our assumption about $\lim_{k\rightarrow\infty}\bm{\lambda}^{(k)}$ in (32), it is straightforward to deduce that the condition (33d) is satisfied as well. We now proceed to check the remaining conditions (33a)–(33c). As $\bm{x}^{(k+1)}$ and $\big(\bm{d}^{(k+1)},\bm{\beta}^{(k+1)}\big)$ are the minimizers of the subproblems (16a) and (16b), respectively, the following relations hold:

\bm{\nabla}_{\bm{x}}\mathcal{L}_{\rho}\big(\bm{x}^{(k+1)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big)=\bm{0}_{H}, \qquad (41a)
\bm{\nabla}_{\bm{d}}\mathcal{L}_{\rho}\big(\bm{x}^{(k+1)},\bm{d}^{(k+1)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big)=\bm{0}_{L}, \qquad (41b)
\bm{\nabla}_{\bm{\beta}}\mathcal{L}_{\rho}\big(\bm{x}^{(k+1)},\bm{d}^{(k+1)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big)=\bm{0}_{HL}. \qquad (41c)

Putting (32), (41a), and the verified condition (33d) together, we have

\lim_{k\rightarrow\infty}\bm{\nabla}_{\bm{x}}\mathcal{L}_{\rho}\big(\bm{x}^{(k+1)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big)=\bm{0}_{H}=\lim_{k\rightarrow\infty}\sum_{i=1}^{L}\big[\bm{\lambda}_{i}^{(k)}+\rho\big(\bm{x}^{(k+1)}-\bm{x}_{i}-\bm{\beta}_{i}^{(k)}\cdot d_{i}^{(k)}\big)\big]=\sum_{i=1}^{L}\bm{\lambda}_{i}^{\star}=\bm{\nabla}_{\bm{x}}\hat{\mathcal{L}}(\bm{x}^{\star},\bm{d}^{\star},\bm{\beta}^{\star},\bm{\lambda}^{\star}). \qquad (42)

Analogously, it follows from (32), (33d), and (41b) that

\lim_{k\rightarrow\infty}\bm{\nabla}_{\bm{d}}\mathcal{L}_{\rho}\big(\bm{x}^{(k+1)},\bm{d}^{(k+1)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big)=\bm{0}_{L}=\big[-f'(\cdot)-(\bm{\lambda}_{1}^{\star})^{T}\bm{\beta}_{1}^{\star},\ldots,-f'(\cdot)-(\bm{\lambda}_{L}^{\star})^{T}\bm{\beta}_{L}^{\star}\big]^{T}=\bm{\nabla}_{\bm{d}}\hat{\mathcal{L}}(\bm{x}^{\star},\bm{d}^{\star},\bm{\beta}^{\star},\bm{\lambda}^{\star}), \qquad (43)

and (32), (33d), and (41c) that

\lim_{k\rightarrow\infty}\bm{\nabla}_{\bm{\beta}}\mathcal{L}_{\rho}\big(\bm{x}^{(k+1)},\bm{d}^{(k+1)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big)=\bm{0}_{HL}=\big[-d_{1}^{\star}(\bm{\lambda}_{1}^{\star})^{T},\ldots,-d_{L}^{\star}(\bm{\lambda}_{L}^{\star})^{T}\big]^{T}=\bm{\nabla}_{\bm{\beta}}\hat{\mathcal{L}}(\bm{x}^{\star},\bm{d}^{\star},\bm{\beta}^{\star},\bm{\lambda}^{\star}). \qquad (44)

The conditions (33a)–(33c) are verified by the above equalities.

This completes the proof of the core statement of Theorem 1. We now verify the linear independence constraint qualification (LICQ), one of the most commonly adopted constraint qualifications, which is needed to guarantee that an optimal solution will indeed adhere to the KKT conditions [64]. Put differently, this verification matters because an optimization problem can possess an optimal solution while lacking KKT points [65]. Moreover, verifying LICQ ensures the uniqueness of the Lagrange multipliers [61]. In our situation, LICQ pertains to the linear independence of the constraint gradients at the local solution point $\bm{y}^{\star}=\big[(\bm{x}^{\star})^{T},(\bm{d}^{\star})^{T},(\bm{\beta}^{\star})^{T}\big]^{T}$:

\displaystyle\tfrac{\partial(\bm{x}\!-\!\bm{x}_{i}\!-\!\bm{\beta}_{i}\!\cdot\!d_{i})}{\partial\bm{y}}\Big{|}_{\bm{y}=\bm{y}^{\star}}\!=\!\big{[}\bm{I}_{H},\bm{0}_{H\times(i-1)},-\bm{\beta}_{i}^{\star},\bm{0}_{H\times(L-i)},\bm{0}_{H\times(i-1)H},-d_{i}^{\star}\cdot\bm{I}_{H},\bm{0}_{H\times(L-i)H}\big{]},~{}i=1,...,L, (45)

where $\bm{0}_{a\times b}$ (resp. $\bm{I}_{a}$) denotes the $a\times b$ zero (resp. $a\times a$ identity) matrix. In a nontrivial source localization setting, the position of the source differs from those of the sensors [20]. That is, none of the $\{\bm{\beta}_{i}^{\star}\}$ equals the zero vector, and none of the $\{d_{i}^{\star}\}$ equals zero. Consequently, it is readily apparent that the LICQ holds in TOA SL scenarios of practical relevance. $\square$
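
In practice, the KKT conditions verified above can also be checked numerically once the ADMM iterates have stabilized. The sketch below uses a hypothetical interface (xs stacks the sensor positions row-wise, beta and lam stack $\bm{\beta}_{i}$ and $\bm{\lambda}_{i}$, and fprime is the derivative of the adopted loss) to evaluate the residuals of (33a)–(33e):

import numpy as np

def kkt_residuals(x, xs, d, beta, lam, r, fprime):
    # Stationarity w.r.t. x, cf. (33a): sum of the multipliers.
    res_x = lam.sum(axis=0)
    # Stationarity w.r.t. d_i, cf. (33b): -f'(r_i - d_i) - lam_i^T beta_i.
    res_d = np.array([-fprime(r[i] - d[i]) - lam[i] @ beta[i]
                      for i in range(len(d))])
    # Stationarity w.r.t. beta_i, cf. (33c): -d_i * lam_i.
    res_beta = -d[:, None] * lam
    # Primal feasibility, cf. (33d): x - x_i - beta_i * d_i.
    res_primal = x[None, :] - xs - beta * d[:, None]
    # Unit-norm constraints, cf. (33e).
    res_unit = np.linalg.norm(beta, axis=1) - 1.0
    return max(np.abs(v).max() for v in
               (res_x, res_d, res_beta, res_primal, res_unit))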

Appendix D Proof of Theorem 2

In order to study the monotonicity properties of the sequence

\big{\{}\mathcal{L}_{\rho}\big{(}\bm{x}^{(k)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big{)}\big{\}}_{k=1,...}, (46)

we decompose the difference in $\mathcal{L}_{\rho}$ between two successive ADMM iterations as

\displaystyle\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k+1)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k+1)}\big{)}-\mathcal{L}_{\rho}\big{(}\bm{x}^{(k)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big{)}
\displaystyle~{}~{}=\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big{)}-\mathcal{L}_{\rho}\big{(}\bm{x}^{(k)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big{)} (47a)
\displaystyle~{}~{}~{}~{}+\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big{)}-\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big{)} (47b)
\displaystyle~{}~{}~{}~{}+\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k+1)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big{)}-\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big{)} (47c)
\displaystyle~{}~{}~{}~{}+\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k+1)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k+1)}\big{)}-\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k+1)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big{)} (47d)

and analyze their ranges one by one.
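Note that (47) is a telescoping identity. A quick numerical sanity check of this decomposition (with L_rho a hypothetical callable implementing $\mathcal{L}_{\rho}$ and s0–s4 the five successive argument tuples appearing above) reads:

import numpy as np

# s0 = (x, d, b, lam);         s1 = (x_new, d, b, lam)
# s2 = (x_new, d, b_new, lam); s3 = (x_new, d_new, b_new, lam)
# s4 = (x_new, d_new, b_new, lam_new)
def check_telescoping(L_rho, s0, s1, s2, s3, s4):
    terms = [L_rho(*s1) - L_rho(*s0),   # (47a): x-update
             L_rho(*s2) - L_rho(*s1),   # (47b): beta-update
             L_rho(*s3) - L_rho(*s2),   # (47c): d-update
             L_rho(*s4) - L_rho(*s3)]   # (47d): lambda-update
    return np.isclose(sum(terms), L_rho(*s4) - L_rho(*s0))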

For (47a) and (47b), it follows from the $\bm{x}$- and $\bm{\beta}$-updates of the proposed ADMM that

\displaystyle{}\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big{)}-\mathcal{L}_{\rho}\big{(}\bm{x}^{(k)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big{)}\leq 0 (48)

and

\displaystyle{}\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big{)}-\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big{)}\leq 0 (49)

hold by definition.

On the other hand, additional assumptions about $f(\cdot)$ are required to facilitate the analysis of (47c) and (47d). Assuming the convexity of $f(r_{i}-d_{i})$ w.r.t. $d_{i},\forall i\in\{1,...,L\}$, it is easily verified that

\displaystyle\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big{)}=\sum_{i=1}^{L}f(r_{i}-d_{i})+\sum_{i=1}^{L}\big{(}\bm{\lambda}_{i}^{(k)}\big{)}^{T}\big{(}\bm{x}^{(k+1)}-\bm{x}_{i}-\bm{\beta}_{i}^{(k+1)}\cdot d_{i}\big{)}
\displaystyle~{}+\tfrac{\rho}{2}\sum_{i=1}^{L}{\big{\|}\bm{x}^{(k+1)}-\bm{x}_{i}-\bm{\beta}_{i}^{(k+1)}\cdot d_{i}\big{\|}}_{2}^{2} (50)

is strongly convex w.r.t. $\bm{d}$ (with parameter $M_{2}>0$), by examining the positive semidefiniteness of the difference between its Hessian and $M_{2}\cdot\bm{I}_{L}$. Applying the equivalent inequality that characterizes strong convexity [66] to the points $\bm{d}^{(k+1)}$ and $\bm{d}^{(k)}$, we have for (47c)

\displaystyle\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big{)}\geq\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k+1)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big{)}
\displaystyle~{}~{}+\big{(}\bm{\nabla}_{\bm{d}}\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big{)}\big{|}_{\bm{d}=\bm{d}^{(k+1)}}\big{)}^{T}\big{(}\bm{d}^{(k)}-\bm{d}^{(k+1)}\big{)}+\tfrac{M_{2}}{2}{\big{\|}\bm{d}^{(k)}-\bm{d}^{(k+1)}\big{\|}}_{2}^{2}
\displaystyle~{}=\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k+1)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big{)}+\tfrac{M_{2}}{2}{\big{\|}\bm{d}^{(k)}-\bm{d}^{(k+1)}\big{\|}}_{2}^{2}, (51)

which subsequently leads to

\displaystyle{}\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k+1)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big{)}-\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big{)}\leq-\tfrac{M_{2}}{2}{\big{\|}\bm{d}^{(k)}-\bm{d}^{(k+1)}\big{\|}}_{2}^{2}. (52)
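
For a twice-differentiable instance of the loss, the strong convexity claim can also be seen directly: the Hessian of (50) w.r.t. $\bm{d}$ is diagonal with entries $f^{\prime\prime}(r_{i}-d_{i})+\rho\|\bm{\beta}_{i}^{(k+1)}\|_{2}^{2}$. A minimal sketch (assuming the pseudo-Huber loss purely for illustration) confirms that every entry is at least $\rho$, so that $M_{2}\geq\rho>0$:

import numpy as np

def pseudo_huber_fpp(e, delta=1.0):
    # Second derivative of f(e) = delta^2 * (sqrt(1 + (e/delta)^2) - 1).
    return (1.0 + (e / delta) ** 2) ** (-1.5)

rho = 2.0
rng = np.random.default_rng(1)
e = rng.standard_normal(5)             # residuals r_i - d_i
hess_diag = pseudo_huber_fpp(e) + rho  # unit beta_i => ||beta_i||_2^2 = 1
assert hess_diag.min() >= rho          # Hessian - rho*I is PSD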

The remaining difference of two evaluated augmented Lagrangians, (47d), can be re-expressed as

\displaystyle\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k+1)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k+1)}\big{)}-\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k+1)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big{)}
\displaystyle~{}~{}=\sum_{i=1}^{L}\!\big{(}\bm{\lambda}_{i}^{(k+1)}-\bm{\lambda}_{i}^{(k)}\big{)}^{T}\!\big{(}\bm{x}^{(k+1)}-\bm{x}_{i}-\bm{\beta}_{i}^{(k+1)}\cdot d_{i}^{(k+1)}\big{)}=\tfrac{1}{\rho}\sum_{i=1}^{L}{\big{\|}\bm{\lambda}_{i}^{(k+1)}-\bm{\lambda}_{i}^{(k)}\big{\|}}_{2}^{2} (53)

based on (16c). Since our $d_{i}$-minimization step implies

\displaystyle d_{i}^{(k+1)}=\arg\min_{d_{i}}\Big{[}\big{(}\bm{\lambda}_{i}^{(k)}\big{)}^{T}\big{(}\bm{x}^{(k+1)}-\bm{x}_{i}-\bm{\beta}_{i}^{(k+1)}\cdot d_{i}\big{)}
\displaystyle~{}+f\big{(}r_{i}-d_{i}\big{)}+\tfrac{\rho}{2}{\big{\|}\bm{x}^{(k+1)}-\bm{x}_{i}-\bm{\beta}_{i}^{(k+1)}\cdot d_{i}\big{\|}}_{2}^{2}\Big{]}, (54)

taking advantage of the convexity of $f(r_{i}-d_{i})$ w.r.t. $d_{i},\forall i\in\{1,...,L\}$ once again, we derive

\displaystyle-f^{\prime}\big{(}r_{i}-d_{i}^{(k+1)}\big{)}-\big{(}\bm{\lambda}_{i}^{(k)}\big{)}^{T}\bm{\beta}_{i}^{(k+1)}-\rho\big{(}\bm{\beta}_{i}^{(k+1)}\big{)}^{T}\big{(}\bm{x}^{(k+1)}-\bm{x}_{i}-\bm{\beta}_{i}^{(k+1)}\cdot d_{i}^{(k+1)}\big{)}
\displaystyle~{}~{}=-f^{\prime}\big{(}r_{i}-d_{i}^{(k+1)}\big{)}-\big{(}\bm{\lambda}_{i}^{(k+1)}\big{)}^{T}\bm{\beta}_{i}^{(k+1)}=0 (55)

and

\displaystyle{}-f^{\prime}\big{(}r_{i}-d_{i}^{(k)}\big{)}-\big{(}\bm{\lambda}_{i}^{(k)}\big{)}^{T}\bm{\beta}_{i}^{(k)}=0. (56)

Utilizing the fact that $\bm{\beta}_{i}^{(k)}$ and $\bm{\beta}_{i}^{(k+1)}$ are both unit vectors, together with our assumptions about $f(r_{i}-d_{i})$, we arrive at

\displaystyle\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k+1)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k+1)}\big{)}-\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k+1)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big{)}
\displaystyle~{}~{}=\tfrac{1}{\rho}\sum_{i=1}^{L}\Big{\|}\bm{\beta}_{i}^{(k+1)}\cdot\Big{(}-f^{\prime}\big{(}r_{i}-d_{i}^{(k+1)}\big{)}\Big{)}-\bm{\beta}_{i}^{(k)}\cdot\Big{(}-f^{\prime}\big{(}r_{i}-d_{i}^{(k)}\big{)}\Big{)}\Big{\|}_{2}^{2}
\displaystyle~{}~{}\leq\tfrac{1}{\rho}\sum_{i=1}^{L}\Big{|}\big{|}f^{\prime}\big{(}r_{i}-d_{i}^{(k+1)}\big{)}\big{|}+\big{|}f^{\prime}\big{(}r_{i}-d_{i}^{(k)}\big{)}\big{|}\Big{|}^{2}
\displaystyle~{}~{}=\tfrac{1}{\rho}\sum_{i=1}^{L}\Big{|}\big{|}\nabla_{d_{i}}f(r_{i}-d_{i})|_{d_{i}=d_{i}^{(k+1)}}\big{|}+\big{|}\nabla_{d_{i}}f(r_{i}-d_{i})|_{d_{i}=d_{i}^{(k)}}\big{|}\Big{|}^{2}
\displaystyle~{}~{}\leq\tfrac{\bar{M}^{2}}{\rho}\sum_{i=1}^{L}{\big{|}d_{i}^{(k)}-d_{i}^{(k+1)}\big{|}}^{2}=\tfrac{\bar{M}^{2}}{\rho}{\big{\|}\bm{d}^{(k)}-\bm{d}^{(k+1)}\big{\|}}_{2}^{2} (57)

after substituting (55) and (56) into (53).

Plugging (48), (49), (52), and (57) into (47) yields

\displaystyle\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k+1)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k+1)}\big{)}-\mathcal{L}_{\rho}\big{(}\bm{x}^{(k)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big{)}
\displaystyle~{}~{}\leq\left(\tfrac{\bar{M}^{2}}{\rho}-\tfrac{M_{2}}{2}\right){\big{\|}\bm{d}^{(k)}-\bm{d}^{(k+1)}\big{\|}}_{2}^{2}. (58)

In other words, $\big\{\mathcal{L}_{\rho}\big(\bm{x}^{(k)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big)\big\}_{k=1,...}$ is monotonically nonincreasing for $\rho\geq(2\bar{M}^{2})/M_{2}$ under our additional assumptions about the loss function. $\square$
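
Theorem 2 also suggests a simple runtime diagnostic: with $\rho\geq(2\bar{M}^{2})/M_{2}$, the augmented Lagrangian evaluated after every full ADMM sweep should never increase. A minimal monitoring sketch follows, where admm_step and aug_lagrangian are hypothetical placeholders for the updates (16a)–(16c) and $\mathcal{L}_{\rho}$, respectively:

def monotone_decrease(admm_step, aug_lagrangian, state, n_iter=200, tol=1e-9):
    # Returns False as soon as L_rho increases beyond numerical tolerance,
    # which flags a penalty parameter rho chosen below the threshold.
    prev = aug_lagrangian(state)
    for _ in range(n_iter):
        state = admm_step(state)
        cur = aug_lagrangian(state)
        if cur > prev + tol:
            return False
        prev = cur
    return True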

Appendix E Proof of Theorem 3

Combining the convexity of $f(r_{i}-d_{i})$ w.r.t. $d_{i}$ with the $M_{1}$-Lipschitz continuity of its gradient gives [70, Lemma 4]

\displaystyle{}f(r_{i}-\eta_{1})\leq f(r_{i}-\eta_{2})-f^{\prime}(r_{i}-\eta_{2})(\eta_{1}-\eta_{2})+\tfrac{M_{1}}{2}(\eta_{1}-\eta_{2})^{2},~{}\forall\eta_{1},\eta_{2}\in\mathbb{R}. (59)
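
Inequality (59) can be spot-checked numerically; the sketch below does so for the pseudo-Huber loss (an illustrative assumption, for which $M_{1}=1$ upper-bounds the curvature of $f$):

import numpy as np

delta, M1 = 1.0, 1.0
f = lambda e: delta**2 * (np.sqrt(1.0 + (e / delta) ** 2) - 1.0)
fp = lambda e: e / np.sqrt(1.0 + (e / delta) ** 2)   # f'(e), 1-Lipschitz

rng = np.random.default_rng(2)
r, eta1, eta2 = rng.standard_normal((3, 100000))
lhs = f(r - eta1)
rhs = f(r - eta2) - fp(r - eta2) * (eta1 - eta2) \
      + 0.5 * M1 * (eta1 - eta2) ** 2
assert np.all(lhs <= rhs + 1e-12)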

Keeping in mind that $\bm{\beta}_{i}^{(k)}$ is a unit column vector, we construct

\displaystyle f\Big{(}r_{i}-\big{(}\bm{\beta}_{i}^{(k)}\big{)}^{T}\big{(}\bm{x}^{(k)}-\bm{x}_{i}\big{)}\Big{)}\leq f\big{(}r_{i}-d_{i}^{(k)}\big{)}-f^{\prime}\big{(}r_{i}-d_{i}^{(k)}\big{)}\big{(}\big{(}\bm{\beta}_{i}^{(k)}\big{)}^{T}\big{(}\bm{x}^{(k)}-\bm{x}_{i}\big{)}-d_{i}^{(k)}\big{)}
\displaystyle~{}+\tfrac{M_{1}}{2}\Big{(}\big{(}\bm{\beta}_{i}^{(k)}\big{)}^{T}\big{(}\bm{x}^{(k)}-\bm{x}_{i}\big{)}-d_{i}^{(k)}\Big{)}^{2} (60)

from (59) by letting $\eta_{1}$ and $\eta_{2}$ be $\big(\bm{\beta}_{i}^{(k)}\big)^{T}\big(\bm{x}^{(k)}-\bm{x}_{i}\big)$ and $d_{i}^{(k)}$, respectively. Plugging (56) into (60) further yields

\displaystyle f\Big{(}r_{i}-\big{(}\bm{\beta}_{i}^{(k)}\big{)}^{T}\big{(}\bm{x}^{(k)}-\bm{x}_{i}\big{)}\Big{)}\leq f\big{(}r_{i}-d_{i}^{(k)}\big{)}+\big{(}\bm{\lambda}_{i}^{(k)}\big{)}^{T}\big{(}\bm{x}^{(k)}-\bm{x}_{i}-\bm{\beta}_{i}^{(k)}\cdot d_{i}^{(k)}\big{)}
\displaystyle~{}~{}+\tfrac{M_{1}}{2}\Big{(}\big{(}\bm{\beta}_{i}^{(k)}\big{)}^{T}\big{(}\bm{x}^{(k)}-\bm{x}_{i}\big{)}-d_{i}^{(k)}\Big{)}^{2}. (61)

Based on (61), we have

\displaystyle\mathcal{L}_{\rho}\big{(}\bm{x}^{(k)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big{)}=\sum_{i=1}^{L}f\big{(}r_{i}-d_{i}^{(k)}\big{)}+\sum_{i=1}^{L}\big{(}\bm{\lambda}_{i}^{(k)}\big{)}^{T}\big{(}\bm{x}^{(k)}-\bm{x}_{i}-\bm{\beta}_{i}^{(k)}\cdot d_{i}^{(k)}\big{)}
\displaystyle~{}~{}+\tfrac{\rho}{2}\sum_{i=1}^{L}{\big{\|}\bm{x}^{(k)}-\bm{x}_{i}-\bm{\beta}_{i}^{(k)}\cdot d_{i}^{(k)}\big{\|}}_{2}^{2} (62a)
\displaystyle~{}\geq\sum_{i=1}^{L}\Big{[}\underbrace{f\Big{(}r_{i}-\big{(}\bm{\beta}_{i}^{(k)}\big{)}^{T}\big{(}\bm{x}^{(k)}-\bm{x}_{i}\big{)}\Big{)}}_{\textup{1st part}}+\underbrace{\tfrac{(\rho-M_{1})}{2}{\big{\|}\bm{x}^{(k)}-\bm{x}_{i}-\bm{\beta}_{i}^{(k)}\cdot d_{i}^{(k)}\big{\|}}_{2}^{2}}_{\textup{2nd part}}\Big{]}. (62b)

Under the symmetry and monotonicity assumptions made earlier about $f(\cdot)$ in Proposition 1, it is not hard to see that the first part of the bracketed expression in (62b) is bounded from below. The same holds trivially for the second part when $\rho\geq M_{1}$ (in fact, $\rho$ should satisfy $\rho\geq\max(M_{1},(2\bar{M}^{2})/M_{2})$ so that Theorem 2 remains valid), as it is lower bounded by $0$. $\square$
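
The constants entering the penalty threshold can be read off numerically for a concrete loss. The sketch below (again assuming the pseudo-Huber loss for illustration) estimates the gradient bound $\bar{M}$ and the Lipschitz constant $M_{1}$ of $f^{\prime}$, from which the threshold $\rho\geq\max(M_{1},(2\bar{M}^{2})/M_{2})$ follows once $M_{2}$ is known:

import numpy as np

delta = 1.0
e = np.linspace(-50.0, 50.0, 200001)
fp = e / np.sqrt(1.0 + (e / delta) ** 2)    # f'(e) for the pseudo-Huber loss
Mbar = np.abs(fp).max()                     # ~= delta (gradient is bounded)
M1 = np.abs(np.diff(fp) / np.diff(e)).max() # ~= 1 (slope of f' peaks at 0)
M2 = 2.0                                    # assumed known for this sketch
print(Mbar, M1, max(M1, 2.0 * Mbar**2 / M2))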

Appendix F Proof of Theorem 4

First of all, it follows from (58) and Corollary 1 that

\displaystyle\lim_{k\rightarrow\infty}{\big{\|}\bm{d}^{(k+1)}-\bm{d}^{(k)}\big{\|}}_{2}^{2}\leq\lim_{k\rightarrow\infty}\left(\tfrac{\bar{M}^{2}}{\rho}-\tfrac{M_{2}}{2}\right)^{-1}\cdot\big{[}\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k+1)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k+1)}\big{)}
\displaystyle~{}~{}-\mathcal{L}_{\rho}\big{(}\bm{x}^{(k)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big{)}\big{]}=0. (63)

Combining (53), (57), and (63), we have for $\bm{\lambda}$

\displaystyle{}\lim_{k\rightarrow\infty}\tfrac{1}{\rho}\sum_{i=1}^{L}{\big{\|}\bm{\lambda}_{i}^{(k+1)}-\bm{\lambda}_{i}^{(k)}\big{\|}}_{2}^{2}\leq\lim_{k\rightarrow\infty}\tfrac{\bar{M}^{2}}{\rho}{\big{\|}\bm{d}^{(k+1)}-\bm{d}^{(k)}\big{\|}}_{2}^{2}=0 (64)

as well. Akin to (50) and (51), the strong convexity characterizing inequality (with parameter $M_{3}>0$) for

\displaystyle\mathcal{L}_{\rho}\big{(}\bm{x},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big{)}=\sum_{i=1}^{L}f\big{(}r_{i}-d_{i}^{(k)}\big{)}+\sum_{i=1}^{L}\big{(}\bm{\lambda}_{i}^{(k)}\big{)}^{T}\big{(}\bm{x}-\bm{x}_{i}-\bm{\beta}_{i}^{(k)}\cdot d_{i}^{(k)}\big{)}
\displaystyle~{}+\tfrac{\rho}{2}\sum_{i=1}^{L}{\big{\|}\bm{x}-\bm{x}_{i}-\bm{\beta}_{i}^{(k)}\cdot d_{i}^{(k)}\big{\|}}_{2}^{2} (65)

associated with the points $\bm{x}=\bm{x}^{(k+1)}$ and $\bm{x}=\bm{x}^{(k)}$ gives

\displaystyle\mathcal{L}_{\rho}\big{(}\bm{x}^{(k)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big{)}\geq\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big{)}
\displaystyle~{}~{}+\big{(}\bm{\nabla}_{\bm{x}}\mathcal{L}_{\rho}\big{(}\bm{x},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\!\big{)}\big{|}_{\bm{x}=\bm{x}^{(k+1)}}\big{)}^{T}\big{(}\bm{x}^{(k)}-\bm{x}^{(k+1)}\big{)}+\tfrac{M_{3}}{2}{\big{\|}\bm{x}^{(k)}-\bm{x}^{(k+1)}\big{\|}}_{2}^{2}
\displaystyle~{}=\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big{)}+\tfrac{M_{3}}{2}{\big{\|}\bm{x}^{(k)}-\bm{x}^{(k+1)}\big{\|}}_{2}^{2}, (66)

which implies

\displaystyle{}\lim_{k\rightarrow\infty}{\big{\|}\bm{x}^{(k+1)}-\bm{x}^{(k)}\big{\|}}_{2}^{2}=0 (67)

by invoking Corollary 1 once more. Likewise, we can show

\displaystyle{}\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k)},\bm{\beta}^{(k)},\bm{\lambda}^{(k)}\big{)}\geq\mathcal{L}_{\rho}\big{(}\bm{x}^{(k+1)},\bm{d}^{(k)},\bm{\beta}^{(k+1)},\bm{\lambda}^{(k)}\big{)}+\tfrac{M_{4}}{2}{\big{\|}\bm{\beta}^{(k)}-\bm{\beta}^{(k+1)}\big{\|}}_{2}^{2}, (68)

and

\displaystyle{}\lim_{k\rightarrow\infty}{\big{\|}\bm{\beta}^{(k+1)}-\bm{\beta}^{(k)}\big{\|}}_{2}^{2}=0 (69)

for some parameter $M_{4}>0$, following a procedure similar to (65)–(67). $\square$
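
The successive-difference results of Theorem 4 translate into a straightforward convergence diagnostic: all four blocks of iterates should settle. Assuming the iterates are stored per iteration in lists x_hist, d_hist, beta_hist, and lam_hist (hypothetical names), one may check:

import numpy as np

def successive_diffs(hist):
    # Norms of differences between consecutive stored iterates.
    return [np.linalg.norm(hist[k + 1] - hist[k])
            for k in range(len(hist) - 1)]

# The tails of all four sequences should be numerically zero, e.g.,
# successive_diffs(x_hist)[-10:], successive_diffs(d_hist)[-10:],
# successive_diffs(beta_hist)[-10:], successive_diffs(lam_hist)[-10:].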

Acknowledgment

This work was supported by the state graduate funding coordinated by the University of Freiburg. The first author would like to thank Prof. Moritz Diehl for the inspiring discussions related to the use of ADMM. All three authors wish to express their gratitude to the co-editor-in-chief for his efforts to ensure a rigorous review process and to the anonymous reviewers whose insightful feedback greatly improved this work. Special thanks go to Deutsche Bahn for providing the necessary onboard power supply that enabled the first author to conduct several additional simulations during his journey to and from Berlin in October 2023, in response to the first-round reviewers’ requests.

References

  • [1] H. C. So, “Source localization: Algorithms and analysis,” in Handbook of Position Location: Theory, Practice and Advances, 2nd ed., S. A. Zekavat and M. Buehrer, Eds. New York, NY, USA: Wiley-IEEE Press, 2019, pp. 59–106.
  • [2] J. H. Reed, K. J. Krizman, B. D. Woerner, and T. S. Rappaport, “An overview of the challenges and progress in meeting the E-911 requirement for location service,” IEEE Commun. Mag., vol. 36, no. 4, pp. 30–37, Apr. 1998.
  • [3] F. Höflinger, R. Zhang, J. Hoppe, A. Bannoura, L. M. Reindl, J. Wendeberg, M. Buhrer, and C. Schindelhauer, “Acoustic self-calibrating system for indoor smartphone tracking (ASSIST),” in Proc. 3rd. Int. Conf. Indoor Positioning and Indoor Navigat. (IPIN), Sydney, Australia, Nov. 2012, pp. 1–9.
  • [4] S. Li, L. D. Xu, and X. Wang, “Compressed sensing signal and data acquisition in wireless sensor networks and Internet of Things,” IEEE Trans. Ind. Informat., vol. 9, no. 4, pp. 2177–2186, Nov. 2013.
  • [5] P. Setlur, G. E. Smith, F. Ahmad, and M. G. Amin, “Target localization with a single sensor via multipath exploitation,” IEEE Trans. Aerosp. Electron. Syst., vol. 48, no. 3, pp. 1996–2014, Jul. 2012.
  • [6] F. Zafari, A. Gkelias, and K. K. Leung, “A survey of indoor localization systems and technologies,” IEEE Commun. Surveys Tuts., vol. 21, no. 3, pp. 2568–2599, 3rd Quart., 2019.
  • [7] K. W. Cheung, H. C. So, W.-K. Ma, and Y. T. Chan, “Least squares algorithms for time-of-arrival-based mobile location,” IEEE Trans. Signal Process., vol. 52, no. 4, pp. 1121–1130, Apr. 2004.
  • [8] F. K. W. Chan, H. C. So, J. Zheng, and K. W. K. Lui, “Best linear unbiased estimator approach for time-of-arrival based localization,” IET Signal Process., vol. 2, no. 2, pp. 156–162, Jun. 2008.
  • [9] A. Beck, P. Stoica, and J. Li, “Exact and approximate solutions of source localization problems,” IEEE Trans. Signal Process., vol. 56, no. 5, pp. 1770–1778, May 2008.
  • [10] K. W. Cheung, W. K. Ma, and H. C. So, “Accurate approximation algorithm for TOA-based maximum likelihood mobile location using semidefinite programming,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., May 2004, p. 145.
  • [11] K. W. K. Lui, W.-K. Ma, H. C. So, and F. K. W. Chan, “Semidefinite programming algorithms for sensor network node localization with uncertainties in anchor positions and/or propagation speed,” IEEE Trans. Signal Process., vol. 57, no. 2, pp. 752–763, Feb. 2009.
  • [12] J. Liang, Y. Chen, H. C. So, and Y. Jing, “Circular/hyperbolic/elliptic localization via Euclidean norm elimination,” Signal Process., vol. 148, pp. 102–113, 2018.
  • [13] Y. -M. Pun and A. M. -C. So, “Local strong convexity of source localization and error bound for target tracking under time-of-arrival measurements,” IEEE Trans. Signal Process., vol. 70, pp. 190–201, 2022.
  • [14] W. H. Foy, “Position-location solutions by Taylor-series estimation,” IEEE Trans. Aerosp. Electron. Syst., vol. 12, no. 2, pp. 187–194, Mar. 1976.
  • [15] A. Coluccia and A. Fascista, “On the hybrid TOA/RSS range estimation in wireless sensor networks,” IEEE Trans. Wireless Commun., vol. 17, no. 1, pp. 361–371, Jan. 2018.
  • [16] W. Xiong, S. Mohanty, C. Schindelhauer, S. J. Rupitsch, and H. C. So, “Convex relaxation approaches to robust RSS-TOA based source localization in NLOS environments,” IEEE Trans. Veh. Technol., vol. 72, no. 8, pp. 11068–11073, Aug. 2023.
  • [17] D. J. Torrieri, “Statistical theory of passive location systems,” IEEE Trans. Aerosp. Electron. Syst., vol. 20, no. 2, pp. 183–198, Mar. 1984.
  • [18] M. R. Gholami, S. Gezici, and E. G. Strom, “A concave-convex procedure for TDOA based positioning,” IEEE Commun. Lett., vol. 17, no. 4, pp. 765–768, Apr. 2013.
  • [19] G. Wang, A. M. C. So, and Y. Li, “Robust convex approximation methods for TDOA-based localization under NLOS conditions,” IEEE Trans. Signal Process., vol. 64, no. 13, pp. 3281–3296, Jul. 2016.
  • [20] W. Xiong, C. Schindelhauer, H. C. So, J. Bordoy, A. Gabbrielli, and J. Liang, “TDOA-based localization with NLOS mitigation via robust model transformation and neurodynamic optimization,” Signal Process., vol. 178, 107774, Jan. 2021.
  • [21] M. Einemo and H. C. So, “Weighted least squares algorithm for target localization in distributed MIMO radar,” Signal Process., vol. 115, pp. 144–150, Oct. 2015.
  • [22] R. Amiri, F. Behnia, and M. A. M. Sadr, “Exact solution for elliptic localization in distributed MIMO radar systems,” IEEE Trans. Veh. Technol., vol. 67, no. 2, pp. 1075–1086, 2018.
  • [23] Z. Shi, H. Wang, C. S. Leung, and H. C. So, “Robust MIMO radar target localization based on Lagrange programming neural network,” Signal Process., vol. 174, 107574, Sep. 2020.
  • [24] W. Xiong, “Denoising of bistatic ranges for elliptic positioning,” IEEE Geosci. Remote Sens. Lett., vol. 20, pp. 1–3, 2023, Art. no. 3500503.
  • [25] W. Xiong, G. Cheng, C. Schindelhauer, and H. C. So, “Robust matrix completion for elliptic positioning in the presence of outliers and missing data,” IEEE Trans. Geosci. Remote Sens., vol. 61, pp. 1–12, 2023, Art. no. 5105912.
  • [26] W. Xiong, C. Schindelhauer, and H. C. So, “Error-reduced elliptic positioning via joint estimation of location and a balancing parameter,” IEEE Signal Process. Lett., vol. 29, pp. 2447–2451, 2022.
  • [27] W. Xiong, J. Liang, Z. Wang, and H. C. So, “Elliptic target positioning based on balancing parameter estimation and augmented Lagrange programming neural network,” Digital Signal Process., vol. 136, 104004, May 2023.
  • [28] W. Xiong, S. Mohanty, and C. Schindelhauer, “A low-complexity iterative message passing algorithm for robust RSS-TOA IoT localization,” IEEE Internet Things J., vol. 10, no. 18, pp. 16028–16035, Sept. 2023.
  • [29] A. Gabbrielli, W. Xiong, D. J. Schott, G. Fischer, J. Wendeberg, F. Höflinger, L. M. Reindl, C. Schindelhauer, and S. J. Rupitsch, “An echo suppression delay estimator for angle of arrival ultrasonic indoor localization,” IEEE Trans. Instrum. Meas., vol. 70, pp. 1–12, 2021.
  • [30] I. Guvenc and C.-C. Chong, “A survey on TOA based wireless localization and NLOS mitigation techniques,” IEEE Commun. Surveys Tuts., vol. 11, no. 3, pp. 107–124, Aug. 2009.
  • [31] A. M. Zoubir, V. Koivunen, E. Ollila, and M. Muma, Robust Statistics for Signal Processing. Cambridge, U.K.: Cambridge Univ. Press, 2018.
  • [32] S. Tomic and M. Beko, “A bisection-based approach for exact target localization in NLOS environments,” Signal Process., vol. 143, pp. 328–335, Feb. 2018.
  • [33] G. Wang, H. Chen, Y. Li, and N. Ansari, “NLOS error mitigation for TOA-based localization via convex relaxation,” IEEE Trans. Wireless Commun., vol. 13, no. 8, pp. 4119–4131, Aug. 2014.
  • [34] S. Zhang, S. Gao, G. Wang, and Y. Li, “Robust NLOS error mitigation method for TOA-based localization via second-order cone relaxation,” IEEE Commun. Lett., vol. 19, no. 12, pp. 2210–2213, Dec. 2015.
  • [35] S. Tomic, M. Beko, R. Dinis, and P. Montezuma, “A robust bisection-based estimator for TOA-based target localization in NLOS environments,” IEEE Commun. Lett., vol. 21, no. 11, pp. 2488–2491, Nov. 2017.
  • [36] C.-H. Park and J.-H. Chang, “Robust localization employing weighted least squares method based on MM estimator and Kalman filter with maximum Versoria criterion,” IEEE Signal Process. Lett., vol. 28, pp. 1075–1079, 2021.
  • [37] W. Xiong and H. C. So, “TOA-based localization with NLOS mitigation via robust multidimensional similarity analysis,” IEEE Signal Process. Lett., vol. 26, no. 9, pp. 1334–1338, Sep. 2019.
  • [38] C. Soares and J. Gomes, “STRONG: Synchronous and asynchronous robust network localization, under non-Gaussian noise,” Signal Process., vol. 185, 108066, 2021.
  • [39] H. Wang, R. Feng, A. C. S. Leung, and K. F. Tsang, “Lagrange programming neural network approaches for robust time-of-arrival localization,” Cogn. Comput., vol. 10, no. 1, pp. 23–34, Feb. 2018.
  • [40] W. Xiong, C. Schindelhauer, H. C. So, and Z. Wang, “Maximum correntropy criterion for robust TOA-based localization in NLOS environments,” Circuits Syst. Signal Process., 2021.
  • [41] W. Xiong, C. Schindelhauer, H. C. So, and S. J. Rupitsch, “A message passing based iterative algorithm for robust TOA positioning in impulsive noise,” IEEE Trans. Veh. Technol., early access, doi: 10.1109/TVT.2022.3203487.
  • [42] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory. Englewood Cliffs, NJ, USA: Prentice-Hall, 1993.
  • [43] T. Wang, “Cramer-Rao bound for localization with a priori knowledge on biased range measurements,” IEEE Trans. Aerosp. Electron. Syst., vol. 48, no. 1, pp. 468–476, Jan. 2012.
  • [44] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, Jan. 2011.
  • [45] D. Gabay and B. Mercier, “A dual algorithm for the solution of nonlinear variational problems via finite element approximation,” Comput. Math. Applicat., vol. 2, no. 1, pp. 17–40, 1976.
  • [46] J. Xie, J. Yang, J. J. Qian, Y. Tai, and H. M. Zhang, “Robust nuclear norm-based matrix regression with applications to robust face recognition,” IEEE Trans. Image Process., vol. 26, no. 5, pp. 2286–2295, May 2017.
  • [47] H. Zhang, J. Yang, F. Shang, C. Gong, and Z. Zhang, “LRR for subspace segmentation via tractable schatten-pp norm minimization and factorization,” IEEE Trans. Cybern., vol. 49, no. 5, pp. 1722–1734, May 2019.
  • [48] Z.-L. Shi, X. P. Li, C.-S. Leung, and H. C. So, “Cardinality constrained portfolio optimization via alternating direction method of multipliers,” IEEE Trans. Neural Netw. Learn. Syst., early access, 2022, doi: 10.1109/TNNLS.2022.3192065.
  • [49] J. Qian, W. K. Wong, H. Zhang, J. Xie, and J. Yang, “Joint optimal transport with convex regularization for robust image classification,” IEEE Trans. Cybern., vol. 52, no. 3, pp. 1553–1564, Mar. 2022.
  • [50] H. Zhang, F. Qian, P. Shi, W. Du, Y. Tang, J. Qian, C. Gong, and J. Yang, “Generalized nonconvex nonsmooth low-rank matrix recovery framework with feasible algorithm designs and convergence analysis,” IEEE Trans. Neural Netw. Learn. Syst., vol. 34, no. 9, pp. 5342–5353, Sep. 2023.
  • [51] H. Zhang, J. Gao, J. Qian, J. Yang, C. Xu, and B. Zhang, “Linear regression problem relaxations solved by nonconvex ADMM with convergence analysis,” IEEE Trans. Circuits Syst. Video Technol., early access, doi: 10.1109/TCSVT.2023.3291821.
  • [52] T.-K. Le and K. C. Ho, “Joint source and sensor localization by angles of arrival,” IEEE Trans. Signal Process., vol. 68, pp. 6521–6534, 2020.
  • [53] C. Chen, B. He, Y. Ye, and X. Yuan, “The direct extension of ADMM for multi-block convex minimization problems is not necessarily convergent,” Math. Program. Ser. A, vol. 155, no. 1, pp. 57–79, Jan. 2016.
  • [54] N. Parikh and S. Boyd, “Proximal algorithms,” Found. Trends Optim., vol. 1, no. 3, pp. 123–231, Jan. 2013.
  • [55] W.-J. Zeng and H. C. So, “Outlier-robust matrix completion via p\ell_{p}-minimization,” IEEE Trans. Signal Process., vol. 66, no. 5, pp. 1125–1140, Mar. 2018.
  • [56] A. Beck, “Chapter 6: The proximal operator,” in First-Order Methods in Optimization. Philadelphia, PA, USA: SIAM, 2017, pp. 129–177.
  • [57] P. Gong, C. Zhang, Z. Lu, J. Huang, and J. Ye, “A general iterative shrinkage and thresholding algorithm for non-convex regularized optimization problems,” in Proc. Int. Conf. Mach. Learn., Atlanta, GA, USA, Jun. 2013, pp. 37–45.
  • [58] D. De Menezes, D. M. Prata, A. R. Secchi, and J. C. Pinto, “A review on robust M-estimators for regression analysis,” Comput. Chem. Eng., vol. 147, 2021, Art. no. 107254.
  • [59] Y. Wang, W. Yin, and J. Zeng, “Global convergence of ADMM in nonconvex nonsmooth optimization,” J. Sci. Comput., vol. 78, pp. 29–63, Jan. 2019.
  • [60] Y. Wang, Y. Wang, and Q. Shi, “Optimized signal distortion for PAPR reduction of OFDM signals with IFFT/FFT complexity via ADMM approaches,” IEEE Trans. Signal Process., vol. 67, no. 2, pp. 399–414, Jan. 2019.
  • [61] J. Nocedal and S. J. Wright, Numerical Optimization, 2nd ed. New York, NY, USA: Springer, 2006.
  • [62] R. Feng, C.-S. Leung, A. G. Constantinides, and W.-J. Zeng, “Lagrange programming neural network for nondifferentiable optimization problems in sparse approximation,” IEEE Trans. Neural Netw. Learn. Syst., vol. 28, no. 10, pp. 2395–2407, Oct. 2017.
  • [63] F. H. Clarke, Optimization and Nonsmooth Analysis. Philadelphia, PA: SIAM, Jan. 1987.
  • [64] G. Wachsmuth, “On LICQ and the uniqueness of Lagrange multipliers,” Oper. Res. Lett., vol. 41, no. 1, pp. 78–80, 2013.
  • [65] When is LICQ useful in KKT conditions? Dec. 13, 2018. Accessed: Oct. 28, 2023. [Online]. Available: https://math.stackexchange.com/q/3038249.
  • [66] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
  • [67] S. Zhang and A. G. Constantinides, “Lagrange programming neural networks,” IEEE Trans. Circuits Syst. II: Anal. Digit. Signal Process., vol. 39, no. 7, pp. 441–452, Jul. 1992.
  • [68] L. Zhao, P. Babu, and D. P. Palomar, “Efficient algorithms on robust low-rank matrix completion against outliers,” IEEE Trans. Signal Process., vol. 64, no. 18, pp. 4767–4780, Sep. 2016.
  • [69] J. Liang, C. S. Leung, and H. C. So, “Lagrange programming neural network approach for target localization in distributed MIMO radar,” IEEE Trans. Signal Process., vol. 64, no. 6, pp. 1574–1585, Mar. 2016.
  • [70] X. Zhou, “On the Fenchel duality between strong convexity and Lipschitz continuous gradient,” 2018. [Online]. Available: https://arxiv.org/abs/1803.06573
  • [71] M. Grant and S. Boyd, “CVX: MATLAB software for disciplined convex programming, version 2.1.” Accessed: Sep. 11, 2021. [Online]. Available: http://cvxr.com/cvx
  • [72] L. F. Shampine and M. W. Reichelt, “The MATLAB ODE suite,” SIAM J. Sci. Comput., vol. 18, no. 1, pp. 1–22, Jan. 1997.
  • [73] W. Liu, P. P. Pokharel, and J. C. Principe, “Correntropy: Properties and applications in non-Gaussian signal processing,” IEEE Trans. Signal Process., vol. 55, no. 11, pp. 5286–5298, Nov. 2007.
  • [74] Y. Liu and L. Jin, “Deep matching prior network: Toward tighter multioriented text detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., 2017, pp. 3454–3461.
  • [75] N. H. Nguyen, K. Doğançay, and E. E. Kuruoglu, “An iteratively reweighted instrumental-variable estimator for robust 3-D AOA localization in impulsive noise,” IEEE Trans. Signal Process., vol. 67, no. 18, pp. 4795–4808, Sep. 2019.
  • [76] J. P. Nolan, Univariate Stable Distributions: Models for Heavy Tailed Data. New York, NY, USA: Springer, 2020.
  • [77] F. Yin, C. Fritsche, F. Gustafsson, and A. M. Zoubir, “TOA based robust wireless geolocation and Cramer-Rao lower bound analysis in harsh LOS/NLOS environments,” IEEE Trans. Signal Process., vol. 61, no. 9, pp. 2243–2255, May 2013.
  • [78] W.-J. Zeng, H. C. So, and L. Huang, “p\ell_{p}-MUSIC: Robust direction-of-arrival estimator for impulsive noise environments,” IEEE Trans. Signal Process., vol. 61, no. 17, pp. 4296–4308, Sep. 2013.
  • [79] Y. Chen, H. C. So, and E. E. Kuruoglu, “Variance analysis of unbiased least p\ell_{p}-norm estimator in non-Gaussian noise,” Signal Process., vol. 122, pp. 190–203, 2016.