
A Unifying Framework for Adaptive Radar Detection in the Presence of Multiple Alternative Hypotheses

Pia Addabbo, Senior Member, IEEE, Sudan Han, Filippo Biondi, Member, IEEE, Gaetano Giunta, Senior Member, IEEE, and Danilo Orlando, Senior Member, IEEE. P. Addabbo is with Università degli studi “Giustino Fortunato”, Benevento, Italy. E-mail: [email protected]. S. Han is with Defense Innovation Institute, Beijing, China. E-mail: [email protected]. F. Biondi is with the Italian Ministry of Defence. E-mail: [email protected]. G. Giunta is with the Department of Engineering, University of Roma Tre, 00146 Rome, Italy. E-mail: [email protected]. D. Orlando is with the Faculty of Engineering, Università degli Studi “Niccolò Cusano”, 00166 Roma, Italy. E-mail: [email protected].
Abstract

In this paper, we develop a new, elegant framework relying on the Kullback-Leibler Information Criterion to address the design of one-stage adaptive detection architectures for multiple hypothesis testing problems. Specifically, at the design stage, we assume that several alternative hypotheses may be in force and that only one null hypothesis exists. Then, starting from the case where all the parameters are known and proceeding up to the case where adaptivity with respect to the entire parameter set is required, we come up with decision schemes for multiple alternative hypotheses consisting of the sum of the compressed log-likelihood ratio based upon the available data and a penalty term accounting for the number of unknown parameters. The latter arises from suitable approximations of the Kullback-Leibler Divergence between the true and a candidate probability density function. Interestingly, under specific constraints, the proposed decision schemes can share the constant false alarm rate property by virtue of the Invariance Principle. Finally, we show the effectiveness of the proposed framework through application to examples of practical value in the context of radar detection, also in comparison with two-stage competitors. This analysis highlights that the architectures devised within the proposed framework represent an effective means to deal with detection problems where the uncertainty on some parameters leads to multiple alternative hypotheses.

Index Terms:
Adaptive Radar Detection, Constant False Alarm Rate, Generalized Likelihood Ratio Test, Kullback-Leibler Information Criterion, Model Order Selection, Multiple Hypothesis Testing, Nuisance Parameters, Radar, Statistical Invariance.

I Introduction

Nowadays, modern radar systems incorporate sophisticated signal processing algorithms which take advantage of the computational power made available by recent advances in technology. This growth in complexity is dictated by the fact that these systems must cope with increasingly challenging scenarios where conventional algorithms might fail or exhibit poor performance. For instance, in target-rich environments, structured echoes contaminate the data used to estimate the spectral properties of the interference (also known as training or secondary data), leading to a dramatic attenuation of the signal of interest components and, hence, to a nonnegligible number of missed detections [1]. In such cases, radar systems should be endowed with signal processing schemes capable of detecting and suppressing the outliers in order to make the training data set homogeneous [2, 3, 4, 5]. Another important example concerns high resolution radars, which can resolve a target into a number of different scattering centers depending on the radar bandwidth and the range extent of the target [6, 7]. The classic approach to the detection of range-spread targets consists in processing one range bin at a time despite the fact that contiguous cells contain target energy. As a consequence, classic detection algorithms do not collect as much energy as possible to increase the Signal-to-Interference-plus-Noise Ratio (SINR). To overcome this drawback, architectures capable of detecting distributed targets by exploiting a preassigned number of contiguous range bins have been developed [8, 9, 10, 11, 12]. These energy issues also hold for multiple point-like targets. In fact, detection algorithms which can take advantage of the total energy associated with each point-like target are highly desirable.

In the open literature, the existing examples concerning the detection of either multiple point-like or range-spread targets share the assumption that the number of scatterers (or at least an upper bound on it) is known and are based upon the Maximum Likelihood Approach (MLA) [13, 14]. However, in scenarios of practical value, such a priori information is often unavailable, especially when the radar system is operating in search mode. Moreover, the problem of jointly detecting multiple point-like targets is very difficult since the target positions and, more importantly, the target number are unknown parameters that must be estimated. Thus, in contrast to conventional detection problems, which comprise two hypotheses, namely the noise-only (or null) and the signal-plus-noise (or alternative) hypothesis, this lack of a priori information naturally leads to multiple alternative hypotheses, with the consequence that the radar engineer has to deal with composite multiple hypothesis tests.

Besides target-dense environments, another operating situation leading to multiple hypothesis tests is related to possible electronic attacks by adversary forces (jammers). These attacks comprise active techniques aimed at protecting a platform from being detected and tracked by the radar [6] through two approaches: masking and deception. More precisely, noncoherent jammers or Noise-Like Jammers (NLJs) attempt to mask targets by generating nondeceptive interference which blends into the thermal noise of the radar receiver, degrading the radar sensitivity due to an increase of the Constant False Alarm Rate (CFAR) threshold [6, 15, 16]. On the other hand, Coherent Jammers (CJs) illuminate the victim radar by means of low duty-cycle signals with specific parameters that, when estimated by the radar processor, force the latter to allocate resources to handle false targets. In fact, CJs are equipped with electronic apparatuses capable of receiving, modifying, amplifying, and retransmitting the radar’s own signal to create false targets with range, Doppler, and angle far from the true position of the platform under protection [16, 6].

A possible way to react to this kind of interference relies on the use of decision schemes devised by modifying the conventional detection problem with additional hypotheses associated with the presence of such threats [17, 18]. In [17], adaptive detection and discrimination between useful signals and CJs in the presence of thermal noise, clutter, and possible NLJs is addressed by considering an additional hypothesis under which data contain the CJs only. In addition, the latter is assumed to lie in the orthogonal complement of the subspace spanned by the nominal steering vector (after whitening by the true covariance matrix of the composite disturbance). The resulting multiple hypothesis test is solved resorting to an approach based upon a generalized Neyman-Pearson criterion [19]. However, from a computational point of view, setting the detection threshold might require an onerous load and, more importantly, such a solution is effective when the multiple hypotheses are not nested. As a matter of fact, in the presence of nested hypotheses, the MLA and, hence, the Generalized Likelihood Ratio Test (GLRT), may fail because the likelihood function monotonically increases with the hypothesis order (or model order). As a consequence, the MLA experiences a natural inclination to overestimate the hypothesis order. An alternative approach consists in looking for detection schemes that incorporate the expedients of the so-called Model Order Selection (MOS) rules [20, 21, 22, 23], which leverage the diversity in the number of parameters to moderate the overestimation tendency of the MLA. In [18], the authors follow the latter approach to conceive two-stage detection architectures for multiple NLJs whose number is unknown. Specifically, the first stage exploits the MOS rules to provide an estimate of the number of NLJs, whereas the second stage consists of a jammer detector that uses the estimate obtained at the first stage. Finally, it is important to observe that MOS rules can be adapted to accomplish detection tasks by also considering the model order “0”, which is associated with the null hypothesis [24, 25, 26]. However, in this case, it is not possible to set any threshold in order to guarantee a preassigned Probability of False Alarm ($P_{fa}$) and, more importantly, the CFAR property, which is of primary concern in radar, cannot be a priori stated.

In this paper, we develop an elegant framework relying on the Kullback-Leibler Information Criterion (KLIC) [27] to address multiple hypothesis testing problems where there exist many alternative hypotheses. This framework provides an important interpretation of both the Likelihood Ratio Test (LRT) and the GLRT from an information-theoretic standpoint and, remarkably, it lays the theoretical foundation for the design of new one-stage decision schemes for multiple hypothesis tests. This result represents the main technical contribution of this paper and, interestingly, such new detection architectures share a common structure given by the sum of a conventional decision statistic and a penalty term, just as the generic structure of a KLIC-based MOS rule consists of the compressed log-likelihood plus a penalty term. The starting point of the developed framework is the case where all the parameters are known, for which we show that, under suitable regularity conditions, the LRT approximates a test which selects the hypothesis whose associated probability density function (pdf) minimizes the Kullback-Leibler Divergence (KLD) with respect to the true data distribution. In addition, the LRT coincides with such a test when the KLD is measured with respect to the Empirical Data Distribution (EDD). Then, we guide the reader towards more difficult scenarios where the distribution parameters are no longer known. Specifically, we resort to Taylor series approximations of the KLD, which are also used to derive the MOS rules, in order to come up with decision schemes capable of moderating the overfitting inclination of the MLA. From a different perspective, the same results can be obtained by regularizing the pdf under the generic alternative hypothesis through a suitable prior for the unknown model order and applying a procedure that combines the MLA and Bayesian estimation [28, 21] (see the appendix for further details). The proposed theoretical framework is then completed with the investigation of the CFAR behavior of these decision schemes, framing the analysis in the more general context of statistical invariance and providing two propositions which allow one to state when the newly proposed decision architectures are invariant with respect to a given group of transformations and, possibly, enjoy the CFAR property. Finally, we present numerical examples obtained over simulated data and concerning three different radar detection problems, also in comparison with two-stage architectures where the first stage is aimed at estimating the model order while the second stage is a conventional detector. The analyses highlight that the developed MOS-based detectors are capable of providing good detection capabilities in several contexts of practical interest.

The remainder of this paper is organized as follows: the next section contains preliminary definitions and formulates the detection problem at hand in terms of a multiple hypothesis test. Section III describes the proposed framework and introduces the new architectures, whereas Section IV discusses the CFAR properties that can be ensured by exploiting the Principle of Invariance. Then, Section V provides some illustrative examples to assess the performance of the new architectures, also in comparison with natural competitors. Concluding remarks and future research tracks are given in Section VI. The derivation of an alternative framework leading to the same architectures is confined to the Appendix.

I-A Notation

In the sequel, vectors and matrices are denoted by boldface lower-case and upper-case letters, respectively. The symbols $\det(\cdot)$, $\mathrm{Tr}(\cdot)$, $(\cdot)^T$, $(\cdot)^\dagger$, and $(\cdot)^{-1}$ denote the determinant, trace, transpose, conjugate transpose, and inverse, respectively. As to numerical sets, $\mathds{R}$ is the set of real numbers, $\mathds{R}^{N\times M}$ is the Euclidean space of $(N\times M)$-dimensional real matrices (or vectors if $M=1$), $\mathds{C}$ is the set of complex numbers, and $\mathds{C}^{N\times M}$ is the Euclidean space of $(N\times M)$-dimensional complex matrices (or vectors if $M=1$). The cardinality of a set $\Omega$ is denoted by $|\Omega|$. The Dirac delta function is indicated by $\delta(\cdot)$. $\bm{I}_N$ stands for the $N\times N$ identity matrix, while $\bm{0}$ is the null vector or matrix of proper size. The acronyms pdf and i.i.d. mean probability density function and independent and identically distributed, respectively, while the symbol $E_f[\cdot]$ denotes the statistical expectation with respect to the pdf $f$. If $A$ and $B$ are two continuous random variables, $f(A|B)$ is the conditional pdf of $A$ given $B$, whereas the conditional probability of an event $A$ given the event $B$ is represented as $P(A|B)$. Finally, we write $\bm{x}\sim\mathcal{CN}_N(\bm{m},\bm{M})$ if $\bm{x}$ is a complex circular $N$-dimensional normal vector with mean $\bm{m}$ and covariance matrix $\bm{M}\succ\bm{0}$, whereas $\bm{x}\sim f(\bm{x};\bm{\theta})$ means that $f$ is the pdf of $\bm{x}$ with parameter vector $\bm{\theta}$.

II Problem Formulation and Preliminary Definitions

Let us consider a radar system, equipped with $N$ space and/or time identical channels, which collects a data matrix $\bm{Z}=[\bm{z}_1,\ldots,\bm{z}_K]\in\mathds{C}^{N\times K}$ whose columns can be modeled as statistically independent random vectors whose distribution belongs to a preassigned family. (Observe that in the case of Space-Time Adaptive Processing (STAP), $N$ represents the number of space-time channels, whereas when the system transmits either a single pulse through an array of sensors or exploits a single antenna to transmit a pulse train, $N$ represents the number of elements of the spatial array or the number of transmitted pulses, respectively [29]. Finally, $K$ may represent the number of range bins or the number of pulses when a slice of the STAP datacube is processed.) For simplicity and in order not to burden the notation (a point better explained below), we assume that these vectors share the same distribution parameters (identically distributed) and that their joint unknown pdf is denoted by $\bar{f}(\bm{Z};\bm{\theta})$ with $\bm{\theta}\in\mathds{R}^{p\times 1}$ the parameter vector taking on values in a specific parameter space, $\Theta\subseteq\mathds{R}^{p\times 1}$ say. Now, a conventional binary decision problem partitions the latter into two subsets $\Theta_0$ and $\Theta_1$ corresponding to the null ($H_0$) and the alternative ($H_1$) hypothesis, respectively. Thus, denoting by $\bm{\theta}_0$ the elements of $\Theta_0$ and by $\bm{\theta}_1$ the elements of $\Theta_1$, the pdf of $\bm{Z}$ under the $i$th hypothesis can be written as $f_i(\bm{Z};\bm{\theta}_i)$, $i=0,1$. It is important to stress here that $\bar{f}(\bm{Z};\bm{\theta})$ is the actual pdf and is unknown, whereas $f_i(\bm{Z};\bm{\theta}_i)$ is the pdf of $\bm{Z}$ when $\bm{\theta}=\bm{\theta}_i\in\Theta_i$ (in what follows, we assume correctly specified models, namely that the data distribution family is known).

Generally speaking, in many detection applications, the actual value of the parameter vector is unknown or, at best, partially known. In addition, not all the entries of $\bm{\theta}$ are useful to take a decision for the specific problem at hand. As a consequence, we can split $\bm{\theta}$ as $\bm{\theta}=[\bm{\theta}_r^T,\bm{\theta}_s^T]^T$, where $\bm{\theta}_r\in\Theta_r\subseteq\mathds{R}^{p_r\times 1}$ contains the parameters of interest, while the components of $\bm{\theta}_s\in\Theta_s\subseteq\mathds{R}^{p_s\times 1}$ represent the parameters that do not enter into the decision process and are called nuisance parameters [30]. From a more general perspective, it would be possible to associate a parameter vector of interest with each column of $\bm{Z}$, with the consequence that the $\bm{z}_k$s are no longer identically distributed. However, as stated before, we prefer to proceed assuming that the $\bm{z}_k$s share the same parameter vector in order to keep the notation simple and because the extension to the more general case is straightforward, as we will show in Section V. Thus, with the above remarks in mind, a conventional binary hypothesis testing problem can be expressed as (note that $p_r$ is the number of parameters of interest and $p_s$ is the number of nuisance parameters, so that $p=p_r+p_s$)

$$\begin{cases} H_0:\ \bm{Z}\sim f_0(\bm{Z};\bm{\theta}_0)=f_0(\bm{Z};\bm{\theta}_{r,0},\bm{\theta}_s)=\displaystyle\prod_{k=1}^{K}g_0(\bm{z}_k;\bm{\theta}_{r,0},\bm{\theta}_s),\\[1ex] H_1:\ \bm{Z}\sim f_1(\bm{Z};\bm{\theta}_1)=f_1(\bm{Z};\bm{\theta}_{r,1},\bm{\theta}_s)=\displaystyle\prod_{k=1}^{K}g_1(\bm{z}_k;\bm{\theta}_{r,1},\bm{\theta}_s), \end{cases} \quad (1)$$

where $\bm{\theta}_i=[\bm{\theta}_{r,i}^T,\bm{\theta}_s^T]^T\in\mathds{R}^{p\times 1}$, $i=0,1$, with $\bm{\theta}_{r,i}\in\Theta_r^i\subseteq\mathds{R}^{p_r\times 1}$ the parameter vector of interest under $H_i$, and $g_i(\cdot;\cdot)$, $i=0,1$, is the pdf of $\bm{z}_k$, $k=1,\ldots,K$, under $H_i$.

Two remarks are now in order. First, note that $\{\Theta_r^0,\Theta_r^1\}$ is a partition of $\Theta_r$. Second, the above problem assumes that $p_r$, $p_s$, and, hence, the total number of unknown parameters $p$, are perfectly known. However, in many radar (and, more generally, signal processing) applications, it is not uncommon for $p$ to be unknown under the alternative hypothesis, since the size of $\bm{\theta}_{r,1}$ might depend on the specific operating scenario [21]. For instance, radar systems might face situations where an unknown number of targets are present in the surveillance area [31, 13], or be under the attack of an unknown number of jammers [18, 32, 33]. Under this assumption, the hypothesis test can be modified as

$$\begin{cases} \text{under } H_0:\ \bm{\theta}_{r,0}\in\Theta_r^0\subseteq\mathds{R}^{p_{r,0}\times 1},\\ \text{under } H_1:\ \bm{\theta}_{r,1}\in\Theta_r^m\subseteq\mathds{R}^{p_{r,m}\times 1},\ p_{r,m}\in\Omega_r, \end{cases} \quad (2)$$

where $p_{r,0}$ and $\Omega_r=\{p_{r,1},\ldots,p_{r,M}\}$ with $p_{r,1}\leq\ldots\leq p_{r,M}$ are known, while $p_{r,m}$ is unknown. It follows that the pdf of $\bm{Z}$ (and, hence, that of $\bm{z}_k$) under $H_1$ depends on $p_{r,m}$ and, more importantly, the uncertainty on the latter leads to a testing problem formed by multiple (possibly nested) $H_1$ hypotheses, i.e.,

$$\begin{cases} H_0:\ \bm{Z}\sim f_0(\bm{Z};\bm{\theta}_{r,0},\bm{\theta}_s),\\ H_{1,1}:\ \bm{Z}\sim f_{1,1}(\bm{Z};\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,1}),\\ \quad\vdots\\ H_{1,M}:\ \bm{Z}\sim f_{1,M}(\bm{Z};\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,M}), \end{cases} \quad (3)$$

where $f_{1,m}(\bm{Z};\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,m})=\prod_{k=1}^{K}g_{1,m}(\bm{z}_k;\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,m})$ is the pdf of $\bm{Z}$ under $H_{1,m}$, namely when $\bm{\theta}_{r,1}\in\Theta_r^m$, with $g_{1,m}(\bm{z}_k;\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,m})$ the pdf of $\bm{z}_k$ under $H_{1,m}$.
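For concreteness, consider a hypothetical instance of (3) in the spirit of the multiple point-like target problem recalled in Section I: under $H_{1,m}$, each data vector contains the superposition of $m$ target returns with unknown complex amplitudes $\alpha_1,\ldots,\alpha_m$ and known steering vectors $\bm{v}_1,\ldots,\bm{v}_m$, i.e., $\bm{z}_k=\sum_{i=1}^{m}\alpha_i\bm{v}_i+\bm{n}_k$, $k=1,\ldots,K$, with $\bm{n}_k$ the interference component. In this illustrative setting, $\bm{\theta}_{r,1}$ collects the real and imaginary parts of the $\alpha_i$s (so that $p_{r,m}=2m$), $\bm{\theta}_s$ gathers the parameters of the interference covariance matrix, and the $M$ alternative hypotheses are nested in $m$.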

In the next section, we propose an information-theoretic approach to deal with problem (3) exploiting the KLIC [27]. Specifically, this criterion relies on the measurement of a certain distance, the so-called KLD, between a candidate distribution belonging to the family of densities $\mathcal{F}=\{f_0,\ f_{1,m},\ m\in\{1,\ldots,M\}\}$ and the actual distribution of $\bm{Z}$, which is assumed to lie in $\mathcal{F}$ and is denoted by

$$\bar{f}(\bm{Z};\bm{\theta})=\prod_{k=1}^{K}\bar{g}(\bm{z}_k;\bm{\theta}), \quad (4)$$

where $\bar{g}(\bm{z}_k;\bm{\theta})$ is the true pdf of $\bm{z}_k$, $k=1,\ldots,K$. Besides, we suppose that the inequalities required to invoke Khintchine’s Strong Law of Large Numbers [34] are valid, namely

$$\left|E_{\bar{g}}[\log g_{1,m}(\bm{z}_k;\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,m})]\right| < +\infty, \quad (5)$$
$$\left|E_{\bar{g}}[\log g_0(\bm{z}_k;\bm{\theta}_{r,0},\bm{\theta}_s)]\right| < +\infty, \quad (6)$$

as well as the following “regularity conditions” [21]

$$\frac{1}{T}\frac{\partial^2}{\partial\bm{\theta}\,\partial\bm{\theta}^T}\left[\log\prod_{k=1}^{K}f_{1,m}(\bm{z}_k;\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,m})\right] \overset{T\rightarrow\infty}{\longrightarrow} \frac{1}{T}E\left\{\frac{\partial^2}{\partial\bm{\theta}\,\partial\bm{\theta}^T}\left[\log\prod_{k=1}^{K}f_{1,m}(\bm{z}_k;\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,m})\right]\right\}, \quad (7)$$

and $p/T\overset{T\rightarrow\infty}{\longrightarrow}0$, which are required to suitably approximate the KLD. In the above equations, $T$ represents the total number of real-valued observations which, for the problem at hand, is equal to $2NK$ since we are dealing with complex vectors.

Note that a “minimum information distance” selection criterion has already been successfully applied to model order estimation, giving rise to the so-called MOS rules [21, 35]. The resulting selection architectures share the same structure consisting of a fitting term (the compressed log-likelihood function) plus an adjustment which also depends on the number of parameters. As a consequence, the diversity in the number of parameters comes into play to moderate the overfitting inclination of the compressed likelihood in the case of nested hypotheses. In fact, in this context, the KLIC-based rules can provide satisfactory classification performance, whereas the Maximum Likelihood (ML) approach may fail because the likelihood function monotonically increases with $p_{r,m}$ and the ML estimate (MLE) of $p_r$ will always be $p_{r,M}$ (or, equivalently, the MLE of $p$ will be $p_{r,M}+p_s$).
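The following minimal Python sketch (ours, not from the paper; it uses a toy nested polynomial regression in place of the radar models) illustrates this behavior numerically: the maximized Gaussian log-likelihood never decreases as the model order grows, whereas a penalized criterion of the MOS type peaks at the true order.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 200
x = np.linspace(-1.0, 1.0, T)
y = 1.0 - 2.0 * x + 3.0 * x**2 + rng.normal(scale=0.5, size=T)  # true order is 2

for order in range(7):
    # Least-squares fit = ML estimate of the polynomial coefficients
    X = np.vander(x, order + 1, increasing=True)
    coef = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma2 = np.mean((y - X @ coef) ** 2)           # ML noise-variance estimate
    loglik = -0.5 * T * (np.log(2 * np.pi * sigma2) + 1.0)
    p = order + 2                                   # coefficients plus the variance
    penalized = loglik - 0.5 * p * np.log(T)        # BIC-type penalty, cf. case (d) of (65)
    print(f"order {order}: logL = {loglik:8.2f}, penalized = {penalized:8.2f}")
```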

To conclude this preliminary section, for the reader's convenience, we recall that the KLD [27] (also called relative entropy) between $\bar{f}$ and $f_{1,m}$, namely the pdf of a generic candidate model under $H_1$, can be written as [36]

$$D(\bar{f},f_{1,m})=\int_{-\infty}^{\infty}\bar{f}(\bm{Z})\log\frac{\bar{f}(\bm{Z})}{f_{1,m}(\bm{Z};\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,m})}\,d\bm{Z}, \quad (8)$$

where $d\bm{Z}=dz^r_{1,1}\,dz^i_{1,1}\cdots dz^r_{N,K}\,dz^i_{N,K}$, with $z^r_{n,k}$ and $z^i_{n,k}$ the real and imaginary parts of the $n$th component of $\bm{z}_k$, and can be decomposed into the sum of two terms (we assume that the considered pdfs exist with respect to a Lebesgue measure)

$$\begin{aligned} D(\bar{f},f_{1,m})&=\int_{-\infty}^{\infty}\bar{f}(\bm{Z})\log\bar{f}(\bm{Z})\,d\bm{Z} -\int_{-\infty}^{\infty}\bar{f}(\bm{Z})\log f_{1,m}(\bm{Z};\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,m})\,d\bm{Z}\\ &=-h(\bar{f})+h(\bar{f},f_{1,m}), \end{aligned} \quad (9)$$

where $h(\bar{f})$ is the differential entropy of $\bar{f}$ and $h(\bar{f},f_{1,m})$ is the cross entropy between $\bar{f}$ and $f_{1,m}$. Note that, unlike $h(\bar{f})$, $h(\bar{f},f_{1,m})$ depends on the $m$th model (or hypothesis). Analogously, we can write the KLD with respect to the pdf under $H_0$ as

$$D(\bar{f},f_0)=-h(\bar{f})+h(\bar{f},f_0). \quad (10)$$

Recall that $D(\cdot,\cdot)$ is not a true distance between distributions since it is not symmetric and does not satisfy the triangle inequality [36]. Nonetheless, it is often useful to think of the KLD as a “distance” between distributions. Finally, the KLD can be interpreted as the information loss when either $f_{1,m}$ or $f_0$ is used to approximate $\bar{f}$ [20].
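As a numerical sanity check of the decomposition (9), the following Python snippet (a toy univariate Gaussian example of ours, not one of the paper's radar models) estimates $-h(\bar{f})$ and $h(\bar{f},f_1)$ by Monte Carlo and compares their sum with the closed-form KLD between two Gaussians.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
mu0, s0 = 0.0, 1.0      # "true" pdf: N(0, 1)
mu1, s1 = 1.0, 2.0      # candidate pdf: N(1, 4)

# Closed-form KLD between the two Gaussians
kld = np.log(s1 / s0) + (s0**2 + (mu0 - mu1) ** 2) / (2 * s1**2) - 0.5

# Monte Carlo estimate via D = -h(f) + h(f, f1), cf. (9)
z = rng.normal(mu0, s0, size=100_000)
neg_entropy = np.mean(norm.logpdf(z, mu0, s0))     # -h(f)     = E_f[log f]
cross_entropy = -np.mean(norm.logpdf(z, mu1, s1))  #  h(f, f1) = -E_f[log f1]
print(f"closed form: {kld:.4f}, Monte Carlo: {neg_entropy + cross_entropy:.4f}")
```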

III KLIC-based Decision Rules

In this section, we exploit the KLIC to devise decision schemes for problem (3). To this end, we proceed by first considering the case where all the parameters are known and then move to more difficult cases where the parameters become unknown. It is important to observe that we define here an information-theoretic framework that suitably encompasses well-established decision rules such as the Likelihood Ratio Test and the GLRT.

III-A KLIC-based Detectors for Known Model and Parameters

In this case, the number of alternative hypotheses reduces to one and problem (3) turns into problem (1) with the additional assumption that $\bm{\theta}_0$ and $\bm{\theta}_1$ are known. Moreover, the number of parameters of interest is $p_{r,\bar{m}}$ and is known. As a consequence, the data distribution is completely determined by the hypotheses and the true pdf of $\bm{Z}$ belongs to the following family: $\mathcal{F}_{\theta,p}=\{f_0,\ f_{1,\bar{m}}\}$. Thus, a natural test based upon the KLIC would decide for the hypothesis whose associated pdf (computed at $\bm{Z}$) exhibits the “minimum distance” from $\bar{f}$. Otherwise stated, such a test can be formulated as

$$D(\bar{f},f_0)\ \overset{H_1}{\underset{H_0}{\gtrless}}\ D(\bar{f},f_{1,\bar{m}}). \quad (11)$$

The above rule selects $H_0$ if the distance between $\bar{f}$ and $f_0$ is lower than that between $\bar{f}$ and $f_{1,\bar{m}}$. In the opposite case, it decides for $H_1$. Now, let us exploit (9) and (10) to recast (11) as

$$-h(\bar{f})+h(\bar{f},f_0)\ \overset{H_1}{\underset{H_0}{\gtrless}}\ -h(\bar{f})+h(\bar{f},f_{1,\bar{m}}), \quad (12)$$

or, equivalently,

$$h(\bar{f},f_0)-h(\bar{f},f_{1,\bar{m}})\ \overset{H_1}{\underset{H_0}{\gtrless}}\ 0. \quad (13)$$

From the information-theoretic point of view, the above forms highlight that the considered decision rule minimizes the loss of information which occurs when $\bar{f}$ is approximated with $f_{1,\bar{m}}$ or $f_0$ [20]. Moreover, using (9), we can rewrite (13) as

$$\begin{aligned} &E_{\bar{f}}[\log f_{1,\bar{m}}(\bm{Z};\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,\bar{m}})] -E_{\bar{f}}[\log f_0(\bm{Z};\bm{\theta}_{r,0},\bm{\theta}_s)]\ \overset{H_1}{\underset{H_0}{\gtrless}}\ 0 &\quad (15)\\ \Rightarrow\ &\sum_{k=1}^{K}E_{\bar{g}}[\log g_{1,\bar{m}}(\bm{z}_k;\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,\bar{m}})] -\sum_{k=1}^{K}E_{\bar{g}}[\log g_0(\bm{z}_k;\bm{\theta}_{r,0},\bm{\theta}_s)]\ \overset{H_1}{\underset{H_0}{\gtrless}}\ 0 &\quad (17)\\ \Rightarrow\ &K E_{\bar{g}}[\log g_{1,\bar{m}}(\bm{z};\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,\bar{m}})] -K E_{\bar{g}}[\log g_0(\bm{z};\bm{\theta}_{r,0},\bm{\theta}_s)]\ \overset{H_1}{\underset{H_0}{\gtrless}}\ 0 &\quad (19)\\ \Rightarrow\ &E_{\bar{g}}[\log g_{1,\bar{m}}(\bm{z};\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,\bar{m}})] -E_{\bar{g}}[\log g_0(\bm{z};\bm{\theta}_{r,0},\bm{\theta}_s)]\ \overset{H_1}{\underset{H_0}{\gtrless}}\ 0 &\quad (21)\\ \Rightarrow\ &\Lambda(\bm{Z};\bm{\theta}_{r,1},\bm{\theta}_{r,0},\bm{\theta}_s,\bar{m}) =E_{\bar{g}}\left[\log\frac{g_{1,\bar{m}}(\bm{z};\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,\bar{m}})}{g_0(\bm{z};\bm{\theta}_{r,0},\bm{\theta}_s)}\right]\ \overset{H_1}{\underset{H_0}{\gtrless}}\ 0. &\quad (23) \end{aligned}$$

Note that, starting from (19), we have dropped the subscript of $\bm{z}_k$ since $\bm{z}_1,\ldots,\bm{z}_K$ are i.i.d. random vectors. Test (23) cannot be applied in practice because it involves the computation of the expected log-likelihood ratio and requires the knowledge of $\bar{g}(\cdot)$ (or, equivalently, of $\bar{f}(\cdot)$). For this reason, we replace $\Lambda$ with a suitable estimate which is a function of the observed data. To this end, we resort to the following unbiased estimator [37]

$$\widehat{\Lambda}(\bm{Z};\bm{\theta}_{r,1},\bm{\theta}_{r,0},\bm{\theta}_s,\bar{m}) =\frac{1}{K}\sum_{k=1}^{K}\log\frac{g_{1,\bar{m}}(\bm{z}_k;\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,\bar{m}})}{g_0(\bm{z}_k;\bm{\theta}_{r,0},\bm{\theta}_s)} =\frac{1}{K}\log\frac{f_{1,\bar{m}}(\bm{Z};\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,\bar{m}})}{f_0(\bm{Z};\bm{\theta}_{r,0},\bm{\theta}_s)}. \quad (24)$$

In fact, since the technical assumptions of Khintchine’s Strong Law of Large Numbers hold true due to (5) and (6), in the limit for $K\rightarrow+\infty$, we have that $\widehat{\Lambda}$ converges almost surely to $\Lambda$, namely

$$\widehat{\Lambda}(\bm{Z};\bm{\theta}_{r,1},\bm{\theta}_{r,0},\bm{\theta}_s,\bar{m})\xrightarrow{a.s.}\Lambda(\bm{Z};\bm{\theta}_{r,1},\bm{\theta}_{r,0},\bm{\theta}_s,\bar{m}). \quad (25)$$

Summarizing, test (23) is replaced by

$$\widehat{\Lambda}(\bm{Z};\bm{\theta}_{r,1},\bm{\theta}_{r,0},\bm{\theta}_s,\bar{m})\ \overset{H_1}{\underset{H_0}{\gtrless}}\ 0, \quad (26)$$

where $\widehat{\Lambda}$ is a random variable whose value depends on the observed data. An alternative derivation of (26) consists in replacing $f_{1,\bar{m}}(\cdot)$ and $f_0(\cdot)$ with $g_{1,\bar{m}}(\cdot)$ and $g_0(\cdot)$, respectively, in (11), while the EDD, whose expression is $\varXi(\bm{z})=\frac{1}{K}\sum_{k=1}^{K}\delta(\bm{z}-\bm{z}_k)$, is used in place of $\bar{f}(\cdot)$. Thus, the KLD between the EDD and the distribution of a generic $\bm{z}_k$ is measured to decide which hypothesis is in force. As a matter of fact, criterion (11) becomes

$$\begin{aligned} &D(\varXi,g_0)\ \overset{H_1}{\underset{H_0}{\gtrless}}\ D(\varXi,g_{1,\bar{m}}) &\quad (28)\\ \Rightarrow\ &E_{\varXi}[\log g_{1,\bar{m}}(\bm{z};\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,\bar{m}})] -E_{\varXi}[\log g_0(\bm{z};\bm{\theta}_{r,0},\bm{\theta}_s)]\ \overset{H_1}{\underset{H_0}{\gtrless}}\ 0 &\quad (30)\\ \Rightarrow\ &\sum_{k=1}^{K}\log g_{1,\bar{m}}(\bm{z}_k;\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,\bar{m}}) -\sum_{k=1}^{K}\log g_0(\bm{z}_k;\bm{\theta}_{r,0},\bm{\theta}_s)\ \overset{H_1}{\underset{H_0}{\gtrless}}\ 0 &\quad (32)\\ \Rightarrow\ &\log\frac{\displaystyle\prod_{k=1}^{K}g_{1,\bar{m}}(\bm{z}_k;\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,\bar{m}})}{\displaystyle\prod_{k=1}^{K}g_0(\bm{z}_k;\bm{\theta}_{r,0},\bm{\theta}_s)}\ \overset{H_1}{\underset{H_0}{\gtrless}}\ 0. &\quad (34) \end{aligned}$$

Finally, note that the detection threshold of test (26) is set to zero and, as a consequence, it does not allow for a control of the probability of type I error, also known as $P_{fa}$ in Detection Theory [30]. In order to circumvent this limitation, the decision rule can be modified as

$$\log\frac{f_{1,\bar{m}}(\bm{Z};\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,\bar{m}})}{f_0(\bm{Z};\bm{\theta}_{r,0},\bm{\theta}_s)}\ \overset{H_1}{\underset{H_0}{\gtrless}}\ \eta, \quad (35)$$

where the detection threshold $\eta$ is suitably tuned in order to guarantee the desired $P_{fa}$ (hereafter, the symbol $\eta$ is used to denote the generic detection threshold). Remarkably, the decision rule (35) is statistically equivalent to the LRT or the Neyman-Pearson test [38].
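To make the threshold-setting step concrete, the following Python sketch (ours; a toy real Gaussian mean-shift problem with all parameters known, standing in for the radar models) implements (35) and tunes $\eta$ by Monte Carlo under $H_0$ so as to meet a preassigned $P_{fa}$.

```python
import numpy as np

rng = np.random.default_rng(2)
K, mu, pfa = 8, 0.7, 1e-2     # samples per trial, known mean under H1, target Pfa

def log_lr(Z):
    """Log-likelihood ratio (35) for H0: N(0,1) vs H1: N(mu,1), K i.i.d. samples."""
    return np.sum(mu * Z - 0.5 * mu**2, axis=-1)

# Tune eta on H0-only trials (everything is known, so no estimation is involved)
h0 = rng.normal(0.0, 1.0, size=(200_000, K))
eta = np.quantile(log_lr(h0), 1.0 - pfa)

# Detection probability at that threshold
h1 = rng.normal(mu, 1.0, size=(200_000, K))
print(f"eta = {eta:.3f}, estimated Pd = {np.mean(log_lr(h1) > eta):.3f}")
```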

III-B KLIC-based Detectors for Known Model and Unknown Parameters

Let us now consider (III-A) and assume that only the number of parameters of interest, $p_{r,\bar{m}}$ say, is known, whereas $\bm{\theta}_{r,1}$, $\bm{\theta}_{r,0}$, and $\bm{\theta}_s$ are unknown. In this case, the family of candidate models becomes

$$\mathcal{F}_p=\mathcal{F}_0\cup\mathcal{F}_1 =\{f_0(\cdot;\bm{\theta}_{r,0},\bm{\theta}_s):\ \bm{\theta}_{r,0}\in\Theta_r^0,\ \bm{\theta}_s\in\Theta_s\} \cup\{f_{1,\bar{m}}(\cdot;\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,\bar{m}}):\ \bm{\theta}_{r,1}\in\Theta_r^{\bar{m}},\ \bm{\theta}_s\in\Theta_s\}, \quad (36)$$

where $\Theta_r^0$ and $\Theta_r^{\bar{m}}$ form a partition of the parameter space of interest, while $\Theta_s$ is the nuisance parameter space. Note that, in this case, the hypotheses of (1) are composite, implying that, in order to build up a decision rule based upon (11), the unknown parameters $\bm{\theta}_{r,1}$, $\bm{\theta}_{r,0}$, and $\bm{\theta}_s$ must be estimated from data. Among the different alternatives, we resort to the ML approach, which enjoys “good” asymptotic properties [39]. In fact, given a model, the consistency (when it holds) of the MLE ensures that it converges in probability to the true parameter value, which is also the minimizer of the KLD (see [20, 40] and references therein). Thus, in (III-A), we replace $\bm{\theta}_{r,1}$, $\bm{\theta}_{r,0}$, and $\bm{\theta}_s$ with their respective MLEs under each hypothesis. Specifically, denoting by $\widehat{\bm{\theta}}_{r,i}$, $i=0,1$, the MLE of $\bm{\theta}_{r,i}$ and by $\widehat{\bm{\theta}}_{s,i}$ the MLE of $\bm{\theta}_s$ under the $H_i$ hypothesis, $i=0,1$, (III-A) can be recast as

$$\begin{aligned} &E_{\bar{f}}[\log f_{1,\bar{m}}(\bm{Z};\widehat{\bm{\theta}}_{r,1},\widehat{\bm{\theta}}_{s,1},p_{r,\bar{m}})] -E_{\bar{f}}[\log f_0(\bm{Z};\widehat{\bm{\theta}}_{r,0},\widehat{\bm{\theta}}_{s,0})]\ \overset{H_1}{\underset{H_0}{\gtrless}}\ 0 &\quad (38)\\ \Rightarrow\ &\Lambda_1(\bm{Z};\bar{m})=E_{\bar{f}}\left[\log\frac{f_{1,\bar{m}}(\bm{Z};\widehat{\bm{\theta}}_{r,1},\widehat{\bm{\theta}}_{s,1},p_{r,\bar{m}})}{f_0(\bm{Z};\widehat{\bm{\theta}}_{r,0},\widehat{\bm{\theta}}_{s,0})}\right]\ \overset{H_1}{\underset{H_0}{\gtrless}}\ 0. &\quad (40) \end{aligned}$$

Now, in place of the expectation with respect to the unknown $\bar{f}$, we use an unbiased estimator of $\Lambda_1(\bm{Z};\bar{m})$ (see also equation (48) of [21]), namely

$$\widehat{\Lambda}_1(\bm{Z};\bar{m})=\log\frac{f_{1,\bar{m}}(\bm{Z};\widehat{\bm{\theta}}_{r,1},\widehat{\bm{\theta}}_{s,1},p_{r,\bar{m}})}{f_0(\bm{Z};\widehat{\bm{\theta}}_{r,0},\widehat{\bm{\theta}}_{s,0})}, \quad (41)$$

and introduce a threshold to control the $P_{fa}$, yielding

$$\widehat{\Lambda}_1(\bm{Z};\bar{m})\ \overset{H_1}{\underset{H_0}{\gtrless}}\ \eta. \quad (42)$$

It is important to underline that the above test is statistically equivalent to the GLRT for known $p_r$.

The same result can be derived showing that the MLEs minimize the KLD between the EDD and the candidate model with respect to the unknown parameters. To this end, let us start from (28) and minimize both sides with respect to the unknown parameters

$$\min_{\bm{\theta}_{r,0},\bm{\theta}_s}D(\varXi,g_0)\ \overset{H_1}{\underset{H_0}{\gtrless}}\ \min_{\bm{\theta}_{r,1},\bm{\theta}_s}D(\varXi,g_{1,\bar{m}}). \quad (43)$$

The above problem is equivalent to

$$\begin{aligned} &\min_{\bm{\theta}_{r,0},\bm{\theta}_s}\{-E_{\varXi}[\log g_0(\bm{z};\bm{\theta}_{r,0},\bm{\theta}_s)]\}\ \overset{H_1}{\underset{H_0}{\gtrless}}\ \min_{\bm{\theta}_{r,1},\bm{\theta}_s}\{-E_{\varXi}[\log g_{1,\bar{m}}(\bm{z};\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,\bar{m}})]\} &\quad (46)\\ \Rightarrow\ &\underbrace{\min_{\bm{\theta}_{r,0},\bm{\theta}_s}\left[-\sum_{k=1}^{K}\log g_0(\bm{z}_k;\bm{\theta}_{r,0},\bm{\theta}_s)\right]}_{P_1}\ \overset{H_1}{\underset{H_0}{\gtrless}}\ \underbrace{\min_{\bm{\theta}_{r,1},\bm{\theta}_s}\left[-\sum_{k=1}^{K}\log g_{1,\bar{m}}(\bm{z}_k;\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,\bar{m}})\right]}_{P_2}. &\quad (49) \end{aligned}$$

Now, note that $P_1$ and $P_2$ are equivalent to

$$-\max_{\bm{\theta}_{r,0},\bm{\theta}_s}\sum_{k=1}^{K}\log g_0(\bm{z}_k;\bm{\theta}_{r,0},\bm{\theta}_s), \quad (50)$$
$$-\max_{\bm{\theta}_{r,1},\bm{\theta}_s}\sum_{k=1}^{K}\log g_{1,\bar{m}}(\bm{z}_k;\bm{\theta}_{r,1},\bm{\theta}_s,p_{r,\bar{m}}), \quad (51)$$

respectively, and, hence, the sought minimizers coincide with the MLEs of $\bm{\theta}_{r,1}$, $\bm{\theta}_{r,0}$, and $\bm{\theta}_s$ (the latter under each hypothesis), namely $\widehat{\bm{\theta}}_{r,1}=\arg\min_{\bm{\theta}_{r,1}}D(\varXi,g_{1,\bar{m}})$, $\widehat{\bm{\theta}}_{s,1}=\arg\min_{\bm{\theta}_s}D(\varXi,g_{1,\bar{m}})$, $\widehat{\bm{\theta}}_{r,0}=\arg\min_{\bm{\theta}_{r,0}}D(\varXi,g_0)$, and $\widehat{\bm{\theta}}_{s,0}=\arg\min_{\bm{\theta}_s}D(\varXi,g_0)$. With the above result in mind, (49) can be recast as

$$\begin{aligned} &\sum_{k=1}^{K}\log g_{1,\bar{m}}(\bm{z}_k;\widehat{\bm{\theta}}_{r,1},\widehat{\bm{\theta}}_{s,1},p_{r,\bar{m}}) -\sum_{k=1}^{K}\log g_0(\bm{z}_k;\widehat{\bm{\theta}}_{r,0},\widehat{\bm{\theta}}_{s,0})\ \overset{H_1}{\underset{H_0}{\gtrless}}\ 0 &\quad (53)\\ \Rightarrow\ &\log\frac{\displaystyle\prod_{k=1}^{K}g_{1,\bar{m}}(\bm{z}_k;\widehat{\bm{\theta}}_{r,1},\widehat{\bm{\theta}}_{s,1},p_{r,\bar{m}})}{\displaystyle\prod_{k=1}^{K}g_0(\bm{z}_k;\widehat{\bm{\theta}}_{r,0},\widehat{\bm{\theta}}_{s,0})}\ \overset{H_1}{\underset{H_0}{\gtrless}}\ 0, &\quad (55) \end{aligned}$$

which coincides with (41).
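A compact illustration of (41)-(42) is given below (a Python sketch of ours on a toy scalar problem, $H_0$: $N(0,\sigma^2)$ versus $H_1$: $N(\mu,\sigma^2)$ with $\mu$ and $\sigma^2$ unknown; the paper's radar examples follow in Section V). The MLEs enter the compressed log-likelihoods only through the estimated variances, and the resulting statistic is scale-invariant, anticipating the CFAR discussion of Section IV.

```python
import numpy as np

def glr_statistic(Z):
    """Log-GLR (41) for H0: N(0, s2) vs H1: N(m, s2), with m and s2 unknown.

    MLEs: under H0, s2_0 = mean(z^2); under H1, m_hat = mean(z) and
    s2_1 = mean((z - m_hat)^2).  The compressed log-likelihoods differ
    only through these variance estimates.
    """
    K = Z.shape[-1]
    s2_0 = np.mean(Z**2, axis=-1)
    s2_1 = np.var(Z, axis=-1)          # ML variance under H1 (ddof=0)
    return 0.5 * K * np.log(s2_0 / s2_1)

rng = np.random.default_rng(3)
K = 16
h0 = rng.normal(0.0, 2.0, size=(100_000, K))  # any sigma works: the GLR is scale-invariant
eta = np.quantile(glr_statistic(h0), 1.0 - 1e-2)
h1 = rng.normal(1.0, 2.0, size=(100_000, K))
print(f"eta = {eta:.3f}, Pd = {np.mean(glr_statistic(h1) > eta):.3f}")
```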

III-C KLIC-based Detectors for Unknown Model and Parameters

In this last subsection, we deal with the most general case where $p_r$, $\bm{\theta}_{r,1}$, $\bm{\theta}_{r,0}$, and $\bm{\theta}_s$ are unknown. As stated in Section II, under this assumption, there exist multiple alternative hypotheses depending on the model order $p=p_{r,m}+p_s$.

As a possible strategy to select the most likely hypothesis, we might follow the same line of reasoning as in the previous subsection, replacing $p$ with its MLE. However, if, on the one hand, this approach (given $p_r$) makes sense for $\bm{\theta}_{r,1}$, $\bm{\theta}_{r,0}$, and $\bm{\theta}_s$, on the other hand, when the considered models are nested, it fails in the estimation of $p_r$. In fact, the log-likelihood function monotonically increases with $p_r$ and, as a consequence, the ML approach will always select the maximum possible order, leading to overfitting. Therefore, an alternative approach is required in order to find “good” approximations of the negative cross entropy which moderate the overfitting inclination of the ML approach. To this end, we follow the same line of reasoning used to derive the MOS rules as shown in [21], where suitable Taylor series expansions of the cross entropy (used in (III-A)) are exploited. Then, the dependence on $p$ is removed by optimizing these expansions over the latter, which is tantamount to minimizing approximations of the KLD between $\bar{f}$ and $f_{1,m}$ with respect to the unknown parameters.

As for the null hypothesis, it is independent of $p$ and, hence, we can exploit the previously devised estimators. Specifically, we replace $E_{\bar{f}}[\log f_0(\bm{Z};\bm{\theta}_{r,0},\bm{\theta}_s)]$ with the same estimator as in the previous subsection, namely $\log f_0(\bm{Z};\widehat{\bm{\theta}}_{r,0},\widehat{\bm{\theta}}_{s,0})$.

Now, let us define $\mathcal{I}_m(\bm{\theta}_{r,1},\bm{\theta}_s)=-h(\bar{f},f_{1,m})$ and denote by $\widehat{\mathcal{I}}_m$ an estimator of the former. Following the lead of [21], several alternatives are possible for $\widehat{\mathcal{I}}_m$, namely

  • through equations (49)-(53) of [21], we come up with

    $$\widehat{\mathcal{I}}_m=\log f_{1,m}(\bm{Z};\widehat{\bm{\theta}}_{r,1},\widehat{\bm{\theta}}_{s,1},p_{r,m})-\frac{p_{r,m}+p_s}{2}; \quad (56)$$
  • through equations (57)-(62) of [21], we obtain

    $$\widehat{\mathcal{I}}_m=\log f_{1,m}(\bm{Z};\widehat{\bm{\theta}}_{r,1},\widehat{\bm{\theta}}_{s,1},p_{r,m})-(p_{r,m}+p_s); \quad (57)$$
  • through equations (59)-(60) and (73) of [21], we have that

    $$\widehat{\mathcal{I}}_m=\log f_{1,m}(\bm{Z};\widehat{\bm{\theta}}_{r,1},\widehat{\bm{\theta}}_{s,1},p_{r,m})-\frac{1+\rho}{2}(p_{r,m}+p_s),\quad \rho>1; \quad (58)$$
  • through equations (79)-(86) of [21], we get

    $$\widehat{\mathcal{I}}_m=\log f_{1,m}(\bm{Z};\widehat{\bm{\theta}}_{r,1},\widehat{\bm{\theta}}_{s,1},p_{r,m})-\frac{p_{r,m}+p_s}{2}\log(T)+C, \quad (59)$$

    where $C$ is a constant. Note that the above expression results from an asymptotic approximation, for sufficiently large values of $T$, of the more general form [21]

    $$\widehat{\mathcal{I}}_m=\log f_{1,m}(\bm{Z};\widehat{\bm{\theta}}_{r,1},\widehat{\bm{\theta}}_{s,1},p_{r,m})-\frac{1}{2}\log\det\widehat{\bm{J}}+C, \quad (60)$$

    where

    $$\widehat{\bm{J}}=\left.\left[-\frac{\partial^2}{\partial\bm{\theta}\,\partial\bm{\theta}^T}\log f_{1,m}(\bm{Z};\bm{\theta}_{r,1},\bm{\theta}_{s,1},p_{r,m})\right]\right|_{\bm{\theta}_{r,1}=\widehat{\bm{\theta}}_{r,1},\ \bm{\theta}_{s,1}=\widehat{\bm{\theta}}_{s,1}}. \quad (61)$$

    It is clear that other approximations are possible by considering the asymptotic behavior with respect to different parameters.

Finally, an estimate of $m$ can be obtained as

$$\widehat{m}=\operatorname*{arg\,max}_{m\in\{1,\ldots,M\}}\widehat{\mathcal{I}}_m \quad (62)$$

with $p_{r,\widehat{m}}$ the corresponding estimate of $p_r$, and we can replace each addendum of (III-A) with the respective approximation to come up with the following adaptive rule

$$\widehat{\mathcal{I}}_{\widehat{m}}-\log f_0(\bm{Z};\widehat{\bm{\theta}}_{r,0},\widehat{\bm{\theta}}_{s,0})\ \overset{H_1}{\underset{H_0}{\gtrless}}\ 0, \quad (63)$$

which, introducing a threshold to control the $P_{fa}$, can be recast as

$$\max_{m\in\{1,\ldots,M\}}\left\{\widehat{\Lambda}_1(\bm{Z};m)-h(m)\right\}\ \overset{H_1}{\underset{H_0}{\gtrless}}\ \eta, \quad (64)$$

where

$$h(m)=\begin{cases} (p_{r,m}+p_s)/2, & \text{(a)}\\ (p_{r,m}+p_s), & \text{(b)}\\ \displaystyle\frac{1+\rho}{2}(p_{r,m}+p_s),\ \rho>1, & \text{(c)}\\ \displaystyle\frac{p_{r,m}+p_s}{2}\log(T), & \text{(d)} \end{cases} \quad (65)$$

is a penalty term.
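A minimal end-to-end sketch of rule (64) is reported below (Python, ours; a toy nested linear model with a hypothetical dictionary H in place of the radar signal models of Section V, unknown amplitudes and noise variance, and the BIC-type penalty (65d)). For each candidate order $m$, the log-GLR $\widehat{\Lambda}_1(\bm{Z};m)$ is computed, the penalty $h(m)$ is subtracted, and the maximum over $m$ is compared with a threshold tuned under $H_0$.

```python
import numpy as np

rng = np.random.default_rng(4)
T, M, ps = 128, 5, 1                       # observations, max order, nuisance params
H = rng.normal(size=(T, M)) / np.sqrt(T)   # hypothetical dictionary of M signal modes

def penalized_stat(z):
    """Decision statistic (64) with penalty (65d) for a toy nested linear model."""
    s2_0 = np.mean(z**2)                   # ML noise variance under H0
    best = -np.inf
    for m in range(1, M + 1):
        Hm = H[:, :m]
        zhat = Hm @ np.linalg.lstsq(Hm, z, rcond=None)[0]  # ML amplitudes
        s2_m = np.mean((z - zhat) ** 2)    # ML noise variance under H_{1,m}
        glr = 0.5 * T * np.log(s2_0 / s2_m)                # Lambda_1(z; m)
        pen = 0.5 * (m + ps) * np.log(T)                   # h(m), case (d)
        best = max(best, glr - pen)
    return best

# Threshold from H0-only trials, then the statistic with two active modes
eta = np.quantile([penalized_stat(rng.normal(size=T)) for _ in range(2000)], 0.99)
z1 = H[:, :2] @ np.array([8.0, -6.0]) + rng.normal(size=T)
print(f"eta = {eta:.2f}, statistic under H1 = {penalized_stat(z1):.2f}")
```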

Before concluding this section, an important remark is in order. Specifically, let us focus on (64) and observe that $\widehat{\Lambda}_1(\bm{Z};m)$ is the logarithm of the generalized likelihood ratio (GLR) under the assumption that the model order is $p_{r,m}+p_s$. Thus, for sufficiently large $T$, it would be reasonable to consider alternative decision statistics which share the same asymptotic behavior as the GLR. For instance, the GLR can be replaced by the decision statistics of the Rao or Wald test [30]. Summarizing, decision rule (64) represents a starting point for the design of detection architectures dealing with multiple alternative hypotheses. Another modification of (64) may consist in replacing $\widehat{\Lambda}_1(\bm{Z};m)$ with decision statistics derived by applying ad hoc GLRT-based design procedures [41, 42, 43, 44, 32]. Finally, in the Appendix, we show that (64) can also be viewed as the result of the joint application of the Bayesian and ML frameworks after assigning a suitable prior to the number of parameters.

IV Invariance Issues and CFAR Property

The design of architectures capable of guaranteeing the CFAR property is an issue of primary concern in radar (as well as in other application fields), since it allows for reliable target detection also in those situations where the spectral properties of the interference (or unwanted components) are unknown or highly variable. As a matter of fact, controlling and keeping low the number of false alarms is a precaution aimed at avoiding the disastrous effects that false alarms may cause. Thus, system engineers set detection thresholds in order to guarantee very small values of the $P_{fa}$ [45, 46, 47, 48, 49, 50]. Unfortunately, the CFAR property is not granted by a generic detection scheme and, hence, before claiming CFARness for a given receiver, it must be proved that its decision statistic does not depend on the interference parameters under the null hypothesis. However, there exist some cases where the decision statistic is functionally invariant to those transformations that modify the nuisance parameters, which do not enter the decision process, and, at the same time, preserve the hypothesis testing problem. As a consequence, under the null hypothesis, such a statistic can be expressed as a function of random quantities whose distribution is independent of the nuisance parameters, ensuring the CFAR property [51, 41, 44].

Therefore, the above remarks suggest that it may be reasonable to look for decision rules that are invariant to nuisance parameters in the sense described above. To this end, we can invoke the Principle of Invariance [52, 53], whose key idea consists in finding a specific group of transformations, formed by a set $\mathcal{G}$ equipped with a binary composition operation $\circ$, that leaves unaltered the formal structure of the hypothesis testing problem (also inducing a group of transformations in the parameter space) and the family of distributions under each hypothesis. Then, a data compression can be accomplished by resorting to the so-called maximal invariant statistics, which organize data into distinguishable equivalence classes with respect to the group of transformations wherein such statistics are constant. Now, given a group of transformations, every invariant test can be written in terms of the maximal invariant statistic, whose distribution may depend only on a function of the parameters (the induced maximal invariant). If the latter exists and is constant over $\Theta_0$, then any invariant test guarantees the CFAR property with respect to the unknown nuisance parameters.

In what follows, we provide two propositions which allow us to establish when (64) is invariant with respect to a given group of transformations and, possibly, enjoys the CFAR property.

Proposition 1.

Let us assume that there exists a group of transformations $\mathcal{L}=(\mathcal{G},\circ)$, acting on the data through $l(\cdot)$, that leaves each of the following binary hypothesis testing problems

\mathcal{P}_m:\begin{cases}
H_0: & \boldsymbol{Z}\sim f_0(\boldsymbol{Z};\boldsymbol{\theta}_{r,0},\boldsymbol{\theta}_s),\\
H_{1,m}: & \boldsymbol{Z}\sim f_{1,m}(\boldsymbol{Z};\boldsymbol{\theta}_{r,1},\boldsymbol{\theta}_s,p_{r,m}),
\end{cases} \qquad (66)

unaltered for all $m\in\{1,\ldots,M\}$, together with the family of data distributions; then, the decision statistic in (64) is invariant with respect to $\mathcal{L}$.

Proof.

Since each problem $\mathcal{P}_m$ is invariant with respect to $\mathcal{L}$ by definition, the GLR for the $m$th testing problem, namely $\widehat{\Lambda}_1(\boldsymbol{Z};m)$, is invariant to the same transformation group, as shown in [52, 54], namely

\widehat{\Lambda}_1(l(\boldsymbol{Z});m)=\widehat{\Lambda}_1(\boldsymbol{Z};m),\quad\forall m\in\{1,\ldots,M\}. \qquad (67)

As a consequence, we obtain that

\max_{m\in\{1,\ldots,M\}}\left\{\widehat{\Lambda}_1(l(\boldsymbol{Z});m)-h(m)\right\}=\max_{m\in\{1,\ldots,M\}}\left\{\widehat{\Lambda}_1(\boldsymbol{Z};m)-h(m)\right\}. \qquad (68)

The last equality establishes the invariance of (64) with respect to \mathcal{L} and concludes the proof. ∎

From a practical point of view, if, for all $m\in\{1,\ldots,M\}$, the generic $\mathcal{P}_m$ is invariant with respect to a subgroup $\mathcal{L}_m$ of a more general group, then $\mathcal{L}$ can be obtained as the intersection of the subgroups $\mathcal{L}_1,\ldots,\mathcal{L}_M$ [55]. Moreover, observe that if (64) is invariant with respect to $\mathcal{L}$, then, by Theorem 6.2.1 of [52], its decision statistic can be expressed as a function of the previously described maximal invariant statistics. Next, under the hypotheses of Theorem 6.3.2 of [52], the distribution of a maximal invariant statistic depends only on the induced maximal invariant, denoted by $\xi(\boldsymbol{\theta})$, and the following proposition holds true.

Proposition 2.

Let us assume that the hypotheses of Proposition 1 hold and that

\forall\boldsymbol{\theta}\in\Theta_0:\ \xi(\boldsymbol{\theta})=c, \qquad (69)

with $c\in\mathbb{R}$; then, (64) ensures the CFAR property.

Proof.

Since (64) is invariant, its decision statistic is a function of the maximal invariant statistic, whose distribution under $H_0$ does not depend on the specific value of the parameter vector but only on $c$. ∎
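As a concrete illustration of the above machinery, the following minimal Python sketch (our own, not part of the paper) numerically checks the invariance of the classical quadratic statistic $\boldsymbol{z}^\dagger\boldsymbol{S}^{-1}\boldsymbol{z}$ [51], taken here as a toy example, under the group of joint nonsingular linear transformations of test and training data; the transformation modifies the nuisance covariance, while the statistic is unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 8, 24

def t_stat(z, Zt):
    """Toy invariant statistic t(z, Z) = z^H S^{-1} z with S = Z Z^H."""
    S = Zt @ Zt.conj().T
    return np.real(z.conj() @ np.linalg.solve(S, z))

# Data under the null hypothesis (white noise, for illustration only).
z = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)
Zt = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)

# Group action: z -> A z, z_k -> A z_k, with A any nonsingular matrix.
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))

print(t_stat(z, Zt))          # value of the statistic on the original data
print(t_stat(A @ z, A @ Zt))  # same value after the transformation
```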

V Illustrative Examples

In this section, we show the effectiveness of the newly proposed approach in three operating scenarios related to radar systems. For each scenario, we compare the proposed solutions with Two-Stage (TS) architectures consisting of an estimation stage devoted to the model order selection and a detection stage exploiting the model order estimate provided by the first stage (some of these competitors can be found in the open literature).

In what follows, the performance of the proposed approach is investigated in terms of the Probability of Correct Detection, $P_{d|m}$ say, defined as $P_{d|m}=P(H_m|H_m)$ (note that, focusing on problem (1) and assuming that $p_r$ is unknown, $P_{d|m}$ can be recast as $P(H_1,\hat{m}=m|H_m)$, with $m$ the actual hypothesis order), as well as in terms of either the classification histograms (a point better explained in the next subsections) or the Root Mean Square Errors (RMSEs) of the parameters of interest. The above performance metrics are computed by resorting to standard Monte Carlo counting techniques over $10^4$ and $100/P_{fa}$ independent trials to estimate $P_{d|m}$ and the thresholds guaranteeing a preassigned $P_{fa}=P(H_m, m>0|H_0)$, respectively. Finally, all the numerical examples assume $K=32$, $P_{fa}=10^{-4}$, a thermal noise power $\sigma_n^2=1$, and a radar system equipped with $N=16$ spatial channels, leading to the following expression for the nominal steering vector [44]: $\boldsymbol{v}(\theta)=\frac{1}{4}\left[1,e^{j\pi\sin\theta},\ldots,e^{j\pi 15\sin\theta}\right]^T$, where $\theta$ is the steering angle measured with respect to the antenna boresight.
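For concreteness, the following Python sketch (our own; function names are hypothetical) implements the nominal steering vector above together with the Monte Carlo threshold-setting procedure over $100/P_{fa}$ null trials; the white-noise generation under $H_0$ matches the jammer-detection scenario of the next subsection and must, in general, be adapted to the scenario at hand.

```python
import numpy as np

rng = np.random.default_rng(0)

def steering_vector(theta_deg, N=16):
    """Nominal steering vector v(theta) for the considered array."""
    theta = np.deg2rad(theta_deg)
    return 0.25 * np.exp(1j * np.pi * np.arange(N) * np.sin(theta))

def mc_threshold(stat_fn, pfa=1e-4, N=16, K=32):
    """Empirical threshold guaranteeing a preassigned Pfa, estimated by
    Monte Carlo counting over 100/Pfa trials (1e6 runs for Pfa = 1e-4)."""
    n_trials = int(100 / pfa)
    stats = np.empty(n_trials)
    for t in range(n_trials):
        # H0 data: white noise with sigma_n^2 = 1 (scenario-dependent in general)
        Z = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
        stats[t] = stat_fn(Z)
    return np.quantile(stats, 1.0 - pfa)
```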

V-A Multiple Noise-like Jammers Detection

Let us assume that the considered radar system is listening to the environment in the presence of an unknown number of NLJs. The samples of interest are organized into $N$-dimensional vectors denoted by $\boldsymbol{z}_k$, $k=1,\ldots,K$. Note that, in this case, the data under test are not affected by clutter since they are collected without transmitting any signal [56, 18]. Thus, the detection problem at hand can be formulated in terms of the following multiple hypothesis test

\begin{cases}
H_0: & \boldsymbol{z}_k\sim\mathcal{CN}_N(\boldsymbol{0},\sigma_n^2\boldsymbol{I}), & k=1,\ldots,K,\\
H_m: & \boldsymbol{z}_k\sim\mathcal{CN}_N(\boldsymbol{0},\sigma_n^2\boldsymbol{I}+\boldsymbol{M}_J(m)), & k=1,\ldots,K,\ m=1,\ldots,N_J,
\end{cases} \qquad (70)

where $\boldsymbol{M}_J(m)\in\mathbb{C}^{N\times N}$ is the unknown positive semidefinite matrix representing the jammer component, whose rank, denoted by $m$, is related to the actual number of directions from which interfering signals hit the system, $N_J\leq N$ is the maximum allowable number of such directions, and $\sigma_n^2\boldsymbol{I}$, with unknown $\sigma_n^2>0$, is the noise component.

In order to apply (64), we need to compute the logarithm of the GLR for (70) and the penalty term. The former is obtained by following the lead of [18], where the role of $r$ is played by $m$, to come up with

\widehat{\Lambda}_1(\boldsymbol{Z};m)=\log\left\{\frac{\left[\frac{1}{K(N-m)}\sum_{i=m+1}^{N}\gamma_i\right]^{-K(N-m)}}{\prod_{i=1}^{m}\left(\frac{\gamma_i}{K}\right)^{K}\left[\frac{1}{NK}\sum_{i=1}^{N}\gamma_i\right]^{-NK}}\right\},\quad m=1,\ldots,N_J, \qquad (71)

where $\boldsymbol{Z}=[\boldsymbol{z}_1,\ldots,\boldsymbol{z}_K]$ and $\gamma_1\geq\ldots\geq\gamma_N\geq 0$ are the eigenvalues of $\boldsymbol{Z}\boldsymbol{Z}^\dagger$. The penalty term can be written by noticing that the number of unknown parameters is $p=m(2N-m)+1$, while $T=2KN$. Notice that $T$ depends on $N$ and, hence, we exploit (60) to find another suitable asymptotic approximation in place of (65)(d). Specifically, we assume that only $K$ grows to infinity, obtaining $h(m)=\frac{p_{r,m}+p_s}{2}\log(K)$ (in the following, we refer to this penalty term as modified (65)(d)).
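The resulting decision statistic can be evaluated as in the following sketch (our own code), which computes the log-GLR (71) from the eigenvalues of $\boldsymbol{Z}\boldsymbol{Z}^\dagger$ in expanded form to avoid overflow, subtracts the modified (65)(d) penalty with $p_{r,m}+p_s=m(2N-m)+1$, and maximizes over the candidate orders.

```python
import numpy as np

def h_mod(m, N=16, K=32):
    """Modified (65)(d) penalty with p = m(2N - m) + 1 unknown parameters."""
    return 0.5 * (m * (2 * N - m) + 1) * np.log(K)

def nlj_statistic(Z, NJ=6, penalty=h_mod):
    """Statistic of (64) for the NLJ problem: max over m of log-GLR (71) - h(m)."""
    N, K = Z.shape
    gamma = np.sort(np.linalg.eigvalsh(Z @ Z.conj().T))[::-1]  # descending eigenvalues
    best = -np.inf
    for m in range(1, NJ + 1):
        tail_mean = gamma[m:].sum() / (K * (N - m))
        tot_mean = gamma.sum() / (N * K)
        # log of (71), expanded term by term
        llr = (-K * (N - m) * np.log(tail_mean)
               - K * np.log(gamma[:m] / K).sum()
               + N * K * np.log(tot_mean))
        best = max(best, llr - penalty(m))
    return best
```

The statistic is then compared with the threshold returned, e.g., by the Monte Carlo procedure sketched above.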

The considered simulation scenario comprises three noise-like jammers with different Angles Of Arrival (AOAs), viz., $10^\circ$, $20^\circ$, and $-15^\circ$, sharing the same power. Therefore, the jammer component of the data covariance matrix can be written as $\boldsymbol{M}_J=\text{JNR}\sum_{i=1}^{3}\boldsymbol{v}(\theta_i)\boldsymbol{v}^\dagger(\theta_i)$, where $\theta_i$ is the AOA of the $i$th noise-like jammer and JNR is the Jammer-to-Noise Ratio. Finally, we set $N_J=6$. As already stated, for comparison purposes, we show the performance of analogous TS architectures, where the MOS rule at the first stage shares, up to a factor $2$, the same penalty term as (64), whereas the second stage is given by the GLRT for known $m$ [18].

Figure 1, where we plot $P_{d|3}$ versus the JNR, shows that the curves related to the new decision schemes perfectly overlap those of the TS architectures. The worst performance is provided by (64) coupled with (65)(a), due to the nonnegligible attitude of the latter to overestimate the hypothesis order, as shown in the next figure. Such a behavior, although in a milder form, can also be noted for (64) coupled with (65)(b), whose detection curve exhibits a floor. On the other hand, (64) coupled with (65)(c) and (64) coupled with modified (65)(d) provide satisfactory performance.

Finally, in Figure 2, we plot the classification histograms, namely the probability of selecting $H_m$, $m=1,\ldots,6$, under $H_n$, $n=1,2,3$, assuming $\text{JNR}=10$ dB. The significant overestimation inclination of (64) coupled with (65)(a) clearly emerges. The remaining decision schemes exhibit a percentage of correct classification very close to $100\%$, except for (64) coupled with (65)(b), whose percentages are around $90\%$.

Other results, not reported here for brevity, highlight that (64) coupled with (65)(c) and (64) coupled with modified (65)(d) maintain good detection and classification performance for different parameter settings, whereas the behavior of the other competitors is more sensitive to the specific parameter values, experiencing in some situations a significant performance degradation.

Figure 1: Multiple Noise-Like Jammers Detection. $P_{d|3}$ versus JNR for the considered architectures assuming $N=16$ and $K=32$.
Figure 2: Multiple Noise-Like Jammers Detection. Classification histograms for each hypothesis ($\square$-marked: (64) coupled with (65)(a); $\triangleright$-marked: (64) coupled with (65)(b); $\diamond$-marked: (64) coupled with (65)(c), $\rho=2$; $\circ$-marked: (64) coupled with modified (65)(d)) assuming $N=16$, $K=32$, and $\text{JNR}=10$ dB.

V-B Detection in the Presence of Coherent Jammers

This second illustrative example concerns a scenario where the target echoes compete against a possible coherent jammer. Denoting by $\boldsymbol{z}\in\mathbb{C}^{N\times 1}$ the cell under test and by $\boldsymbol{z}_k$, $k=1,\ldots,K$, the training data set used to estimate the unknown Interference Covariance Matrix (ICM) [44], the problem at hand can be recast as

\begin{cases}
H_0: & \boldsymbol{z}=\boldsymbol{n}, & \boldsymbol{z}_k=\boldsymbol{n}_k,\ k=1,\ldots,K,\\
H_1: & \boldsymbol{z}=\boldsymbol{q}+\boldsymbol{n}, & \boldsymbol{z}_k=\boldsymbol{n}_k,\ k=1,\ldots,K,\\
H_2: & \boldsymbol{z}=\alpha\boldsymbol{v}(\theta_T)+\boldsymbol{n}, & \boldsymbol{z}_k=\boldsymbol{n}_k,\ k=1,\ldots,K,\\
H_3: & \boldsymbol{z}=\alpha\boldsymbol{v}(\theta_T)+\boldsymbol{q}+\boldsymbol{n}, & \boldsymbol{z}_k=\boldsymbol{n}_k,\ k=1,\ldots,K,
\end{cases} \qquad (72)

where $\theta_T$ is the AOA of the target, $\boldsymbol{n},\boldsymbol{n}_k\sim\mathcal{CN}_N(\boldsymbol{0},\boldsymbol{M})$, $k=1,\ldots,K$, are statistically independent, $\boldsymbol{M}\in\mathbb{C}^{N\times N}$ is the unknown ICM, $\boldsymbol{q}\in\mathbb{C}^{N\times 1}$ is the unknown spatial signature of the coherent jammer, and $\alpha\in\mathbb{C}$ is an unknown deterministic factor accounting for target and channel propagation effects. Hereafter, for simplicity, we denote by $\boldsymbol{v}$ the nominal steering vector of the target, namely $\boldsymbol{v}(\theta_T)$.

In what follows, we exploit the subspace paradigm to model the coherent interference [11], namely $\boldsymbol{q}=\boldsymbol{J}\boldsymbol{a}$, where $\boldsymbol{J}\in\mathbb{C}^{N\times q}$ is a known full-column-rank matrix whose columns span the jammer subspace (a priori information about $\boldsymbol{J}$ can be gathered by exploiting an Electronic Support Measure system that provides a rough estimate of the coherent jammer AOA) and are linearly independent of $\boldsymbol{v}$, while $\boldsymbol{a}\in\mathbb{C}^{q\times 1}$ is the vector of the jammer coordinates.

Now, using the results contained in [51, 11, 57], it is possible to show that

\widehat{\Lambda}_1(\boldsymbol{Z};1)=(K+1)\left\{\log\left(1+\boldsymbol{z}^\dagger\boldsymbol{S}^{-1}\boldsymbol{z}\right)-\log\left[1+\boldsymbol{z}^\dagger\boldsymbol{S}^{-1}\boldsymbol{z}-\boldsymbol{z}^\dagger\boldsymbol{S}^{-1}\boldsymbol{J}\left(\boldsymbol{J}^\dagger\boldsymbol{S}^{-1}\boldsymbol{J}\right)^{-1}\boldsymbol{J}^\dagger\boldsymbol{S}^{-1}\boldsymbol{z}\right]\right\}, \qquad (73)

\widehat{\Lambda}_1(\boldsymbol{Z};2)=(K+1)\left[\log\left(1+\boldsymbol{z}^\dagger\boldsymbol{S}^{-1}\boldsymbol{z}\right)-\log\left(1+\boldsymbol{z}^\dagger\boldsymbol{S}^{-1}\boldsymbol{z}-\frac{\boldsymbol{z}^\dagger\boldsymbol{S}^{-1}\boldsymbol{v}\boldsymbol{v}^\dagger\boldsymbol{S}^{-1}\boldsymbol{z}}{\boldsymbol{v}^\dagger\boldsymbol{S}^{-1}\boldsymbol{v}}\right)\right], \qquad (74)

\widehat{\Lambda}_1(\boldsymbol{Z};3)=(K+1)\left[\log\left(1+\boldsymbol{z}^\dagger\boldsymbol{S}^{-1}\boldsymbol{z}\right)-\log\left(1+\tilde{\boldsymbol{z}}_S^\dagger\tilde{\boldsymbol{P}}_{\tilde{\boldsymbol{J}}_S}^\perp\tilde{\boldsymbol{z}}_S\right)\right], \qquad (75)

where $\boldsymbol{S}=\sum_{k=1}^{K}\boldsymbol{z}_k\boldsymbol{z}_k^\dagger$, $\tilde{\boldsymbol{P}}_{\tilde{\boldsymbol{J}}_S}^\perp=\boldsymbol{I}-\tilde{\boldsymbol{J}}_S\left(\tilde{\boldsymbol{J}}_S^\dagger\tilde{\boldsymbol{J}}_S\right)^{-1}\tilde{\boldsymbol{J}}_S^\dagger$, with $\tilde{\boldsymbol{J}}_S=\left(\boldsymbol{P}_{\boldsymbol{S}^{-1/2}\boldsymbol{v}}^\perp\right)^{1/2}\boldsymbol{J}_S$, $\boldsymbol{J}_S=\boldsymbol{S}^{-1/2}\boldsymbol{J}$, and

\boldsymbol{P}_{\boldsymbol{S}^{-1/2}\boldsymbol{v}}^\perp=\boldsymbol{I}-\frac{\boldsymbol{S}^{-1/2}\boldsymbol{v}\boldsymbol{v}^\dagger\boldsymbol{S}^{-1/2}}{\boldsymbol{v}^\dagger\boldsymbol{S}^{-1}\boldsymbol{v}},

having set $\tilde{\boldsymbol{z}}_S=\left(\boldsymbol{P}_{\boldsymbol{S}^{-1/2}\boldsymbol{v}}^\perp\right)^{1/2}\boldsymbol{z}_S$ and $\boldsymbol{z}_S=\boldsymbol{S}^{-1/2}\boldsymbol{z}$. The number of unknown parameters is $2q+N^2$, $2+N^2$, and $2+2q+N^2$ under $H_1$, $H_2$, and $H_3$, respectively, whereas $T=2(K+1)N$. Also in this case, (65)(d) is replaced by the asymptotic approximation of (60) for $K\rightarrow+\infty$, whose expression is the same as in the previous subsection.
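For reference, the statistics (73)-(75) can be computed as in the sketch below (our own variable names). It exploits the fact that $(\boldsymbol{P}^\perp)^{1/2}=\boldsymbol{P}^\perp$ for orthogonal projection matrices, and takes the inverse Cholesky factor of $\boldsymbol{S}$ as one admissible choice of $\boldsymbol{S}^{-1/2}$.

```python
import numpy as np

def glr_coherent_jammers(z, Zt, v, J):
    """Log-GLRs (73)-(75) for problem (72); z: cell under test, Zt: training
    data (N x K), v: target steering vector, J: jammer subspace matrix."""
    K = Zt.shape[1]
    N = z.size
    S = Zt @ Zt.conj().T
    Si = np.linalg.inv(S)
    zSz = np.real(z.conj() @ Si @ z)

    # (73): coherent jammer only (H1)
    B = Si @ J
    fit_J = np.real(z.conj() @ B @ np.linalg.solve(J.conj().T @ B, B.conj().T @ z))
    lam1 = (K + 1) * (np.log1p(zSz) - np.log1p(zSz - fit_J))

    # (74): target only (H2)
    fit_v = np.abs(z.conj() @ Si @ v) ** 2 / np.real(v.conj() @ Si @ v)
    lam2 = (K + 1) * (np.log1p(zSz) - np.log1p(zSz - fit_v))

    # (75): target plus jammer (H3); P^perp is idempotent, so (P^perp)^{1/2} = P^perp
    Sm12 = np.linalg.inv(np.linalg.cholesky(S))   # satisfies Sm12^H Sm12 = S^{-1}
    zS, vS, JS = Sm12 @ z, Sm12 @ v, Sm12 @ J
    P = np.eye(N) - np.outer(vS, vS.conj()) / np.real(vS.conj() @ vS)
    zt, Jt = P @ zS, P @ JS
    Pp = np.eye(N) - Jt @ np.linalg.solve(Jt.conj().T @ Jt, Jt.conj().T)
    lam3 = (K + 1) * (np.log1p(zSz) - np.log1p(np.real(zt.conj() @ Pp @ zt)))
    return lam1, lam2, lam3
```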

The probability of correct detection is estimated assuming that $H_3$ holds in a scenario where the actual AOAs of the coherent jammer and of the target are $40^\circ$ and $0^\circ$, respectively. For simplicity, the subspace of the coherent interferer is set through the matrix $\boldsymbol{J}=[\boldsymbol{v}(\theta_{J,1}),\boldsymbol{v}(\theta_{J,2}),\boldsymbol{v}(\theta_{J,3})]$, where $\theta_{J,1}=35^\circ$, $\theta_{J,2}=40^\circ$, and $\theta_{J,3}=45^\circ$. The Jammer-to-Clutter-plus-Noise Ratio (JCNR) is defined as $\text{JCNR}=\boldsymbol{v}(\theta_{J,2})^\dagger\boldsymbol{M}^{-1}\boldsymbol{v}(\theta_{J,2})$. Finally, we assume that the $(n,m)$th entry of the ICM is given by

\boldsymbol{M}(n,m)=\sigma_n^2\delta_{n,m}+\sigma_c^2\rho_c^{|n-m|}, \qquad (76)

with $\delta_{n,m}$ the Kronecker delta, $\rho_c=0.95$ the one-lag correlation coefficient, and $\sigma_c^2$ the clutter power.
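A possible way of building the ICM in (76) and of drawing the corresponding correlated noise vectors is sketched below (our own code; the CNR fixes the clutter power $\sigma_c^2$).

```python
import numpy as np

def icm(N=16, cnr_db=20.0, rho_c=0.95, sigma_n2=1.0):
    """ICM of (76): exponentially correlated clutter plus white noise."""
    sigma_c2 = sigma_n2 * 10.0 ** (cnr_db / 10.0)
    n = np.arange(N)
    return sigma_c2 * rho_c ** np.abs(n[:, None] - n[None, :]) + sigma_n2 * np.eye(N)

# n_k ~ CN(0, M) generated through the Cholesky factor of M.
rng = np.random.default_rng(0)
M = icm()
L = np.linalg.cholesky(M)
nk = L @ (rng.standard_normal(16) + 1j * rng.standard_normal(16)) / np.sqrt(2)
```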

Figure 3 shows $P_{d|3}$ versus the JCNR assuming $\text{SNR}=20$ dB and $\text{CNR}=20$ dB. The curves related to the TS counterparts, where the second stage is the GLRT corresponding to the hypothesis selected by the first stage, are also reported. Also in this case, the curves for the proposed architectures and their TS counterparts perfectly overlap. The floor observed at low JCNR values is due to the presence of strong useful signal components, which increase the value of the decision statistic. The figure draws a ranking where the first position is occupied by (64) coupled with (65)(a), followed by (64) coupled with (65)(b), (64) coupled with (65)(c), and, finally, (64) coupled with modified (65)(d). However, such a ranking may be misleading, since (64) coupled with (65)(a) or (65)(b) is inclined to overestimate the hypothesis order. As a matter of fact, this behavior clearly emerges from the classification histograms shown in Figure 4, whereas (64) coupled with (65)(c) or modified (65)(d) exhibits a more reliable performance, with a percentage of correct classification greater than $80\%$ for each hypothesis.

Figure 3: Detection in the presence of coherent jammers. $P_{d|3}$ versus JCNR for the considered architectures assuming $N=16$, $K=32$, and $\text{SNR}=20$ dB.
Figure 4: Detection in the presence of coherent jammers. Classification histograms for each hypothesis ($\square$-marked: (64) coupled with (65)(a); $\triangleright$-marked: (64) coupled with (65)(b); $\diamond$-marked: (64) coupled with (65)(c), $\rho=2$; $\circ$-marked: (64) coupled with modified (65)(d)) assuming $N=16$, $K=32$, $\text{SNR}=20$ dB, and $\text{JCNR}=20$ dB.

V-C Range-spread Targets

The last illustrative example is related to range-spread targets, whose size is greater than the range resolution, leading to scattering centers belonging to several contiguous range bins. Moreover, we assume that the target size (in terms of contiguous range bins) and position are unknown. Thus, instead of the conventional detection procedure, which examines one range bin at a time, we resort to the proposed framework to devise an architecture that processes a set of contiguous range bins (the window under test). This scenario naturally leads to a multiple hypothesis test where each alternative hypothesis is associated with a specific target size and position. Note that this detection problem is similar to that addressed in [31], except that the size of the extended target is not known here. Summarizing, the multiple hypothesis test modeling the problem at hand has the following expression

\begin{cases}
H_0: & \boldsymbol{z}_l=\boldsymbol{n}_l,\ l\in\Omega_T, & \boldsymbol{z}_k=\boldsymbol{n}_k,\ k\in\Omega_S,\\
H_m: & \boldsymbol{z}_l=\alpha_l\boldsymbol{v}+\boldsymbol{n}_l,\ l\in\Omega_m\subseteq\Omega_T, & \boldsymbol{z}_k=\boldsymbol{n}_k,\ k\in\Omega_S,\\
 & \boldsymbol{z}_l=\boldsymbol{n}_l,\ l\in\Omega_T\setminus\Omega_m, &
\end{cases} \qquad (77)

where $\boldsymbol{v}$ is defined in the previous subsection, $\Omega_T=\{1,\ldots,L\}$, $\Omega_S=\{L+1,\ldots,L+K\}$, and $\Omega_m\subseteq\Omega_T$ is the set of consecutive range bins containing useful signal components.

Based upon the results of [12], the logarithm of the GLR for the above problem can be written as

\widehat{\Lambda}_1(\boldsymbol{Z};m)=(L+K)\left\{\log\det\left(\boldsymbol{S}_0\right)-\log\det\left[\sum_{l\in\Omega_m}\left(\boldsymbol{z}_l-\frac{\boldsymbol{v}^\dagger\boldsymbol{S}_1^{-1}\boldsymbol{z}_l}{\boldsymbol{v}^\dagger\boldsymbol{S}_1^{-1}\boldsymbol{v}}\boldsymbol{v}\right)\left(\boldsymbol{z}_l-\frac{\boldsymbol{v}^\dagger\boldsymbol{S}_1^{-1}\boldsymbol{z}_l}{\boldsymbol{v}^\dagger\boldsymbol{S}_1^{-1}\boldsymbol{v}}\boldsymbol{v}\right)^\dagger+\boldsymbol{S}_1\right]\right\}, \qquad (78)

where $\boldsymbol{S}_0=\sum_{l=1}^{L}\boldsymbol{z}_l\boldsymbol{z}_l^\dagger+\boldsymbol{S}$, with $\boldsymbol{S}=\sum_{k=L+1}^{L+K}\boldsymbol{z}_k\boldsymbol{z}_k^\dagger$, and $\boldsymbol{S}_1=\sum_{l\in\Omega_T\setminus\Omega_m}\boldsymbol{z}_l\boldsymbol{z}_l^\dagger+\boldsymbol{S}$. The number of unknown parameters is $p=2|\Omega_m|+1+N^2$, while $T=2(L+K)N$. The penalty factor arising from the asymptotic approximation of (60) for $K\rightarrow+\infty$ is the same as in the previous subsections.
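The following sketch (our own variable names) evaluates (78) for a given candidate set $\Omega_m$; the decision statistic of (64) then maximizes this quantity, minus the penalty, over all $L(L+1)/2$ candidate windows of consecutive bins.

```python
import numpy as np

def glr_range_spread(Zp, Zt, v, omega_m):
    """Log-GLR (78); Zp: window under test (N x L), Zt: training data (N x K),
    omega_m: indices (0-based) of the candidate set of consecutive bins."""
    N, L = Zp.shape
    K = Zt.shape[1]
    S = Zt @ Zt.conj().T
    S0 = Zp @ Zp.conj().T + S
    out = [l for l in range(L) if l not in omega_m]
    S1 = Zp[:, out] @ Zp[:, out].conj().T + S
    S1i = np.linalg.inv(S1)
    vS1v = np.real(v.conj() @ S1i @ v)
    R = S1.astype(complex).copy()
    for l in omega_m:
        zl = Zp[:, l]
        r = zl - (v.conj() @ S1i @ zl) / vS1v * v   # residual after amplitude fit
        R += np.outer(r, r.conj())
    return (L + K) * (np.linalg.slogdet(S0)[1] - np.linalg.slogdet(R)[1])
```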

The performance of the proposed MOS-based detectors is assessed assuming the same ICM as in the previous subsection (see (76)). In addition, the target is assumed to be located at the antenna boresight and to occupy the range bins $\{4,5\}$ within a window under test of size $L=10$. After suitably labeling all the alternative hypotheses through the index $m=1,\ldots,55$, the index associated with the actual position and size of the target is $m=14$. Thus, the figure of merit becomes $P_{d|14}$, and it is estimated in Figure 5 as a function of the SINR, whose expression is given by $\text{SINR}=\sum_{l\in\{4,5\}}|\alpha_l|^2\boldsymbol{v}^\dagger\boldsymbol{M}^{-1}\boldsymbol{v}$, with $\text{CNR}=20$ dB.

Figure 5: Range-spread targets detection. $P_{d|14}$ versus SINR for a radar with $N=16$ spatial channels and $K=32$ secondary data.

The results show that only the curve related to (64) coupled with (65)(c) (and its TS counterpart) with $\rho=15$ achieves $P_{d|14}=1$, whereas the curves of the other decision rules saturate to low values, with the newly proposed architectures outperforming the TS competitors. This behavior can be explained by analyzing the next figure where, in place of the classification histograms, we show the Root Mean Square Error (RMSE) of the estimated target size and position. The choice of these figures of merit is due to the huge number of hypotheses, which would make the histograms very difficult to read. The inspection of this figure points out that the estimates returned by (64) coupled with (65)(a), (65)(b), and modified (65)(d) take on a constant value for high SINR values, with the side effect of lowering the probability of correct detection. On the other hand, the RMSE curves associated with (64) coupled with (65)(c) decrease to zero as the SINR increases, by virtue of the tuning parameter, which is set so as to counteract the overestimation behavior generated by the presence of several nested hypotheses.

Figure 6: Range-spread targets detection. RMSE of the estimated (a) extension and (b) position versus SINR. A radar with $N=16$ spatial channels and $K=32$ secondary data is considered.

VI Conclusions

In this paper, we have developed a new framework based upon information-theoretic criteria to address multiple hypothesis testing problems. Specifically, we have considered multiple (possibly nested) alternative hypotheses and only one null hypothesis. Such problems frequently arise in radar scenarios of practical interest. The proposed design procedure exploits the KLIC to come up with decision statistics that incorporate a penalized GLR, where the penalty term depends on the number of unknown parameters, which, in turn, is related to the specific alternative hypothesis under consideration. Interestingly, such a framework provides an information-theoretic derivation of the GLRT and lays the foundation for the design of detection architectures capable of operating in the presence of multiple alternative hypotheses, overcoming the limitations of the MLA, which exhibits an overestimation inclination when the hypotheses are nested. Moreover, we have shown that, under some specific conditions, decision schemes devised within this framework ensure the CFAR property. Finally, we have applied the new design procedure to three different radar detection problems and investigated the performance of four new detectors, also in comparison with the analogous TS counterparts, whose structure is more complex to implement in real systems and slightly more time demanding. The analysis has singled out (64) coupled with (65)(c) as the first choice, at least for the considered cases, whereas (64) coupled with (65)(d) arises as the second choice, since it is a parameter-free decision scheme and represents a good compromise between detection and classification performance in most cases. However, it is important to notice that different study cases require further investigations, which may lead to different results.

Future research tracks may encompass the analysis where the GLR is replaced by asymptotically equivalent statistics or the derivation of other decision rules based upon the joint ML and maximum a posteriori estimation procedure testing several priors for the number of parameters.

Acknowledgments

The authors would like to thank Prof. A. De Maio and Prof. G. Ricci for the interesting discussions.

Appendix

Decision rule (64) can be viewed as the result of a suitable log-likelihood regularization, which aims at overcoming the previously described limitations of the ML approach in the case of nested models. To this end, we assume that the number of parameters of interest under $H_1$ is a discrete random variable with a prior promoting low-dimensional models, in order to mitigate the natural inclination of the ML approach to overestimate the model size.

With the above remarks in mind, a possible probability mass function for $p_r$ is chosen as

\pi(p_r)=\frac{1}{A}\,e^{-g(p_r)},\quad p_r\in\Omega_r, \qquad (79)

where $g(\cdot)$ is a positive and increasing function of $p_r$ and $A=\sum_{p_r\in\Omega_r}e^{-g(p_r)}$ is a normalization constant. For the subsequent developments, it is important to note that, if $p_r$ were a continuous random variable, the joint pdf of $\boldsymbol{Z}$ and $p_r$ could be recast as

f(\boldsymbol{Z},p_r;\boldsymbol{\theta}_{r,1},\boldsymbol{\theta}_s)=f(\boldsymbol{Z};\boldsymbol{\theta}_{r,1},\boldsymbol{\theta}_s|p_r)f(p_r), \qquad (80)

where $f(p_r)$ is the pdf of $p_r$. However, since $p_r$ is a discrete random variable, the joint pdf of $\boldsymbol{Z}$ and $p_r$ exists only in a generalized sense, exhibiting Dirac delta functions located at the values assumed by $p_r$. Thus, if we consider the following decision rule

\frac{\displaystyle\max_{p_r\in\Omega_r}\max_{\boldsymbol{\theta}_{r,1},\boldsymbol{\theta}_s}f(\boldsymbol{Z};\boldsymbol{\theta}_{r,1},\boldsymbol{\theta}_s|p_r)\,\pi(p_r)}{\displaystyle\max_{\boldsymbol{\theta}_{r,0},\boldsymbol{\theta}_s}f_0(\boldsymbol{Z};\boldsymbol{\theta}_{r,0},\boldsymbol{\theta}_s)}\ \underset{H_0}{\overset{H_1}{\gtrless}}\ \eta, \qquad (81)

the numerator attempts to maximize the multipliers of the Dirac delta functions, i.e., it focuses only on the lines where the probability masses are located. In this sense, the optimization at the numerator can be interpreted in terms of the joint ML and maximum a posteriori estimation procedure [28, 21]. The above architecture can be written as

\max_{p_r\in\Omega_r}\max_{\boldsymbol{\theta}_{r,1},\boldsymbol{\theta}_s}\log\left[f(\boldsymbol{Z};\boldsymbol{\theta}_{r,1},\boldsymbol{\theta}_s|p_r)\,\pi(p_r)\right]-\max_{\boldsymbol{\theta}_{r,0},\boldsymbol{\theta}_s}\log f_0(\boldsymbol{Z};\boldsymbol{\theta}_{r,0},\boldsymbol{\theta}_s)\ \underset{H_0}{\overset{H_1}{\gtrless}}\ \eta. \qquad (82)

After maximizing with respect to $\boldsymbol{\theta}_{r,1}$, $\boldsymbol{\theta}_{r,0}$, and $\boldsymbol{\theta}_s$ for a given $p_r$, (82) can be recast as

\max_{p_r\in\Omega_r}\left\{\log\frac{f(\boldsymbol{Z};\widehat{\boldsymbol{\theta}}_{r,1},\widehat{\boldsymbol{\theta}}_{s,1}|p_r)}{f_0(\boldsymbol{Z};\widehat{\boldsymbol{\theta}}_{r,0},\widehat{\boldsymbol{\theta}}_{s,0})}+\log\pi(p_r)\right\}\ \underset{H_0}{\overset{H_1}{\gtrless}}\ \eta. \qquad (83)

The above decision scheme has the same structure as (64) and, when $g(p_r)=h(p_r)$, the two are equivalent. Nevertheless, several alternatives can be used for $g(p_r)$, leading to different penalization terms.
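As a purely illustrative check (the log-GLR values below are hypothetical placeholders, not simulation outputs), the snippet verifies that choosing $g=h$ makes the statistic of (83) coincide with that of (64) up to the constant $\log A$, which is absorbed by the threshold.

```python
import numpy as np

h = lambda p: 0.5 * p * np.log(32)       # e.g., a BIC-like penalty with K = 32
llr = np.array([3.1, 5.7, 6.0, 6.1])     # hypothetical compressed log-GLRs
orders = np.arange(1, 5)                 # candidate numbers of parameters p_r

log_prior = -h(orders)                   # log pi(p_r) up to the constant -log A
map_stat = np.max(llr + log_prior)       # statistic of (83), up to -log A
mos_stat = np.max(llr - h(orders))       # statistic of (64)
assert np.isclose(map_stat, mos_stat)    # identical decision statistics
```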

References

  • [1] J. S. Bergin, P. M. Techau, W. L. Melvin, and J. R. Guerci, “GMTI STAP in Target-Rich Environments: Site-Specific Analysis,” in Radar Conference, 2002. Proceedings of the IEEE.   IEEE, 2002, pp. 391–396.
  • [2] M. C. Wicks, W. L. Melvin, and P. Chen, “An Efficient Architecture for Nonhomogeneity Detection in Space-Time Adaptive Processing Airborne Early Warning Radar,” in Radar 97 (Conf. Publ. No. 449), October 1997, pp. 295–299.
  • [3] R. S. Adve, T. B. Hale, and M. C. Wicks, “Transform Domain Localized Processing using Measured Steering Vectors and Non-Homogeneity Detection,” in Proceedings of the 1999 IEEE Radar Conference. Radar into the Next Millennium (Cat. No.99CH36249), April 1999, pp. 285–290.
  • [4] L. Jiang and T. Wang, “Robust Non-Homogeneity Detector based on Reweighted Adaptive Power Residue,” IET Radar, Sonar Navigation, vol. 10, no. 8, pp. 1367–1374, 2016.
  • [5] M. Rangaswamy, “Non-Homogeneity Detector for Gaussian and non-Gaussian Interference Scenarios,” in Sensor Array and Multichannel Signal Processing Workshop Proceedings, 2002, Aug 2002, pp. 528–532.
  • [6] W. L. Melvin and J. A. Scheer, Principles of Modern Radar: Advanced Techniques.   Edison, NJ: SciTech Publishing, 2013.
  • [7] P. Addabbo, A. Aubry, A. De Maio, L. Pallotta, and S. L. Ullo, “HRR Profile Estimation using SLIM,” IET Radar, Sonar Navigation, vol. 13, no. 4, pp. 512–521, 2019.
  • [8] W. Liu, W. Xie, and Y. Wang, “Rao and Wald Tests for Distributed Targets Detection With Unknown Signal Steering,” IEEE Signal Processing Letters, vol. 20, no. 11, pp. 1086–1089, 2013.
  • [9] J. Liu, H. Li, and B. Himed, “Persymmetric Adaptive Target Detection with Distributed MIMO Radar,” IEEE Transactions on Aerospace and Electronic Systems, vol. 51, no. 1, pp. 372–382, 2015.
  • [10] K. Gerlach and M. J. Steiner, “Adaptive detection of range distributed targets,” IEEE Transactions on Signal Processing, vol. 47, no. 7, pp. 1844–1851, July 1999.
  • [11] F. Bandiera, O. Besson, D. Orlando, G. Ricci, and L. L. Scharf, “GLRT-Based Direction Detectors in Homogeneous Noise and Subspace Interference,” IEEE Transactions on Signal Processing, vol. 55, no. 6, pp. 2386–2394, June 2007.
  • [12] E. Conte, A. De Maio, and G. Ricci, “GLRT-Based Adaptive Detection Algorithms for Range-Spread Targets,” IEEE Transactions on Signal Processing, vol. 49, no. 7, pp. 1336–1348, July 2001.
  • [13] F. Bandiera and G. Ricci, “Adaptive Detection and Interference Rejection of Multiple Point-Like Radar Targets,” IEEE Transactions on Signal Processing, vol. 54, no. 12, pp. 4510–4518, December 2006.
  • [14] F. Bandiera, D. Orlando, and G. Ricci, “CFAR Detection of Extended and Multiple Point-Like Targets Without Assignment of Secondary Data,” IEEE Signal Processing Letters, vol. 13, no. 4, pp. 240–243, April 2006.
  • [15] D. Adamy, EW101: A First Course in Electronic Warfare.   Norwood, MA: Artech House, 2001.
  • [16] A. Farina, “ECCM Techniques,” in Radar Handbook, M. I. Skolnik, Ed.   McGraw-Hill, 2008, ch. 24.
  • [17] F. Bandiera, A. Farina, D. Orlando, and G. Ricci, “Detection Algorithms to Discriminate Between Radar Targets and ECM Signals,” IEEE Transactions on Signal Processing, vol. 58, no. 12, pp. 5984–5993, December 2010.
  • [18] V. Carotenuto, C. Hao, D. Orlando, A. De Maio, and S. Iommelli, “Detection of Multiple Noise-like Jammers for Radar Applications,” in 2018 5th IEEE International Workshop on Metrology for AeroSpace (MetroAeroSpace), June 2018, pp. 328–333.
  • [19] J. Stuller, “Generalized Likelihood Signal Resolution,” IEEE Transactions on Information Theory, vol. 21, no. 3, pp. 276–282, May 1975.
  • [20] K. P. Burnham and D. R. Anderson, Model Selection And Multimodel Inference, A Practical Information-Theoretic Approach, 2nd ed.   New York, USA: Springer-Verlag, 2002.
  • [21] P. Stoica and Y. Selen, “Model-Order Selection: A Review of Information Criterion Rules,” IEEE Signal Processing Magazine, vol. 21, no. 4, pp. 36–47, 2004.
  • [22] A. A. Neath and J. E. Cavanaugh, “The Bayesian Information Criterion: Background, Derivation, and Applications,” WIREs Computational Statistics, vol. 4, no. 2, pp. 199–203, March 2012.
  • [23] R. J. Bhansali and D. Y. Downham, “Some Properties of the Order of an Autoregressive Model Selected by a Generalization of Akaike’s FPE Criterion,” Biometrika, vol. 64, pp. 547–551, 1977.
  • [24] M. Wax and T. Kailath, “Detection of Signals by Information Theoretic Criteria,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 33, no. 2, pp. 387–392, April 1985.
  • [25] E. Fishler, M. Grosmann, and H. Messer, “Detection of Signals by Information Theoretic Criteria: General Asymptotic Performance Analysis,” IEEE Transactions on Signal Processing, vol. 50, no. 5, pp. 1027–1036, May 2002.
  • [26] H. L. Van Trees, Optimum Array Processing (Detection, Estimation, and Modulation Theory, Part IV).   John Wiley & Sons, 2002.
  • [27] S. Kullback and R. A. Leibler, “On Information and Sufficiency,” Ann. Math. Statist., vol. 22, no. 1, pp. 79–86, 03 1951.
  • [28] A. Yeredor, “The Joint MAP-ML Criterion and its Relation to ML and to Extended Least-Squares,” IEEE Transactions on Signal Processing, vol. 48, no. 12, pp. 3484–3492, 2000.
  • [29] M. A. Richards, J. A. Scheer, and W. A. Holm, Principles of Modern Radar: Basic Principles.   Raleigh, NC: SciTech Publishing, 2010.
  • [30] S. M. Kay, Fundamentals of Statistical Signal Processing: Detection Theory.   Prentice Hall, 1998, vol. 2.
  • [31] F. Bandiera, D. Orlando, and G. Ricci, “CFAR Detection of Extended and Multiple Point-Like Targets Without Assignment of Secondary Data,” IEEE Signal Processing Letters, vol. 13, no. 4, pp. 240–243, April 2006.
  • [32] L. Yan, P. Addabbo, C. Hao, D. Orlando, and A. Farina, “New ECCM Techniques Against Noise-like and/or Coherent Interferers,” IEEE Transactions on Aerospace and Electronic Systems, pp. 1–1, 2019.
  • [33] L. Yan, C. Hao, P. Addabbo, D. Orlando, and A. Farina, “An Improved Adaptive Radar Detector based on Two Sets of Training Data,” in 2019 IEEE Radar Conference (RadarConf), April 2019, pp. 1–6.
  • [34] P. K. Sen and J. M. Singer, Large Sample Methods in Statistics: An Introduction with Applications.   Springer US, 1993.
  • [35] Y. Selén, Model Selection and Sparse Modeling.   Department of Information Technology, Uppsala University, 2007.
  • [36] T. Cover and J. Thomas, Elements of Information Theory.   Wiley, 2012.
  • [37] S. Eguchi and J. Copas, “Interpreting Kullback–Leibler Divergence with the Neyman–Pearson Lemma,” Journal of Multivariate Analysis, vol. 97, no. 9, pp. 2034–2040, 2006.
  • [38] E. L. Lehmann, J. P. Romano, and G. Casella, “Testing Statistical Hypotheses,” vol. 150, 1986.
  • [39] S. M. Kay, Fundamentals of Statistical Signal Processing: Estimation Theory.   Prentice Hall, 1993, vol. 1.
  • [40] C. J. Geyer, “Stat 8112 Lecture Notes: The Wald Consistency Theorem,” University of Minnesota, School of Statistics, pp. 1–13, 2012.
  • [41] F. C. Robey, D. R. Fuhrmann, E. J. Kelly, and R. Nitzberg, “A CFAR Adaptive Matched Filter Detector,” IEEE Trans. on Aerospace and Electronic Systems, vol. 28, no. 1, pp. 208–216, 1992.
  • [42] Y. I. Abramovich and B. A. Johnson, “GLRT-Based Detection-Estimation for Undersampled Training Conditions,” IEEE Transactions on Signal Processing, vol. 56, no. 8, pp. 3600–3612, 2008.
  • [43] C. Hao, D. Orlando, G. Foglia, and G. Giunta, “Knowledge-based adaptive detection: Joint exploitation of clutter and system symmetry properties,” IEEE Signal Processing Letters, vol. 23, no. 10, pp. 1489–1493, October 2016.
  • [44] F. Bandiera, D. Orlando, and G. Ricci, Advanced Radar Detection Schemes Under Mismatched Signal Models, ser. Synthesis Lectures on Signal Processing No. 8.   San Rafael, CA: Morgan & Claypool Publishers, 2009.
  • [45] A. Farina and F. A. Studer, “A Review of CFAR Detection Techniques in Radar Systems,” Microwave Journal, vol. 29, p. 115, 1986.
  • [46] H. Rohling, “Radar CFAR Thresholding in Clutter and Multiple Target Situations,” IEEE Trans. on Aerospace and Electronic Systems, no. 4, pp. 608–621, 1983.
  • [47] M. Barkat, S. D. Himonas, and P. K. Varshney, “CFAR Detection for Multiple Target Situations,” in IEE Proceedings F (Radar and Signal Processing), vol. 136, no. 5.   IET, 1989, pp. 193–209.
  • [48] E. Conte, A. De Maio, and G. Ricci, “CFAR Detection of Distributed Targets in non-Gaussian Disturbance,” IEEE Trans. on Aerospace and Electronic Systems, vol. 38, no. 2, pp. 612–621, 2002.
  • [49] J. R. Roman, M. Rangaswamy, D. W. Davis, Q. Zhang, B. Himed, and J. H. Michels, “Parametric Adaptive Matched Filter for Airborne Radar Applications,” IEEE Trans. on Aerospace and Electronic Systems, vol. 36, no. 2, pp. 677–692, 2000.
  • [50] F. Gini and M. Greco, “Covariance Matrix Estimation for CFAR Detection in Correlated Heavy Tailed Clutter,” Signal Processing, vol. 82, no. 12, pp. 1847–1859, 2002.
  • [51] E. J. Kelly, “An Adaptive Detection Algorithm,” IEEE Transactions on Aerospace and Electronic Systems, no. 2, pp. 115–127, 1986.
  • [52] E. L. Lehmann, Testing Statistical Hypotheses, 2nd ed.   New York, USA: Springer-Verlag, 1986.
  • [53] L. L. Scharf and C. Demeure, Statistical Signal Processing: Detection, Estimation, and Time Series Analysis, ser. Addison-Wesley Series in Electrical and Computer Engineering.   Addison-Wesley Publishing Company, 1991.
  • [54] M. Eaton, Multivariate statistics: a vector space approach, ser. Lecture notes-monograph series.   Institute of Mathematical Statistics, 1983.
  • [55] D. Robinson, S. Axler, F. Gehring, and P. Halmos, A Course in the Theory of Groups, ser. Graduate Texts in Mathematics.   Springer New York, 1996.
  • [56] V. Carotenuto, A. De Maio, D. Orlando, and L. Pallotta, “Adaptive Radar Detection Using Two Sets of Training Data,” IEEE Transactions on Signal Processing, vol. 66, no. 7, pp. 1791–1801, 2017.
  • [57] F. Bandiera, A. De Maio, A. S. Greco, and G. Ricci, “Adaptive radar detection of distributed targets in homogeneous and partially homogeneous noise plus subspace interference,” IEEE Transactions on Signal Processing, vol. 55, no. 4, pp. 1223–1237, April 2007.