
Regularized Halfspace Depth for Functional Data

Hyemin Yeon (Department of Statistics, North Carolina State University, North Carolina 27695, U.S.A.; Email: [email protected]), Xiongtao Dai (Division of Biostatistics, University of California, Berkeley), and Sara Lopez-Pintado (Department of Health Sciences, Northeastern University)
Abstract

Data depth is a powerful nonparametric tool originally proposed to rank multivariate data from center outward. In this context, one of the most archetypical depth notions is Tukey's halfspace depth. In the last few decades, notions of depth have also been proposed for functional data. However, Tukey's depth cannot be extended directly to handle functional data because of its degeneracy. Here, we propose a new halfspace depth for functional data that avoids degeneracy via regularization: the halfspace projection directions are constrained to have a small reproducing kernel Hilbert space norm. Desirable theoretical properties of the proposed depth, such as isometry invariance, maximality at the center, monotonicity relative to a deepest point, upper semi-continuity, and consistency, are established. Moreover, the regularized halfspace depth can rank functional data with varying emphasis on shape or magnitude, depending on the regularization. A new outlier detection approach is also proposed, which is capable of detecting both shape and magnitude outliers. It is applicable to trajectories in $L^2$, a very general space of functions that includes non-smooth trajectories. Based on extensive numerical studies, our methods are shown to perform well in detecting outliers of different types. Three real data examples showcase the proposed depth notion.

Keywords and phrases: Functional data analysis; Functional rankings; Infinite dimension; Outlier detection; Robust statistics.

1 Introduction

1.1 Background and the degeneracy issue

In the last several decades, data depth methods for multivariate data have been developed and have proven to be powerful tools for both exploratory data analysis and robust statistics. Many different depth notions have been proposed, such as Tukey's halfspace depth (Tukey, 1975), simplicial depth (Liu, 1990), and Mahalanobis depth (Liu, 1992). These multivariate depth functions have not only been well investigated theoretically (Donoho and Gasko, 1992, Zuo and Serfling, 2000) but have also been applied to a variety of statistical problems (Yeh and Singh, 1997, Rousseeuw et al., 1999, Li et al., 2012).

In particular, Tukey's halfspace depth has been popular due to its many desirable properties (Zuo and Serfling, 2000) and the robustness of its depth median (Donoho and Gasko, 1992). However, its generalization to functional data inevitably encounters a degeneracy issue; namely, a naive extension of Tukey's depth takes zero depth values almost surely. Intuitively, this is due to the infinite-dimensionality of the functional space, where the set of projections is "too rich" and thus Tukey's depth, defined from the most extreme projection, is too small. A numerical illustration of the degeneracy phenomenon is shown in Figure 1, where the naive direct extension of Tukey's depth to functional data (red curve) assigns zero depth to the majority of observations, preventing depth comparisons. The degeneracy was first discovered by Dutta et al. (2011) and further investigated by Kuelbs and Zinn (2013) and Chakraborty and Chaudhuri (2014a). See Table 1 of Gijbels and Nagy (2017) for a summary of the degeneracy issues in this and other existing functional depth notions. Tukey's halfspace depth has therefore been practically dismissed from consideration for functional data due to the failure of the naive generalization.

Figure 1: Distributions of the original Tukey's halfspace depth values and the proposed regularized halfspace depth with regularization parameters $\lambda\in\{1.302,1.454\}$. The depth values are with respect to $\mathcal{X}_n\equiv\{X_i\}_{i=1}^{n_X}$ and evaluated at the points in $\mathcal{Y}_n\equiv\{Y_i\}_{i=1}^{n_Y}$, where the observations are i.i.d. realizations from a Gaussian process and $n_X=n_Y=1000$. Tukey's depth corresponds to $\lambda=\infty$.

1.2 Regularized halfspace depth

In this work, we propose a new notion of functional depth called the regularized halfspace depth (RHD). We resolve the degeneracy issue by restricting the set of projection directions in the definition of the halfspace depth to those with reproducing kernel Hilbert space (RKHS) norm at most a given positive number $\lambda\in(0,\infty)$. Indeed, the proposed RHD obtained in this fashion is non-degenerate over a general class of random functions. This allows for depth-based rankings if, for example, the underlying distribution of the random element is Gaussian. Figure 1 illustrates that the RHD is free of the degeneracy issue, as shown by the green and yellow curves for small and large $\lambda$ values; details about this simulation setup can be found in Section S2.1 of the Supplementary Materials. A smaller $\lambda$ results in greater regularization, in which case the distribution of the depth values becomes more positive and dispersed, while the original Tukey's halfspace depth, corresponding to the red curve with $\lambda=\infty$, exhibits degenerate behavior, since almost all observations have zero depth.

A related but fundamentally different projection-based Tukey's depth for functional data was considered by Cuesta-Albertos and Nieto-Reyes (2008), who proposed the random Tukey depth, which defines the depth as the most severe extremity seen along a fixed number of random projections. The number of projections should not diverge to infinity because otherwise the random Tukey depth would approach Tukey's halfspace depth, which degenerates for functional observations. In contrast, our approach aims to utilize infinitely many projections and is thus more sensitive to different types of extremeness; the degeneracy issue is resolved by regularizing the projections. Both approaches demonstrate the necessity of restricting the projection directions when defining a functional depth through an infimum over projections.

In addition to non-degeneracy, we establish other desirable depth properties for the RHD, in analogy to those postulated for multivariate data by Zuo and Serfling (2000) and for functional data by Nieto-Reyes and Battey (2016) and Gijbels and Nagy (2017). We also include topological properties of the depth level sets. Uniform consistency of the sample RHD for the population RHD is verified over a totally bounded subset. A version of the Glivenko–Cantelli theorem is derived for a class of infinite-dimensional halfspaces as a by-product. In all, our new depth satisfies all of the desirable properties considered by Gijbels and Nagy (2017), unlike most other existing functional depths.

A prominent feature of the RHD is its flexibility in ranking curves based on different shape features, controlled by the regularization parameter $\lambda$: the RHD with a larger $\lambda$ emphasizes higher-frequency shape differences in the samples, while a smaller $\lambda$ emphasizes overall magnitude. The definition of "shape" is data-driven and comes from the modes of variation in the observed data. This feature leads to a new, data-informed definition of extremeness for curves, which underlies applications such as the outlier detection method introduced next.

1.3 Functional outlier detection

In recent years, substantial progress on outlier detection for functional data has been made using notions of functional depth. The functional boxplot proposed by Sun and Genton (2011) is an extension of the original boxplot (cf. Tukey, 1977), where the functional central region and whiskers are constructed based on a functional depth, in particular the modified band depth (MBD) (López-Pintado and Romo, 2009). Outlier detection methods for functions taking values in a multivariate space have also been considered by, e.g., Hubert et al. (2015).

Trajectories outlying only in shape, but not at any given time point, are challenging to detect; Nagy et al. (2017) focused on the detection of this type of outlier by applying data depths to the joint distributions at sets of two or three time points. Dai et al. (2020) proposed applying transformations sequentially to the functional data to reveal shape outliers that are not easily detected by other methods. Alternative outlier detection methods based on functional data depth include functional bagplots (Hyndman and Shang, 2010), the outliergram (Arribas-Gil and Romo, 2014), and directional outlyingness (Dai and Genton, 2018).

Unlike magnitude outliers, which tend to have an extreme average value over the support, shape outliers are hard to define in the first place because outlyingness in shape can take many forms, e.g., jumps, peaks, smoothness, or trends. A shape outlier can be inlying pointwise yet have a different overall pattern than the majority of the sample curves over the support. A major benefit of the proposed RHD is that it captures different modes of variation in the sample $\mathcal{X}_n\equiv\{X_i\}_{i=1}^n$ depending on the choice of the regularization parameter $\lambda$. We propose a data-adaptive definition of shape outlyingness and establish a novel shape outlier detection method based on the proposed RHD. The method uses a large number of randomly drawn directions to approximate the sample RHD and applies the univariate boxplot to the data functions projected onto those directions. Following the reviewers' recommendations, we have also considered phase variation and phase outliers, namely, variability of the observed curves along the time axis, as when each curve is shifted (or delayed) compared to the others. As shown in the numerical studies, the proposed outlier detection method detects different types of extremeness/outlyingness in shape, phase, and magnitude.

Our methods are shown to be practical and useful in three real data examples: medfly, world population growth, and sea surface temperature datasets. Due to its complexity, the first dataset has rarely been studied in the functional outlier detection literature (cf. Sguera and López-Pintado, 2021), while the other two datasets have been analyzed previously (cf. Sun and Genton, 2011, Nagy et al., 2017, Dai et al., 2020). Our data illustrations demonstrate that the proposed method is relevant for analyzing complicated and rough trajectories as well as relatively simple smooth curves.

1.4 Contributions and outline of the paper

Next we highlight our main contributions while summarizing the organization of the paper. Section 2 defines and studies the proposed RHD, a novel, flexible, and useful depth notion for functional data; it is shown in Section 2.1 that the proposed RHD is non-degenerate over a large class of random functions. We then establish desirable depth properties of the RHD in Sections 2.2-2.3 and provide a theoretically justified and practical algorithm to compute it in Section 2.4; some intermediate results, such as a Glivenko–Cantelli theorem, are new in the functional depth literature. The choice of the relevant tuning parameters is discussed in Section 2.5. Section 3 is devoted to the proposed outlier detection method; the details are presented in Section 3.1 along with a rule for selecting the adjustment factor. Section 3.2 discusses the robustness of the overall procedure against outliers. The proposed methods excel at identifying functional shape outliers, which can be challenging for existing depth notions, as illustrated in the numerical study in Section 4. Data applications are described in Section 5, indicating that our methods are applicable to functional data with complex structure. All proofs, additional simulations, and extra results from the real data analyses can be found in the supplement.

2 Regularized Halfspace Depth

In Section 2.1, we start by defining the regularized halfspace depth for functional data and verifying the non-degeneracy property. Section 2.2 then describes and discusses the desirable theoretical properties of the RHD, while Section 2.3 is devoted to defining the sample version of the RHD and establishing its consistency. A practical computational algorithm for approximating the sample RHD based on random projections is provided, along with its theoretical justification, in Section 2.4. Finally, we discuss how to select the tuning parameters involved in the computation of the RHD in Section 2.5.

2.1 Motivation and Definition

Let $X$ be a random function defined on a probability space $(\Omega,\mathcal{F},\mathsf{P})$ that takes values in an underlying Hilbert space $\mathbb{H}$ with inner product $\langle\cdot,\cdot\rangle$. We suppose that $\mathbb{H}$ is infinite-dimensional and separable. The original Tukey's halfspace depth (Tukey, 1975) of $x\in\mathbb{H}$ with respect to the probability measure $P_X$ induced by $X$ is

$$D(x)=D(x;P_X)=\inf_{v\in\mathbb{H}:\|v\|=1}\mathsf{P}(\langle X-x,v\rangle\geq 0).\qquad(1)$$

However, the direction set $\mathbb{S}\equiv\{v\in\mathbb{H}:\|v\|=1\}$ in the definition (1) is too rich due to the infinite dimensionality of $\mathbb{H}$. The set $\mathbb{S}$ contains an infinite orthonormal set in $\mathbb{H}$, which results in $\mathbb{S}$ not being totally bounded (MacCluer, 2009, Chapter 4) and the infimum in (1) being too small. As a result, the naive definition in (1) may become degenerate in an infinite-dimensional space (Dutta et al., 2011), as described next.

Theorem 1 (Degeneracy of Tukey’s halfspace depth).

Suppose that $X$ has independent functional principal component (FPC) scores, that is, $\{\langle X,\phi_j\rangle\}_{j=1}^\infty$ are independent. Then, $D(x;P_X)=0$ for $P_X$-almost all $x\in\mathbb{H}$.
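To make the degeneracy concrete, the following minimal R sketch (our illustration, not from the paper; all names are hypothetical) approximates Tukey's depth at a sample point using $M$ random unit directions applied to i.i.d. standard Gaussian FPC scores in $J$ dimensions. The approximate depth shrinks toward zero as $J$ grows, even though a finite number of random directions understates the full infimum.

```r
# Empirical illustration of Theorem 1: the approximate Tukey depth of a
# typical sample point decreases toward zero as the dimension J grows.
set.seed(1)
n <- 500; M <- 2000                        # sample size and number of directions
for (J in c(2, 5, 20, 100)) {
  X <- matrix(rnorm(n * J), n, J)          # rows: J independent FPC scores
  x <- X[1, ]                              # evaluate the depth at the first point
  V <- matrix(rnorm(M * J), M, J)
  V <- V / sqrt(rowSums(V^2))              # M random unit directions
  probs <- apply(V, 1, function(v) mean((X %*% v - sum(x * v)) >= 0))
  cat(sprintf("J = %3d: approximate depth = %.3f\n", J, min(probs)))
}
```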

We overcome this degeneracy issue and define a new notion of halfspace depth by introducing a regularization step. For two elements $x,y\in\mathbb{H}$, define their tensor product $x\otimes y:\mathbb{H}\to\mathbb{H}$ as the bounded linear operator $(x\otimes y)(z)=\langle z,x\rangle y$ for $z\in\mathbb{H}$. Under the finite second moment assumption $\mathsf{E}[\|X\|^2]<\infty$, the covariance operator of $X$ is defined as $\Gamma\equiv\mathsf{E}[(X-\mathsf{E}[X])\otimes(X-\mathsf{E}[X])]$. Since the covariance operator $\Gamma$ is self-adjoint, non-negative definite, and compact, it admits the eigen-decomposition $\Gamma=\sum_{j=1}^\infty\gamma_j(\phi_j\otimes\phi_j)$, where $(\gamma_j,\phi_j)$ is the $j$-th eigenvalue–eigenfunction pair of $\Gamma$, with $\gamma_1\geq\gamma_2\geq\cdots\geq 0$, $\gamma_j\to 0$ as $j\to\infty$, and the eigenfunctions $\{\phi_j\}_{j=1}^\infty$ forming a complete orthonormal system for the closure of the image of $\Gamma$ (Hsing and Eubank, 2015). We further assume that the eigenvalues are positive and strictly decreasing, i.e., $\gamma_1>\gamma_2>\cdots>0$, in order to avoid uninteresting cases, such as random elements supported only on a finite-dimensional subspace of $\mathbb{H}$.

The proposed regularized halfspace depth is defined next.

Definition 1.

The regularized halfspace depth (RHD) of $x\in\mathbb{H}$ with respect to $P_X$ is defined as

$$D_\lambda(x)=D_\lambda(x;P_X)=\inf_{v\in\mathbb{H}:\|v\|=1,\,\|\Gamma^{-1/2}v\|\leq\lambda}\mathsf{P}(\langle X-x,v\rangle\geq 0),\qquad(2)$$

where $\lambda\in(0,\infty)$ is a regularization parameter and $\Gamma^{-1/2}\equiv\sum_{j=1}^\infty\gamma_j^{-1/2}(\phi_j\otimes\phi_j)$.

We refer to the norm $\|v\|_{\mathbb{H}(\Gamma)}\equiv\|\Gamma^{-1/2}v\|$ as the reproducing kernel Hilbert space (RKHS) norm because if $\mathbb{H}=L^2([0,1])$ and $X$ has a continuous covariance function $G(s,t)=\mathsf{cov}(X(s),X(t))$, $s,t\in[0,1]$, then $\mathbb{H}(\Gamma)\equiv\{v\in L^2([0,1]):\|\Gamma^{-1/2}v\|<\infty\}$ is an RKHS with reproducing kernel $G$ (cf. Wahba, 1973) and an infinite-dimensional dense subspace of $\mathbb{H}$. The definitions of $\|v\|_{\mathbb{H}(\Gamma)}$ and $\mathbb{H}(\Gamma)$ are valid for any Hilbert space $\mathbb{H}$.

However, the RKHS $\mathbb{H}(\Gamma)$ is still large since it contains all eigenfunctions, leading to extremely small halfspace probabilities and subsequent degeneracy. For this reason, we regularize the RKHS norm $\|v\|_{\mathbb{H}(\Gamma)}$ of the projection direction $v$ using a pre-specified number $\lambda\in(0,\infty)$. The resulting set of projection directions used to define the RHD $D_\lambda$ is

$$\mathcal{V}_\lambda\equiv\{v\in\mathbb{H}:\|v\|=1,\,\|\Gamma^{-1/2}v\|\leq\lambda\}.\qquad(3)$$

The set $\mathcal{V}_\lambda$ is then reasonably small in the sense that it is totally bounded (cf. Proposition S4 in the supplement).

The regularization is the key difference between our method and other existing depth notions. For a given $\lambda\in(0,\infty)$, we restrict the set of unit-norm projection directions used in the halfspace probabilities to $\mathcal{V}_\lambda$ in (3), which requires the RKHS norm of the projection directions to be at most $\lambda$. This resolves the degeneracy issue, as shown below in Theorem 2. To the best of our knowledge, regularization has never been used in the depth literature in the context of either finite- or infinite-dimensional spaces.

Without regularization, degeneracy occurs for the original Tukey's depth (1) in $\mathbb{H}$, as explained in Theorem 1. In contrast, our projection set $\mathcal{V}_\lambda$ from (3) is moderately sized: it contains infinite linear combinations of the eigenfunctions under the constraint that the coefficients of non-leading eigenfunctions are small. For instance, let $a_j=[(\gamma_j 2^{-j})/(\sum_k\gamma_k 2^{-k})]^{1/2}$ for $j=1,2,\dots$. If $\lambda\geq(\gamma_1/2)^{-1/2}$, then $\sum_j a_j\phi_j\in\mathcal{V}_\lambda$. However, $\phi_j\in\mathcal{V}_\lambda$ if and only if $\gamma_j^{-1/2}\leq\lambda$, so $\mathcal{V}_\lambda$ contains only finitely many eigenfunctions; see also Proposition S1 in the supplement. This allows the proposed RHD to avoid the degeneracy issue. We will further show in Section 2.2 that the RHD is a flexible data depth with desirable theoretical properties.
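The membership claims in this example can be checked numerically. Below is a short R sketch under the hypothetical eigenvalue sequence $\gamma_j=j^{-2}$ (our choice for illustration, truncated at 50 terms); it verifies that $v=\sum_j a_j\phi_j$ has unit norm and squared RKHS norm $1/\sum_k\gamma_k 2^{-k}\leq 2/\gamma_1$.

```r
# Numerical check of the example above (gamma_j = j^{-2} is an assumed
# eigenvalue sequence, not one from the paper).
gam <- (1:50)^(-2)                        # truncated eigenvalue sequence
a   <- sqrt(gam * 2^-(1:50) / sum(gam * 2^-(1:50)))
sum(a^2)                                  # ~ 1: unit H-norm (up to truncation)
sum(a^2 / gam)                            # squared RKHS norm of v
1 / sum(gam * 2^-(1:50))                  # matches, and is <= 2 / gam[1] = 2
```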

Theorem 2 shows that the RHD is free of degeneracy. To state the theorem, let $\mu=\mathsf{E}[X]$ and let $F_v$ denote the cumulative distribution function of the standardized projection $\langle X-\mu,v\rangle/\|\Gamma^{1/2}v\|$. For each $\lambda,t\in(0,\infty)$, we consider the following condition on the tail probabilities of the standardized projections:

Condition $E(\lambda,t)$: $\sup_{v\in\mathcal{V}_\lambda}F_v(t)<1$.

Theorem 2 (Non-degeneracy of the RHD).

Let $\lambda\in(0,\infty)$ be fixed. If there exists $t\in(0,\infty)$ such that Condition $E(\lambda,t)$ holds, then $D_\lambda(x)>0$ for any $x\in\mathbb{H}$ with $\|x-\mu\|\leq t/\lambda$. Furthermore, if Condition $E(\lambda,t)$ holds for each $t\in(0,\infty)$, then the RHD $D_\lambda$ is positive everywhere in $\mathbb{H}$.

Condition $E(\lambda,t)$ in Theorem 2 ensures that the upper tail probabilities of the standardized projections are not too small across the direction set $\mathcal{V}_\lambda$. This tail probability condition is mild and general; it holds in particular when the standardized projection distribution $F_v$ does not depend on $v\in\mathcal{V}_\lambda$. This encompasses a wide range of random elements, including those whose standardized projections follow a well-known distribution, those whose FPC scores are dependent, and those whose higher moments are infinite, as illustrated below.

Example 1.

A variety of random elements satisfy Condition $E(\lambda,t)$ if the projections follow a well-known distribution. In particular, the projection $\langle X,v\rangle$ could follow a normal, logistic, or double exponential distribution, each of which is determined only by location and scale parameters. For instance, if the random element $X$ is such that its projection $\langle X,v\rangle$ follows the logistic distribution, then the standardized projection distribution $F_v$ is independent of $v\in\mathcal{V}_\lambda$, with $F_v(u)=\{1+e^{-u}\}^{-1}$ for each $u\in\mathbb{R}$. Therefore, Condition $E(\lambda,t)$ is satisfied for each $t\in(0,\infty)$.

Example 2.

Condition $E(\lambda,t)$ can hold even for random elements with dependent FPC scores or with infinite higher moments. Suppose $X=\xi R$, where $R$ is a Gaussian element with mean zero and $\xi$ follows the standardized $t$-distribution with $\nu>2$ degrees of freedom, independent of $R$. The FPC scores $\langle X,\phi_j\rangle=\xi\langle R,\phi_j\rangle$ are dependent through the common latent variable $\xi$, and the higher moments of $X$ might not exist, depending on $\nu$. Since the covariance operators of $X$ and $R$ are equal, the distribution of the standardized projection is given by $F_v(u)=\mathsf{P}(\xi Z\leq u)$ for each $u\in\mathbb{R}$, where $Z$ denotes a standard normal variable. Consequently, the distribution function $F_v$ does not depend on the direction $v\in\mathcal{V}_\lambda$, and hence Condition $E(\lambda,t)$ holds for each $t\in(0,\infty)$.
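For experimentation, the scale mixture of Example 2 can be simulated on a grid as in the following R sketch (ours; the squared-exponential covariance kernel and all names are illustrative assumptions, not from the paper).

```r
# Generate discretized paths of X = xi * R: Gaussian paths R scaled by an
# independent standardized t random variable xi (Example 2).
set.seed(2)
tt <- seq(0, 1, length.out = 101)
G  <- outer(tt, tt, function(s, t) exp(-(s - t)^2 / 0.1))  # covariance kernel
U  <- chol(G + 1e-8 * diag(length(tt)))                    # t(U) %*% U = G
nu <- 5; n <- 100
R  <- matrix(rnorm(n * length(tt)), n) %*% U               # Gaussian paths (rows)
xi <- rt(n, df = nu) / sqrt(nu / (nu - 2))                 # unit-variance t scale
X  <- xi * R                                               # rows: sample curves
```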

Remark 1.

Condition $E(\lambda,t)$ in Theorem 2 allows us to handle the infinite dimensionality of $\mathbb{H}$. In functional data analysis, it is quite common to impose distributional conditions similar to Condition $E(\lambda,t)$ for theoretical development, as the class of random elements in an infinite-dimensional space is deemed extremely large. Common assumptions include finite fourth moments (Hall et al., 2007, Cardot et al., 2007, Gabrys and Kokoszka, 2007, Gabrys et al., 2010) or finite mixed moments between the FPC scores (Yao and Müller, 2010, Ferraty and Nagy, 2022), with some studies even assuming all moments to be finite (Delaigle and Hall, 2010). Other works impose more specific distributional assumptions such as continuity of $X$ (Cardot et al., 2013), (centered) elliptical processes (Liebl and Reimherr, 2023), or subgaussianity of the vector of FPC scores (Lin and Lin, 2021). Compared to these, Condition $E(\lambda,t)$ is a relatively mild assumption that encompasses many distributions applicable to practical problems (cf. Examples 1-2). Therefore, incorporating some distributional conditions on $X$ is reasonable to achieve non-degeneracy of the RHD, allowing us to leverage its practical benefits.

In addition to resolving degeneracy, the regularization parameter $\lambda$ also provides flexibility in emphasizing the shape and/or magnitude of the functional observations. For a small $\lambda$, the RHD focuses more on the overall magnitude of the input functions, since the projection set concentrates around the leading eigenfunctions, which tend to capture lower-frequency variations. On the other hand, as $\lambda$ increases, the RHD increases its emphasis on variation along the higher-frequency directions given by the non-leading eigenfunctions. This property of the RHD will be utilized later when we develop a new method for shape outlier detection. The regularization parameter $\lambda\in(0,\infty)$ is a user-specified parameter that depends on the purpose of the analysis; its selection is discussed in Section 2.5.

Perhaps surprisingly, the RHD is robust against outliers despite utilizing the covariance, which is generally assumed to exist in the functional data literature but is typically considered non-robust. Moreover, the RHD can be made more robust by restricting the directions in $\mathcal{V}\equiv\{v\in\mathbb{H}:\|v\|=1\}$ to a general subset of $\mathcal{V}$ that does not require finite moments. One option is to substitute the RKHS norm $\|\Gamma^{-1/2}v\|$ with another RKHS norm based on a different reproducing kernel. However, such general subsets may not reflect the variability in shape and magnitude of the functional observations. We show that by imposing a second moment condition, the RHD offers numerous practical benefits, such as detecting many types of shape outliers in complex functional datasets; see Section 4 for the numerical performance. These advantages are usually not attained by a single existing functional depth such as the modified band depth (López-Pintado and Romo, 2009), extremal depth (Narisetty and Nair, 2016), or integrated depth (Nagy et al., 2017); a combination of multiple depth functions can give similar performance but adds complexity (Nagy et al., 2017, Dai et al., 2020). Despite the covariance being non-robust, the proposed methods based on the RHD resist many types of outliers and are shown to be useful in practice, as illustrated and discussed in Sections 4.4 and 3.2, respectively.

Our proposed RHD can be constructed whenever the underlying function space is a separable Hilbert space, for example, $L^2\equiv L^2([0,1])=\{f:[0,1]\to\mathbb{R}\mid\int_0^1 f(t)^2\,dt<\infty\}$ with inner product $\langle f_1,f_2\rangle\equiv\int_0^1 f_1(t)f_2(t)\,dt$ for $f_1,f_2\in L^2$. More generally, the RHD is also applicable to functional data taking values in a multivariate space (cf. Claeskens et al., 2014, Hubert et al., 2015, López-Pintado et al., 2014) given an appropriately defined inner product. For concreteness, in this work we focus on univariate curves in $\mathbb{H}=L^2$.

2.2 Theoretical Properties

For a depth function to be intuitively sound, it should provide a center-outward ordering of the data and satisfy some desirable properties. Liu (1990) discussed properties satisfied by the simplicial depth. Zuo and Serfling (2000) formally postulated a list of desirable properties for a well-designed depth function and showed that the original Tukey's halfspace depth satisfies all of them. Nieto-Reyes and Battey (2016) and Gijbels and Nagy (2017), among others, extended such depth properties to functional data. We show some basic properties of the RHD in the following theorems, which extend the multivariate depth properties postulated by Zuo and Serfling (2000) to the infinite-dimensional Hilbert space $\mathbb{H}$. The proofs are deferred to Section S1.4 of the Supplementary Materials.

Theorem 3.

(a) (Isometry invariance) Let $A:\mathbb{H}\to\mathbb{H}$ be a bounded linear operator and $b\in\mathbb{H}$. If $A$ is a surjective isometry, then $D_\lambda(Ax+b;P_{AX+b})=D_\lambda(x;P_X)$ for each $x\in\mathbb{H}$.

(b) (Maximality at center) Suppose that $X$ is halfspace symmetric about a unique center $\mu\in\mathbb{H}$ in the sense that $P_X(H)\geq 1/2$ for each $H\in\mathcal{H}(\mu)$, where $\mathcal{H}(x)$ denotes the collection of all closed halfspaces containing $x\in\mathbb{H}$. Then, if $\lambda>\gamma_1^{-1/2}$, where $\gamma_1$ is the largest eigenvalue, we have $D_\lambda(\mu)=\sup_{x\in\mathbb{H}}D_\lambda(x)$.

(c) (Monotonicity relative to the deepest point) Let $\theta\in\mathbb{H}$ be such that $D_\lambda(\theta)=\sup_{x\in\mathbb{H}}D_\lambda(x)$. Then, it holds for any $x\in\mathbb{H}$ that $D_\lambda(x)\leq D_\lambda(\theta+\alpha(x-\theta))$, $\alpha\in[0,1]$.

(d) (Vanishing at infinity) Let $\mathcal{U}_\lambda\equiv\{x\in\mathbb{H}:\|\Gamma^{-1/2}x\|\leq\lambda\|x\|\}$. Then, for any sequence $\{x_n\}\subseteq\mathcal{U}_\lambda$ with $\|x_n\|\to\infty$ as $n\to\infty$, we have $D_\lambda(x_n)\to 0$ as $n\to\infty$.

Theorem 3(a) extends the affine invariance property (cf. Zuo and Serfling, 2000) to an infinite-dimensional space and establishes it for the proposed RHD, because any surjective isometry between two normed spaces is affine by the Mazur–Ulam theorem (Väisälä, 2003). Properties (b) and (c) in Theorem 3 respectively state that the RHD is maximized at the unique center $\mu$ and that it is nonincreasing along any ray emanating from the deepest curve $\mu\in\mathbb{H}$. These properties are analogous to their Euclidean versions in Zuo and Serfling (2000), but the proof of (b) is more involved and uses properties of the projection set $\mathcal{V}_\lambda$. Theorem 3(d) is a weakened version of the vanishing-at-infinity property, where restrictions such as the unnormalized direction set $\mathcal{U}_\lambda$ are needed to limit the behavior of the depth function; see Section S1.4 of the Supplementary Materials for another version of this property under a finite-dimensional assumption, together with a counterexample.

The next theorem establishes the continuity of the depth function and topological properties of its level sets. For $\tau\in[0,\infty)$, we define the upper level set of $D_\lambda$ at level $\tau$ as $G_{\tau,\lambda}\equiv\{x\in\mathbb{H}:D_\lambda(x)\geq\tau\}$. The following theorem extends analogous properties for the finite-dimensional case shown in Theorem 2.11 of Zuo and Serfling (2000).

Theorem 4.

(a) The depth function $D_\lambda:\mathbb{H}\to[0,\infty)$ is upper semi-continuous.

(b) The upper level sets $\mathcal{G}_\lambda\equiv\{G_{\tau,\lambda}\}_{\tau\in[0,\infty)}$ are nested, closed (hence complete), and convex in $\mathbb{H}$.

(c) Let $\mathbb{V}\equiv\mathrm{span}\{\phi_1,\dots,\phi_m\}\subseteq\mathbb{H}$ be the finite-dimensional subspace spanned by the first $m$ eigenfunctions. Then, for any $\tau$, $G_{\tau,\lambda}\cap\mathbb{V}$ is compact.

The upper semi-continuity of the RHD $D_\lambda$ and the closedness (hence completeness) of the level sets $G_{\tau,\lambda}$ are closely related. The compactness in (c) follows from vanishing at infinity and the restriction onto a finite-dimensional subspace of $\mathbb{H}$. Without this restriction, the compactness of the level sets may fail, as described in Section S1.4 of the Supplementary Materials, since vanishing at infinity fails.

2.3 Sample RHD and its Consistency

The population version of the RHD in (2) cannot be computed in practice. To define its sample version estimated from the observed functions, let $X_1,\dots,X_n$ be independent and identically distributed (i.i.d.) copies of $X$ with empirical distribution $\hat P_n$ defined as $\hat P_n(B)\equiv n^{-1}\sum_{i=1}^n\mathbb{I}(X_i\in B)$ for a Borel set $B\in\mathcal{B}(\mathbb{H})$, where $\mathbb{I}$ denotes the indicator function. The sample covariance operator is defined as $\hat\Gamma_n\equiv n^{-1}\sum_{i=1}^n(X_i-\bar X)\otimes(X_i-\bar X)$ with sample mean $\bar X\equiv n^{-1}\sum_{i=1}^n X_i$. Write the spectral decomposition of $\hat\Gamma_n$ as $\hat\Gamma_n=\sum_{j=1}^n\hat\gamma_j(\hat\phi_j\otimes\hat\phi_j)$, where $(\hat\gamma_j,\hat\phi_j)$ is the $j$-th eigenpair of $\hat\Gamma_n$ for $j=1,\dots,n$, with $\hat\gamma_1\geq\cdots\geq\hat\gamma_n\geq 0$ (Hsing and Eubank, 2015).
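For curves observed on a common grid, the sample eigen-decomposition just described can be computed as in the following R sketch (our own minimal implementation with assumed names and $L^2$ scaling conventions; it is not the authors' package code).

```r
# Sample mean, covariance, eigenpairs (gamma-hat_j, phi-hat_j), and FPC scores
# from an n x m matrix of curves evaluated on an equispaced grid tt.
fpca_grid <- function(Xmat, tt) {
  h    <- mean(diff(tt))                      # grid spacing for the L2 inner product
  Xc   <- sweep(Xmat, 2, colMeans(Xmat))      # center the curves
  Ghat <- crossprod(Xc) / nrow(Xmat)          # sample covariance on the grid
  eig  <- eigen(Ghat * h, symmetric = TRUE)   # discretized covariance operator
  list(values  = pmax(eig$values, 0),         # gamma-hat_j
       vectors = eig$vectors / sqrt(h),       # phi-hat_j, orthonormal in L2
       scores  = Xc %*% eig$vectors * sqrt(h))  # <X_i - Xbar, phi-hat_j>
}
```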

To define the sample version of the RHD, the set of projection directions is taken as

$$\hat{\mathcal{V}}_{\lambda,J}\equiv\{v\in\mathrm{span}\{\hat\phi_1,\dots,\hat\phi_J\}:\|v\|=1,\,\|\hat\Gamma_J^{-1/2}v\|\leq\lambda\},\qquad(4)$$

where $\hat\Gamma_J^{-1/2}\equiv\sum_{j=1}^J\hat\gamma_j^{-1/2}\hat\pi_j$, with $\hat\pi_j\equiv\hat\phi_j\otimes\hat\phi_j$, is a $J$-truncated inverse of the square root operator $\hat\Gamma_n^{1/2}=\sum_{j=1}^n\hat\gamma_j^{1/2}\hat\pi_j$. Here, the truncation parameter $J=J(n)$ may depend on the sample size $n$, with $J(n)\to\infty$ as $n\to\infty$. We define the sample RHD of $x\in\mathbb{H}$ (with respect to $\hat P_n$) as

$$\hat D_{\lambda,n}(x)=\hat D_{\lambda,n}(x;\hat P_n)=\inf_{v\in\hat{\mathcal{V}}_{\lambda,J}}n^{-1}\sum_{i=1}^n\mathbb{I}(\langle X_i-x,v\rangle\geq 0).\qquad(5)$$

To establish the uniform consistency of the sample RHD $\hat D_{\lambda,n}$ for the population RHD $D_\lambda$, we need the following continuity/anti-concentration assumption on the distribution of the projections of $X$: for a regularization parameter $\lambda\in(0,\infty)$ and a totally bounded subset $\mathbb{B}\subseteq\mathbb{H}$,

Condition $C(\lambda,\mathbb{B})$: $\displaystyle\lim_{r\to 0}\limsup_{k\to\infty}\sup_{x\in\mathbb{B},v\in\mathcal{V}_\lambda}\mathsf{P}(|\langle X-x,\Pi_k v\rangle|\leq r)=0$,

where $\Pi_k\equiv\sum_{l=1}^k(\phi_l\otimes\phi_l)$ denotes the projection operator onto the linear span of the first $k$ eigenfunctions, for integer $k\geq 1$.

This condition is satisfied by a large class of random functions $X$ whose projection scores $\langle X,\phi_j\rangle$, $j=1,2,\dots$, are independent and continuously distributed; in particular, Gaussian random functions.

Example 3.

If $X$ is Gaussian, then for each $v\in\mathcal{V}_\lambda$ and $x\in\mathbb{B}$, the truncated projection $\langle X-x,\Pi_k v\rangle$ follows the normal distribution with mean $\langle\mu-x,\Pi_k v\rangle$ and variance $\|\Gamma^{1/2}\Pi_k v\|^2$. Here, $\Pi_k v\neq 0$ for all $v\in\mathcal{V}_\lambda$ and sufficiently large $k$ by Lemma S1 in the supplement. Moreover, the mean and standard deviation are bounded by Lemma S2 as $\sup_{x\in\mathbb{B},v\in\mathcal{V}_\lambda}|\langle\mu-x,\Pi_k v\rangle|\leq\|\mu\|+\sup_{x\in\mathbb{B}}\|x\|<\infty$ and $\liminf_{k\to\infty}\inf_{v\in\mathcal{V}_\lambda}\|\Gamma^{1/2}\Pi_k v\|\geq(2\lambda)^{-1}>0$. By the properties of the normal distribution,

$$\limsup_{k\to\infty}\sup_{x\in\mathbb{B},v\in\mathcal{V}_\lambda}\mathsf{P}(|\langle X-x,\Pi_k v\rangle|\leq\tau)\leq\Phi(2\lambda\tau)-\Phi(-2\lambda\tau)\to 0$$

as $\tau\to 0$, where $\Phi$ denotes the standard normal distribution function. Hence, Condition $C(\lambda,\mathbb{B})$ holds for a Gaussian element $X$.

To prove the consistency theorem, in addition to Condition $C(\lambda,\mathbb{B})$, we impose some standard regularity conditions on $X$, denoted Conditions (A1)-(A4) in Section S1.5.3 of the supplement. Similar or stronger conditions are commonly imposed in the functional principal component analysis literature for consistent estimation of the eigencomponents (Cardot et al., 2007, Hall et al., 2007). These conditions are used in the proofs involving perturbation theory for functional data (cf. Sections S1.5.3-S1.5.4 of the supplement). The consistency result is established in the following theorem.

Theorem 5.

Let $\lambda\in(0,\infty)$ be fixed and let $\mathbb{B}\subseteq\mathbb{H}$ be a totally bounded subset. Suppose that Condition $C(\lambda+\lambda_0,\mathbb{B})$ holds for some $\lambda_0\in(0,\infty)$, that Conditions (A1)-(A4) in the supplement hold, and that $n^{-1/2}J^{7/2}(\log J)^2=O(1)$ as $n\to\infty$. Then, as $n\to\infty$, we have

$$\sup_{x\in\mathbb{B}}|\hat D_{\lambda,n}(x)-D_\lambda(x)|\xrightarrow{\mathsf{P}}0.$$

The proof of this result is included in the supplement. A main component of the proof is to establish a Glivenko–Cantelli theorem for the empirical process $\{\hat P_n(H_{x,v}):v\in\mathcal{V}_\lambda,x\in\mathbb{B}\}$ indexed by an infinite-dimensional class of halfspaces, where $H_{x,v}\equiv\{y\in\mathbb{H}:\langle y-x,v\rangle\geq 0\}$ denotes the closed halfspace with normal vector $v\in\mathbb{H}$ containing $x\in\mathbb{H}$ on its boundary. Empirical process results for infinite-dimensional spaces are challenging to establish, apart from results on bracketing numbers for classes of smooth or monotone functions (van der Vaart and Wellner, 1996, Section 2.7) and Glivenko–Cantelli and Donsker theorems over finite-dimensional subspaces (Chakraborty and Chaudhuri, 2014b). Developing empirical process theory for functional data may not be feasible without restricting the class of functions to reasonably small subclasses. The main reason is that, in the infinite-dimensional Hilbert space $\mathbb{H}$, sufficiently small balls (e.g., balls with radius smaller than $\sqrt{2}$) fail to cover the closed unit ball; this implies that the closed unit ball in $\mathbb{H}$ is not only non-totally-bounded (e.g., MacCluer, 2009, Hsing and Eubank, 2015) but may also fail to have a finite covering number. For a related discussion, we refer to Caramanis (2000), who shows that the closed unit ball in an infinite-dimensional Banach space is not a Glivenko–Cantelli class. Here, we impose such a restriction by considering a totally bounded subset $\mathbb{B}$. The proof consists of showing the total boundedness of $\mathcal{V}_\lambda$ and the asymptotic uniform equicontinuity of the stochastic process $\{\hat P_n(H_{x,v}):v\in\mathcal{V}_\lambda,x\in\mathbb{B}\}$ (cf. Section S1.4.2 of the supplement), which is more involved than directly applying covering or bracketing number arguments (cf. van der Vaart and Wellner, 1996, Chapter 2).

2.4 Practical Computation

The sample RHD involves an infimum over the projection set $\hat{\mathcal{V}}_{\lambda,J}$, which makes its computation difficult; even for the original Tukey's depth in $\mathbb{R}^d$, exact computation can be prohibitive for dimension $d\geq 4$ (Dyckerhoff and Mozharovskyi, 2016). To approximate the sample RHD, we adopt a random projection approach.

For $\hat v=\sum_{j=1}^J\hat a_j\hat\phi_j\in\mathrm{span}\{\hat\phi_1,\dots,\hat\phi_J\}$ with $\hat{\bm a}=(\hat a_1,\dots,\hat a_J)^\top\in\mathbb{R}^J$, it holds that

$$\|\hat v\|^2=\sum_{j=1}^J\hat a_j^2=\|\hat{\bm a}\|_{\mathbb{R}^J}^2\quad\text{and}\quad\|\hat\Gamma_J^{-1/2}\hat v\|^2=\sum_{j=1}^J\hat\gamma_j^{-1}\hat a_j^2=\|\hat{\mathcal R}_J^{-1/2}\hat{\bm a}\|_{\mathbb{R}^J}^2,$$

where $\hat{\mathcal R}_J\equiv\mathrm{diag}(\hat\gamma_1,\dots,\hat\gamma_J)$. Then, $\hat{\mathcal{V}}_{\lambda,J}$ can be written as

$$\hat{\mathcal{V}}_{\lambda,J}=\Big\{\hat v=\sum_{j=1}^J\hat a_j\hat\phi_j\in\mathrm{span}\{\hat\phi_1,\dots,\hat\phi_J\}:\hat{\bm a}=(\hat a_1,\dots,\hat a_J)^\top\in\hat{\mathcal A}_{\lambda,J}\Big\},$$

where $\hat{\mathcal A}_{\lambda,J}\equiv\{\hat{\bm a}\in\mathbb{R}^J:\|\hat{\bm a}\|_{\mathbb{R}^J}=1,\,\|\hat{\mathcal R}_J^{-1/2}\hat{\bm a}\|_{\mathbb{R}^J}\leq\lambda\}$. Note that the set $\hat{\mathcal A}_{\lambda,J}$ is the intersection of the (surface of the) unit sphere and a (solid) ellipsoidal region centered at the origin in $\mathbb{R}^J$. The sample RHD can then be written as

$$\hat D_{\lambda,n}(x)=\inf_{\hat{\bm a}\in\hat{\mathcal A}_{\lambda,J}}n^{-1}\sum_{i=1}^n\mathbb{I}((\hat{\bm X}_i-\hat{\bm x})^\top\hat{\bm a}\geq 0),\qquad(6)$$

where $\hat{\bm X}_i=[\langle X_i,\hat\phi_j\rangle]_{1\leq j\leq J}$ and $\hat{\bm x}=[\langle x,\hat\phi_j\rangle]_{1\leq j\leq J}$ are $J$-dimensional vectors in $\mathbb{R}^J$. Although (6) is reminiscent of the sample Tukey's depth applied to the scores $\hat{\bm X}_i$, the two are remarkably different in design. The RHD targets infinite-dimensional data and allows $J$ to increase; the depth values are insensitive to increases in $J$ once it is moderately large ($J\geq 6$ in our numerical studies) thanks to the regularization set, which stabilizes the depth. In contrast, Tukey's depth is most often applied in a fixed low dimension; as the dimension increases, Tukey's depth becomes less capable of distinguishing points because of the concentration of measure and the degeneracy issue.

To approximate the infimum, we draw $M$ random vectors $\hat{\bm a}_m=(\hat a_{m1},\dots,\hat a_{mJ})^\top\overset{\mathsf{iid}}{\sim}\nu$, $m=1,\dots,M$, from some continuous distribution $\nu$ supported on $\hat{\mathcal A}_{\lambda,J}\subseteq\mathbb{R}^J$. The projection set is approximated by the finite set

$$\tilde{\mathcal{V}}_{\lambda,J,M}\equiv\Big\{\tilde v=\sum_{j=1}^J\hat a_{mj}\hat\phi_j:m=1,\dots,M\Big\}.\qquad(7)$$

Then, the sample depth $\hat D_{\lambda,n}(x)=\hat D_{\lambda,n}(x;\hat P_n)$ in (6) is approximated by

$$\tilde D_{\lambda,n,M}(x)=\tilde D_{\lambda,n,M}(x;\hat P_n,\{\hat{\bm a}_m\}_{m=1}^M)=\min_{m=1,\dots,M}n^{-1}\sum_{i=1}^n\mathbb{I}((\hat{\bm X}_i-\hat{\bm x})^\top\hat{\bm a}_m\geq 0).\qquad(8)$$

Rejection sampling is applied to draw the projection directions $\hat{\bm a}_m$. To increase the acceptance rate in the target region $\hat{\mathcal A}_{\lambda,J}$, we set the proposal distribution of the projection directions $\hat{\bm a}_m$ to that of $\bm z/\|\bm z\|_{\mathbb{R}^J}$ with $\bm z\sim\mathsf{N}(\bm 0,\hat{\mathcal R}_J)$, a non-isotropic distribution on the unit sphere.
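The rejection sampler just described can be transcribed directly, as in the following R sketch (names are ours; `gam_J` holds the leading sample eigenvalues $\hat\gamma_1,\dots,\hat\gamma_J$).

```r
# Draw M directions from A-hat_{lambda,J}: propose z/||z|| with
# z ~ N(0, diag(gam_J)) and accept when the RKHS norm is at most lambda.
sample_dirs <- function(M, gam_J, lambda) {
  J <- length(gam_J)
  dirs <- matrix(numeric(0), 0, J)
  while (nrow(dirs) < M) {
    z <- rnorm(J, sd = sqrt(gam_J))          # proposal z ~ N(0, R-hat_J)
    a <- z / sqrt(sum(z^2))                  # project onto the unit sphere
    if (sum(a^2 / gam_J) <= lambda^2)        # accept if ||R_J^{-1/2} a|| <= lambda
      dirs <- rbind(dirs, a)
  }
  dirs
}
```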

The following theorem justifies the approximate sample RHD in (8) for a sufficiently large $M$; the proof is given in Section S1.6 of the supplement.

Theorem 6.

Let $\lambda\in(0,\infty)$ be fixed and let $\mathbb{B}\subseteq\mathbb{H}$ be a totally bounded subset. Suppose that the assumptions of Theorem 5 hold. Then, for each $\varepsilon>0$, the following event occurs with probability tending to 1 as $n\to\infty$: for a continuous probability distribution $\nu$ supported on $\hat{\mathcal A}_{\lambda,J}$ and a sequence $\{\hat{\bm a}_m\}_{m=1}^\infty$ of i.i.d. random vectors drawn from $\nu$, as $M\to\infty$,

$$\nu\Big(\sup_{x\in\mathbb{B}}\big|\tilde D_{\lambda,n,M}(x)-\hat D_{\lambda,n}(x)\big|>\varepsilon\Big)\to 0.$$

The key point of the proof is to show a type of asymptotic equicontinuity of the process $\{\hat P_n(H_{x,\hat v}):x\in\mathbb{B},\hat v\in\hat{\mathcal{V}}_{\lambda,J}\}$, whose index set also depends on the observed random functions $\mathcal{X}_n\equiv\{X_i\}_{i=1}^n$. Such processes, with an index set that changes with the sample size $n$, have rarely been studied. We derive this theorem by constructing a correspondence from $\hat{\mathcal{V}}_{\lambda,J}$ to $\mathcal{V}_\lambda$, relating this correspondence to the asymptotic equicontinuity of the original process $\{\hat P_n(H_{x,v}):x\in\mathbb{B},v\in\mathcal{V}_\lambda\}$, and using the total boundedness of $\mathcal{V}_\lambda$.

The population RHD (2) and the sample RHD (5) are monotonically decreasing as $\lambda$ increases. In order for the approximate RHD to enjoy the same property, namely that $\lambda\mapsto\tilde D_{\lambda,n,M}(x)$ is decreasing in $\lambda$ when all other quantities are fixed, we employ the following procedure. Let $\Lambda\equiv\{\lambda_1,\dots,\lambda_Q\}$ be a finite set of regularization parameters at which the approximate RHD is to be evaluated. We draw a single set of random directions $\bm z_1,\dots,\bm z_M\sim\mathsf{N}(\bm 0,\hat{\mathcal R}_J)$, set $\hat{\bm a}_m=\bm z_m/\|\bm z_m\|_{\mathbb{R}^J}$ as explained above, and use the directions $\{\hat{\bm a}_m\}_{m=1}^M$ to compute all depth values $\{\tilde D_{\lambda,n,M}:\lambda\in\Lambda\}$. The monotonicity then follows from the definition of $\tilde D_{\lambda,n,M}$.

In practice, instead of specifying $\lambda$ directly, we set $\lambda$ to the $u$-quantile of the RKHS norms $\mathcal{N}_M\equiv\{\|\hat{\mathcal R}_J^{-1/2}\hat{\bm a}_m\|_{\mathbb{R}^J}\}_{m=1}^M$, where $u\in(0,1)$ is called the quantile level for regularization. The quantile-level approach has several advantages: it adapts to the data and avoids choosing values of $\lambda$ outside the range of the RKHS norms $\mathcal{N}_M$. In our numerical studies, we investigate the performance of the RHD using different $u$-quantiles of $\mathcal{N}_M$ as the regularization parameter $\lambda$.
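In code, the quantile-level choice of $\lambda$ amounts to one line applied to the sampled RKHS norms, as in this R sketch (continuing with a hypothetical eigenvalue vector `gam_J`).

```r
# lambda from the u-quantile (default u = 0.95) of the sampled RKHS norms N_M.
gam_J  <- c(2, 1, 0.5, 0.2, 0.1, 0.05)                # hypothetical eigenvalues
Z      <- matrix(rnorm(2000 * length(gam_J)), 2000) %*% diag(sqrt(gam_J))
A      <- Z / sqrt(rowSums(Z^2))                      # unit-norm proposal directions
norms  <- sqrt(rowSums(sweep(A^2, 2, gam_J, "/")))    # RKHS norms of the proposals
lambda <- quantile(norms, 0.95)
```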

Algorithm 1 summarizes the approximate computation, where $M'\geq M$ denotes the number of proposal draws.

Data: Random sample $\mathcal{X}$ and depth evaluation point $x$
Parameters: Either regularization parameter set $\Lambda$ or quantile level set $\mathcal{U}$; also truncation parameter $J$
Result: Approximate sample RHD $\tilde D_{\lambda,n,M}(x)$ for $\lambda\in\Lambda$

for $m=1,\dots,M'$ do
    Generate $\bm z_m\sim\mathsf{N}(\bm 0,\hat{\mathcal R}_J)$ and compute $\hat{\bm a}_m=\bm z_m/\|\bm z_m\|_{\mathbb{R}^J}$
end for
if $\Lambda$ is unspecified then
    for $u\in\mathcal{U}$ do
        Add the $u$-quantile of $\{\|\hat{\mathcal R}_J^{-1/2}\hat{\bm a}_m\|_{\mathbb{R}^J}\}_{m=1}^{M'}$ to $\Lambda$
    end for
end if
for $\lambda\in\Lambda$ do
    $\tilde{\mathcal A}_{\lambda,J,M}\leftarrow\emptyset$
    for $m=1,\dots,M'$ do
        if $\|\hat{\mathcal R}_J^{-1/2}\hat{\bm a}_m\|_{\mathbb{R}^J}\leq\lambda$ then
            Add $\hat{\bm a}_m$ to $\tilde{\mathcal A}_{\lambda,J,M}$
        end if
        if $|\tilde{\mathcal A}_{\lambda,J,M}|=M$ then
            break
        end if
    end for
    $\tilde D_{\lambda,n,M}(x)\leftarrow\min_{\hat{\bm a}\in\tilde{\mathcal A}_{\lambda,J,M}}n^{-1}\sum_{i=1}^n\mathbb{I}((\hat{\bm X}_i-\hat{\bm x})^\top\hat{\bm a}\geq 0)$
end for

Algorithm 1: Approximate sample RHD
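Once the FPC scores and a direction set are in hand, the core of Algorithm 1 reduces to a few matrix operations. The R sketch below is our own compact transcription (not the authors' package code); `scores` is the $n\times J$ matrix of FPC scores, `x_sc` the score vector of the evaluation point, and `A` an $M\times J$ matrix of accepted directions, e.g., from `sample_dirs` above.

```r
# Approximate sample RHD (8): the smallest halfspace frequency over the
# sampled directions.
rhd_approx <- function(x_sc, scores, A) {
  diffs <- sweep(scores, 2, x_sc)        # rows: X-hat_i - x-hat
  proj  <- diffs %*% t(A)                # proj[i, m] = (X-hat_i - x-hat)' a-hat_m
  min(colMeans(proj >= 0))               # worst-case halfspace frequency
}
```

Reusing one direction matrix `A` across a grid of $\lambda$ values, filtered as in Algorithm 1, yields the monotonicity in $\lambda$ discussed above.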
Remark 2.

The novelty of the RHD arises from the use of the regularization parameter $\lambda\in(0,\infty)$. Although the approximation of the RHD in Algorithm 1 and a naive application of the classical halfspace depth to the first $J$ FPC scores appear similar, they are clearly differentiated by the use of $\lambda$. In our work, the FPC scores serve as a practical device for computing the sample RHD; an "FPCA depth" without regularization would suffer from degeneracy, as shown in Figure 1 and Figure S1 of the supplement. See Remark S1 of the supplement for a relevant discussion.

Remark 3.

The geometry of our projection set $\hat{\mathcal{V}}_{\lambda,J}$, as defined in (4), differs significantly from the classical finite-dimensional case $\mathbb{S}_J\equiv\{a\in\mathbb{R}^J:\|a\|_{\mathbb{R}^J}=1\}$. Specifically, the direction set $\hat{\mathcal{V}}_{\lambda,J}$ is the intersection of an ellipsoid and a sphere, which can be considerably smaller than the entire $J$-dimensional hypersphere $\mathbb{S}_J$. As $J$ increases, our sample direction set $\hat{\mathcal{V}}_{\lambda,J}$ targets the fixed set $\mathcal{V}_\lambda$ in (3), ensuring the non-degeneracy of the proposed RHD (cf. Theorem 2). In contrast, the hypersphere $\mathbb{S}_J$ in $\mathbb{R}^J$ used for the finite-dimensional halfspace depth expands to the non-totally-bounded set $\mathbb{S}\equiv\{v\in\mathbb{H}:\|v\|=1\}$ in $\mathbb{H}$, leading to the degeneracy of the original halfspace depth without regularization (cf. Theorem 1). As a result, the approximation of the sample RHD by random projections, justified in Theorem 6 and described in Algorithm 1, is reasonable, whereas the classical random projection approach (e.g., Cuesta-Albertos and Nieto-Reyes, 2008) struggles in higher dimensions $J$. Our simulation results indicate that $M=1000$ or more projections generally provides a sufficient approximation for the RHD and performs well in various scenarios, as seen in Section 4.

2.5 Selecting Tuning Parameters

The approximate sample RHD $\tilde D_{\lambda,n,M}$ in (8) involves three tuning parameters: the regularization parameter $\lambda$ (or, equivalently, the quantile level $u$), the truncation level $J$, and the number $M$ of random projection directions. The choice of the most influential parameter, $\lambda\in(0,\infty)$, depends on the user's analytic goals, as each $\lambda$ provides a different perspective (e.g., magnitude versus shape), and the optimal $\lambda$ for one analysis (e.g., a rank test) may not be ideal for another (e.g., a classification method). Moreover, further insight about the data can be obtained by investigating a family of RHDs $\{D_\lambda\}_{\lambda\in\Lambda}$ over multiple regularization parameters $\Lambda\subseteq(0,\infty)$. The less sensitive tuning parameter $J$ should be a sufficiently large integer, which may be determined by a common criterion such as the fraction of variance explained (FVE) (cf. Kokoszka and Reimherr, 2017, Section 12.2). Lastly, a reasonably large value of $M$ ensures a sufficient approximation, as supported by Remark 3. We elaborate on the selection of these tuning parameters below.

The most important parameter, by far, is the regularization parameter $\lambda$, since it determines how sensitive the depth is to different modes of variation in the sample, which is a useful feature for analysis. When $\lambda$ is small, projections are constrained to have large components only along the leading eigenfunctions, in which case the RHD emphasizes the overall magnitude rather than the shape of the curves. In contrast, when $\lambda$ is large, the RHD considers projections along both leading and later eigenfunctions, so its emphasis on shape relative to magnitude increases. This flexibility of the RHD is used and highlighted in the outlier detection method proposed in Section 3 and is demonstrated numerically in Section 4.3.

The choice of $\lambda$ may depend on the structure of the given dataset and the purpose of the study. If a dataset can be explained by a few modes of variation (say, mostly a vertical shift), a small regularization parameter can be chosen; otherwise, a larger regularization parameter should be considered. If a class label response exists, the selection of $\lambda$ can be tied to depth-based classification performance (Ghosh and Chaudhuri, 2005, López-Pintado and Romo, 2006, Hubert et al., 2017). The parameter $\lambda$ could also be chosen by maximizing the power of an RHD-based rank test (for general depth-based rank tests, see Liu and Singh, 1993, López-Pintado and Romo, 2009) if training samples are available. In general, practitioners can inspect a range of $\lambda$ values and obtain different rankings and outlyingness assessments from the depth values, which helps in understanding the dataset from different perspectives in terms of magnitude versus shape. For practical use, we recommend as a default setting $\lambda$ according to the quantile level $u=0.95$ of the RKHS norms $\mathcal{N}_M$, which is implemented in our R package. This choice tends to reveal extremeness in both magnitude and shape while avoiding degeneracy, as shown in our numerical studies.

The selection of the truncation level $J$ is a secondary consideration. It should be chosen large enough, e.g., based on the FVE, because $J$ is needed for approximating the population RKHS norm $\|\Gamma^{-1/2}v\|$ by its sample version $\|\hat\Gamma_J^{-1/2}v\|$. In Figure 2, we show the rankings of several types of outliers obtained using the approximate RHD with different $\lambda$ and $J$ to illustrate how sensitive the rankings are to the choice of these parameters. A simulated dataset containing non-Gaussian functional data is considered, where one outlier is added at a time.

Figure 2: RHD-based normalized rankings of outlying curves selected among those described in Section 4.1 when $n=400$. The least deep curves have normalized rank $1/n\approx 0$. The panels correspond to truncation levels $J\in\{2,4,6,8\}$, and the x-axis shows four regularization parameters corresponding to the quantile levels $u\in\{0.5,0.7,0.9,0.95\}$.

A magnitude outlier and three shape outliers of different types were ranked, with the x-axis and panels displaying varying $\lambda$ and $J$, respectively (see Section 4.1 for more details). We define the RHD-based (normalized) rank of a functional observation $X_i$ in the sample $\{X_i\}_{i=1}^n$ as $\tilde R_i/n$, where $\tilde R_i$ denotes the rank of $\tilde D_{\lambda,n,M}(X_i)$ in $\{\tilde D_{\lambda,n,M}(X_i)\}_{i=1}^n$; higher ranks therefore correspond to larger depths. Curves with tied depths are assigned the minimum rank. We observe that the results are highly similar when $J$ is moderately large, for example, $J\geq 6$ in this case. In contrast, the rankings for the smallest and largest regularization parameters are markedly different: the magnitude outlier receives the lowest rank for all values of $\lambda$, but this is not the case for the shape outliers. The shape outliers are detected only with higher values of $\lambda$, because a larger $\lambda$ emphasizes shape more strongly.

The least consequential parameter is $M$, which should be chosen as large as the computational budget affords. In practice, $M=1000$ is often enough to obtain a reasonable approximation (cf. Remark 3).

In sum, we recommend choosing a large $M$, a sufficiently large $J$ that may depend on the sample size $n$, and a $\lambda$ appropriate for the purposes of the analysis.

3 Outlier Detection

In this section, we describe an outlier detection method built on the proposed RHD, which detects shape as well as magnitude outliers. Since a low depth value alone is not sufficient for an observation to be considered an outlier, we introduce a novel outlier detection method based on the projection directions in (7). The overall idea is to first prescreen potential outliers using the RHD, and then detect outliers by applying univariate boxplots to the data projected onto the restricted directions that maximally separate inliers and outliers. Comprehensive numerical studies in Section 4.3 and the supplement show that the new outlier detection method works well and detects outliers with a variety of complicated shapes.

3.1 Outlier detection procedure based on the RHD

The detailed outlier detection procedure is described as follows. We start by defining the most extreme functions with the least (approximate) RHD values as outlier candidates, for which the index set is denoted as

min,λ=min,λ(𝒳n)argmin1inD~λ,n,M(Xi).\displaystyle\mathcal{I}_{\min,\lambda}=\mathcal{I}_{\min,\lambda}(\mathcal{X}_{n})\equiv\operatornamewithlimits{argmin}_{1\leq i\leq n}\tilde{D}_{\lambda,n,M}(X_{i}).

Let Xi0X_{i_{0}}, i0min,λi_{0}\in\mathcal{I}_{\min,\lambda} be an outlier candidate in the sample 𝒳n{Xi}i=1n\mathcal{X}_{n}\equiv\{X_{i}\}_{i=1}^{n}. To construct univariate boxplots, we keep track of only the most extreme projection directions for evaluating the halfspace probabilities at x=Xi0x=X_{i_{0}} for outlier detection. Among directions in 𝒱~λ,J,M\tilde{\mathcal{V}}_{\lambda,J,M} (7), suppose that there are K=Ki0K=K_{i_{0}} directions, denoted as {vmk}k=1K\{v_{m_{k}}\}_{k=1}^{K}, that minimize the halfspace probability when computing D~λ,n,M(Xi0)\tilde{D}_{\lambda,n,M}(X_{i_{0}}) in (8), that is,

{vmk}k=1Kargmin1mMP^n(HXi0,vm).\displaystyle\{v_{m_{k}}\}_{k=1}^{K}\subset\operatornamewithlimits{argmin}_{1\leq m\leq M}\hat{P}_{n}(H_{X_{i_{0}},v_{m}}).

For each direction vmkv_{m_{k}}, we obtain the projections {Xi,vmk}i=1n\{\langle X_{i},v_{m_{k}}\rangle\}_{i=1}^{n} and label outliers using a univariate boxplot with a given factor ff for constructing the fence. The fence is [Q1f×IQR,Q3+f×IQR][Q_{1}-f\times IQR,Q_{3}+f\times IQR], where Q1Q_{1}, Q3Q_{3}, and IQR denote the first quartile, the third quartile, and the interquartile range of the projections, respectively. We label as outliers 𝒪i0,k\mathcal{O}_{i_{0},k} the functional observations whose projections lie outside the fence along this direction. The final set of all outliers is obtained as

𝒪λ(i0min,λk=1Ki0𝒪i0,k)min,λ.\displaystyle\mathcal{O}_{\lambda}\equiv\left(\bigcup_{i_{0}\in\mathcal{I}_{\min,\lambda}}\bigcup_{k=1}^{K_{i_{0}}}\mathcal{O}_{i_{0},k}\right)\cap\mathcal{I}_{\min,\lambda}.

The algorithm for outlier detection is given in Algorithm 2.

Data: Random sample 𝒳n\mathcal{X}_{n}
Parameters: Regularization λ\lambda, truncation JJ, and adjustment factor ff
Result: Indices 𝒪λ\mathcal{O}_{\lambda} of outliers in 𝒳n\mathcal{X}_{n}

1. Obtain D~λ,n,M(x),x𝒳n\tilde{D}_{\lambda,n,M}(x),x\in\mathcal{X}_{n} by invoking Algorithm 1
2. min,λargmin1inD~λ,n,M(Xi)\mathcal{I}_{\min,\lambda}\leftarrow\operatornamewithlimits{argmin}_{1\leq i\leq n}\tilde{D}_{\lambda,n,M}(X_{i})
3. 𝒪λ\mathcal{O}_{\lambda}\leftarrow\emptyset
4. for i0min,λi_{0}\in\mathcal{I}_{\min,\lambda} and 𝐚^𝒜~λ,J\hat{\bm{a}}\in\tilde{\mathcal{A}}_{\lambda,J} do
5.   Set xXi0x\leftarrow X_{i_{0}}, 𝑿^i[Xi,ϕ^j]1jJ\hat{\bm{X}}_{i}\leftarrow[\langle X_{i},\hat{\phi}_{j}\rangle]_{1\leq j\leq J}, and 𝒙^[x,ϕ^j]j=1J\hat{\bm{x}}\leftarrow[\langle x,\hat{\phi}_{j}\rangle]_{j=1}^{J}
6.   if n1i=1n𝕀((𝐗^i𝐱^)𝐚^0)=D~λ,n,M(x)n^{-1}\sum_{i=1}^{n}\mathbb{I}((\hat{\bm{X}}_{i}-\hat{\bm{x}})^{\top}\hat{\bm{a}}\geq 0)=\tilde{D}_{\lambda,n,M}(x) then
7.     Form the projections {𝑿^i,𝒂^}i=1n\{\langle\hat{\bm{X}}_{i},\hat{\bm{a}}\rangle\}_{i=1}^{n} and obtain their first quartile, third quartile, and interquartile range as Q1Q_{1}, Q3Q_{3}, and IQRIQR
8.     for i=1,,ni=1,\dots,n do
9.       if 𝐗^i,𝐚^[Q1fIQR,Q3+fIQR]\langle\hat{\bm{X}}_{i},\hat{\bm{a}}\rangle\notin[Q_{1}-f\cdot IQR,Q_{3}+f\cdot IQR] then 𝒪λ𝒪λ{i}\mathcal{O}_{\lambda}\leftarrow\mathcal{O}_{\lambda}\cup\{i\}
10.    end for
11.  end if
12. end for
13. 𝒪λ𝒪λmin,λ\mathcal{O}_{\lambda}\leftarrow\mathcal{O}_{\lambda}\cap\mathcal{I}_{\min,\lambda}
Algorithm 2: Detect outliers in 𝒳n\mathcal{X}_{n} based on the RHD
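To make the boxplot step (steps 7-9 of Algorithm 2) concrete, the following minimal R sketch flags the observations whose projections onto a single extreme direction fall outside the fence; the function name and inputs are illustrative assumptions, not part of our package interface:

# proj: projections of all n curves onto one extreme direction
# f:    adjustment factor for the boxplot fence
flag_outliers_1d <- function(proj, f = 1.5) {
  q <- quantile(proj, c(0.25, 0.75))         # first and third quartiles Q1, Q3
  iqr <- unname(q[2] - q[1])                 # interquartile range
  fence <- c(q[1] - f * iqr, q[2] + f * iqr)
  which(proj < fence[1] | proj > fence[2])   # indices lying outside the fence
}

As in Algorithm 2, these per-direction index sets are accumulated over all outlier candidates and minimizing directions, and the resulting union is finally intersected with min,λ\mathcal{I}_{\min,\lambda}.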

We determine the adjustment factor ff in our outlier detection method by simulation, following the approach of Sun and Genton (2012). The factor ff is calibrated to the value that flags 0.7% of the observations as outliers in simulated datasets containing only inliers. The simulated observations follow a Gaussian distribution matching the empirical mean and covariance functions of the actual input dataset; see Algorithm S1 in the supplement and the sketch below.
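The following R sketch conveys the calibration idea only; the precise procedure is Algorithm S1 in the supplement, and detect() here is a hypothetical stand-in for the full RHD-based detector applied to curves discretized on a common grid:

library(MASS)  # for mvrnorm
# X: n x p matrix of discretized curves; detect(X, f) returns the indices flagged as outliers
calibrate_f <- function(X, detect, f_grid = seq(1.5, 5, by = 0.5),
                        n_sim = 100, target = 0.007) {
  mu <- colMeans(X)
  Sigma <- cov(X)                             # empirical mean and covariance of the input data
  rates <- sapply(f_grid, function(f) {
    mean(replicate(n_sim, {
      Xsim <- mvrnorm(nrow(X), mu, Sigma)     # inlier-only Gaussian sample
      length(detect(Xsim, f)) / nrow(X)       # proportion flagged in clean data
    }))
  })
  f_grid[which.min(abs(rates - target))]      # f whose flag rate is closest to 0.7%
}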

Remark 4.

This paper does not address clustered functional observations arising from mixture distributions. Such scenarios might require additional tools, such as local depths (Paindaveine and Van Bever, 2013) or depth-based classification/clustering methods (Li et al., 2012). These problems are beyond the scope of our paper and are left as future directions.

3.2 Robustness of the outlier detection

Under a broader interpretation of robustness, the proposed RHD is robust in the following sense. First, outliers, even very extreme ones, do not break down our RHD-based outlier detection method, as shown in the simulation study (cf. Section 4). This might sound surprising because our procedure relies on the covariance operator Γ\Gamma, which is non-robust, but the sensitivity to outliers actually helps reveal shape outliers by capturing the directions along which these outliers lie. Second, the choice of the adjustment factor ff by Algorithm S1 in the supplement is resistant to outliers, as discussed in Section S4 of the supplement, even though that procedure also depends on the non-robust sample covariance operator Γ^n\hat{\Gamma}_{n}. Taken together, the whole outlier detection procedure is robust even against very extreme outliers and useful for detecting all types of outliers under consideration.

4 Numerical Studies

Section 4.1 describes the design of the simulation study along with the different types of magnitude and shape outliers under consideration. The performance of the RHD rankings and of the proposed outlier detection method is evaluated in Sections 4.2 and 4.3, respectively. Finally, Section 4.4 studies the robustness of the RHD-based median.

The performance of the RHD is evaluated based on (i) the depth ranks of the simulated outliers in Section 4.2, (ii) the proportions of falsely/correctly detected outliers in Section 4.3, and (iii) the mean squared errors (MSEs) of the RHD-based medians in Section 4.4. For the first study, we compare the RHD with other depth notions, including the MBD (López-Pintado and Romo, 2009), the extremal depth (ED) (Narisetty and Nair, 2016), and the kk-th order integrated depth of Nagy et al. (2017), denoted IDk\mathrm{ID}_{k} with order k{1,2,3}k\in\{1,2,3\}. In the second study, we compare our proposed outlier detection method against the functional boxplot (Sun and Genton, 2011) constructed based on either the MBD or the ED, and against the outlier detection method introduced in Nagy et al. (2017). Since all outlier detection methods can benefit from transformations, e.g., computing derivatives of the original curves (cf. Nagy et al., 2017, Section 3.1; Dai et al., 2020), we consider only detection methods based on the original curves for simplicity. The third study compares the MSEs of the medians based on the different functional depths mentioned above. All competing methods are implemented in the R packages fdaoutlier and ddalpha.

4.1 Setting up different types of outliers

To evaluate the performance of the proposed RHD, we consider inliers that are relatively smooth curves together with different types of outliers, outlying in either magnitude or shape. The shape outliers are outlying only in their pattern, not in magnitude. All curves are evaluated at 50 equally spaced time points in [0,1][0,1].

Inlier curves 𝒳nin={Xi}i=1nin\mathcal{X}_{n_{in}}=\{X_{i}\}_{i=1}^{n_{in}} are generated as

X=j=1J0γjξjϕj\displaystyle X=\sum_{j=1}^{J_{0}}\sqrt{\gamma_{j}}\xi_{j}\phi_{j} (9)

with J0=15J_{0}=15, where {γj}j=1\{\gamma_{j}\}_{j=1}^{\infty} is a non-increasing sequence of positive numbers with j=1γj<\sum_{j=1}^{\infty}\gamma_{j}<\infty and {ϕj}j=1\{\phi_{j}\}_{j=1}^{\infty} is the set of trigonometric basis functions, which forms an orthonormal basis of L2([0,1])L^{2}([0,1]). The random coefficients are ξjξWj\xi_{j}\equiv\xi W_{j}, where ξ\xi and the WjW_{j} are independent 𝖴𝗇𝗂𝖿(3,3)\mathsf{Unif}(-\sqrt{3},\sqrt{3}) random variables, so the random function XX is non-Gaussian with a bounded range. The eigenvalues are set as γj=2l=jl5\gamma_{j}=2\sum_{l=j}^{\infty}l^{-5}, so that the eigengaps γjγj+1=2j5\gamma_{j}-\gamma_{j+1}=2j^{-5} decay polynomially and give rise to relatively smooth inlier curves.
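For illustration, the inlier model (9) can be simulated on the 50-point grid as in the following R sketch; the particular ordering of the sine and cosine pairs in the trigonometric basis and the truncation of the infinite sum defining γj\gamma_{j} at l=104l=10^{4} are our own choices for this sketch:

# Inlier curves from model (9) on 50 equally spaced points in [0,1]
gen_inliers <- function(n_in, J0 = 15, grid = seq(0, 1, length.out = 50)) {
  gamma <- sapply(1:J0, function(j) 2 * sum((j:1e4)^(-5)))  # gamma_j, sum truncated at 10^4
  phi <- sapply(1:J0, function(j) {                         # one orthonormal trigonometric ordering
    if (j == 1) rep(1, length(grid))
    else if (j %% 2 == 0) sqrt(2) * sin(2 * pi * (j %/% 2) * grid)
    else sqrt(2) * cos(2 * pi * (j %/% 2) * grid)
  })
  t(sapply(1:n_in, function(i) {
    xi <- runif(1, -sqrt(3), sqrt(3)) * runif(J0, -sqrt(3), sqrt(3))  # xi_j = xi * W_j
    as.vector(phi %*% (sqrt(gamma) * xi))
  }))
}
X <- gen_inliers(400)  # a 400 x 50 matrix of discretized inlier curves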

Figure 3: Illustration of the eight types of outliers (black and colored curves) and inliers (gray) considered in the simulation

Eight different types of outlier curves are considered, including one magnitude and seven shape outliers; Figure 3 displays a realization of each of the eight types, while Section S2.2 in the supplement details their construction. Note that the shape outliers are constructed to lie within the range of the inliers at every time point, so outlier detection based on single time points will not perform well. Similar outliers have been described in Dai et al. (2020) and the references therein.

4.2 Ranking outliers

We compute the RHD-based normalized rankings of a sample of curves as defined in Section 2.5. Each simulation scenario considers a sample contaminated by a different type of outlier. We consider different sample sizes n{50,200,400}n\in\{50,200,400\}, truncation levels J{2,4,6,8,10}J\in\{2,4,6,8,10\}, and regularization parameters corresponding to the quantile levels u{0.5,,0.9,0.91,,0.99}u\in\{0.5,\dots,0.9,0.91,\dots,0.99\}. We present only the results for n=400n=400, J=6J=6, and u{0.5,0.7,0.9,0.95}u\in\{0.5,0.7,0.9,0.95\}; the supplement includes additional results and algorithmic details.

To evaluate the performance of the RHD in ranking observations and flagging possible outliers, the normalized rankings of the outliers under different depth notions are compared in Figure 4. The magnitude outlier always receives the lowest depth under every depth notion, including the proposed RHD with the smallest regularization parameter (u=0.5u=0.5), which yields a depth sensitive to magnitude but not to shape outliers. In contrast, the shape outliers, which are harder to detect, are ranked differently depending on the depth notion and, for the RHD, on the tuning parameter λ\lambda. The proposed RHD assigns the smallest depth to most of the shape outliers when λ\lambda is large, i.e., u=0.9u=0.9 or u=0.95u=0.95, except for the wiggle outlier when u=0.9u=0.9 and the non-differentiable outlier when either u=0.9u=0.9 or u=0.95u=0.95. The MBD, ED, and the first-order integrated depth ID1\mathrm{ID}_{1} do not give the lowest rank to any of the shape outliers, while the second- and third-order integrated depths ID2\mathrm{ID}_{2} and ID3\mathrm{ID}_{3} perform quite well in general, except in the scenarios with jump or peak outliers, whose sharp features occupy only a small time window.

Figure 4: Normalized rankings of the eight outliers under different depth methods when n=400n=400. A lower rank is better since it indicates that the outlier can be flagged. For the proposed RHD, J=6J=6 and u{0.5,0.7,0.9,0.95}u\in\{0.5,0.7,0.9,0.95\} are considered.

4.3 Outlier detection

We next examine the performance of the proposed RHD-based outlier detection method. The evaluation metrics are the proportion pcp_{c} of correctly detected outliers and the proportion pfp_{f} of falsely detected outliers (Sun and Genton, 2011, Arribas-Gil and Romo, 2014, Dai et al., 2020), averaged over 1000 Monte Carlo experiments. Again, only one type of outlier is added to the sample in each simulation scenario, and for brevity we display only the results for n=400n=400 as in Section 4.2; see the Supplementary Materials for the complete results and details. We calculate pcp_{c} and pfp_{f} over the factors f{1.5,2.0,2.5,3.0,3.5}f\in\{1.5,2.0,2.5,3.0,3.5\} for the functional boxplots and for the univariate boxplot in our proposed method, but we keep f=1.5f=1.5 fixed in the method by Nagy et al. (2017) since it cannot be varied in the R package ddalpha.
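Here pcp_{c} is read as the fraction of the true outliers that a method flags and pfp_{f} as the fraction of inliers it flags incorrectly; a minimal R sketch under that reading:

# true_out: indices of the simulated outliers; flagged: indices returned by a detector
detection_rates <- function(true_out, flagged, n) {
  pc <- length(intersect(flagged, true_out)) / length(true_out)      # correct detections
  pf <- length(setdiff(flagged, true_out)) / (n - length(true_out))  # false detections
  c(pc = pc, pf = pf)
}
detection_rates(true_out = c(3, 77), flagged = c(3, 12), n = 400)  # pc = 0.5, pf = 1/398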

Figure 5: The ROC curves of the proportions of correctly versus falsely detected outliers based on 1000 Monte Carlo iterations. Eight types of outliers are studied; one outlier is included in each dataset, and the sample size is n=400n=400.

In Figures 5-6, we present the results as receiver operating characteristic (ROC) curves by plotting pcp_{c} against pfp_{f}, where different points on a curve correspond to different factors ff and the color and shape of the points indicate the outlier detection method. The truncation level is set to J=6J=6 and four quantile levels u{0.5,0.7,0.9,0.95}u\in\{0.5,0.7,0.9,0.95\} are considered for the proposed RHD; MBD and ED denote the functional boxplots based on the respective depths, and NGH17 denotes the method by Nagy et al. (2017), which uses the integrated depths of the first three orders. In these figures, closer to the upper-left corner indicates better performance.

Figure 5 shows the results when a single outlier of one of the types in Figure 3 is included. First, for the magnitude outlier, our proposed method with the smallest regularization parameter (u=0.5u=0.5) performs very well and is comparable with the other existing methods. Second, for the shape outliers, the functional boxplots with either MBD or ED and our method with u=0.5u=0.5 are powerless. In contrast, under larger regularization, u=0.9u=0.9 or u=0.95u=0.95, our proposed method mostly outperforms all the others. If an even larger quantile level such as u=0.98u=0.98 is chosen, the correct detection rate pcp_{c} of the RHD-based method for the non-differentiable outlier becomes higher, making our method competitive with that of Nagy et al. (2017) in this setting; these additional results are given in the supplement. The method of Nagy et al. (2017) attains high correct detection rates pcp_{c} except for the jump and peak outliers, but its false detection rates pfp_{f} are relatively high. In summary, our proposed outlier detection method with larger regularization parameters exhibits high pcp_{c} and low pfp_{f} for all types of shape outliers.

Figure 6: The ROC curves for different outlier detection methods. In the left panel, four types of outliers (magnitude, jump, wiggle, and linear) are simultaneously included in each sample of size n=200n=200. In the right panel, the phase outliers described in Section S2.5 are included. The x-axis scales differ between the panels.

We also consider a more practical scenario where different types of outliers are simultaneously included in the sample. Four types of outliers are selected for this numerical study: magnitude, jump, wiggle, and linear outliers. The ROC curves are displayed in the left panel of Figure 6 for sample size n=200n=200. The truncation level of the proposed RHD is again J=6J=6 and the quantile levels for the regularization parameter λ\lambda are u{0.5,0.7,0.9,0.95}u\in\{0.5,0.7,0.9,0.95\}. In this case, the proposed outlier detection method with large regularization, u=0.9u=0.9 or u=0.95u=0.95, is superior to the other methods in terms of both pcp_{c} and pfp_{f}.

Prompted by the associate editor's feedback, we implemented an additional simulation study to evaluate the robustness of our method against phase outliers, which match the inliers in magnitude and shape but differ in phase variation (cf. Arribas-Gil and Romo (2012), Marron et al. (2015), Xie et al. (2017)). We adopt the third simulation set-up from Xie et al. (2017), with sample size n=100n=100 and 10% outliers; the detailed data generation process is described in Section S2.5. The right panel of Figure 6 shows the ROC curves for this phase outlier scenario. The results reveal that our method is superior in detecting phase outliers compared to the other methods. The functional boxplots with MBD and ED fail to identify the phase outliers, since all observations are flagged as outliers, and the method by Nagy et al. (2017) still has fairly high pfp_{f}, whereas our outlier detection method exhibits high pcp_{c} and low pfp_{f}, particularly when a small adjustment factor ff and a large uu are chosen. These findings underscore the robustness and good performance of our proposed method, relative to the other methods, in the presence of phase outliers.

4.4 Median estimation

We explore the performance of our method from a different perspective: depth-based medians. For each depth function under consideration, we define the median as the curve attaining the maximal depth value; in case of ties, the deepest curves are averaged. Under the simulation set-up outlined in Section 4.1, we contaminated 10% of the sample, with curves generated either as the magnitude and shape outliers depicted in Figure 3 or as the phase outliers described in Section S2.5 of the supplement. For each Monte Carlo iteration, we calculate the depth-based median and its squared L2L^{2} distance from the true center; the average over all iterations approximates the MSE of the resulting median. The detailed description of the simulation procedure can be found in Section S2.6 of the supplement; see Dai and Lopez-Pintado (2023) for a similar simulation study. Tables 1-2 present the MSEs of the depth-based medians in various contamination scenarios, along with the MSEs of the sample average for comparison. The smallest MSEs among the RHD-based depths are emphasized in bold, while those among the other depths are italicized.
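A minimal R sketch of the per-iteration computation, assuming the curves are stored row-wise in a matrix X with their depth values in depth and that the true center is the zero function:

# X: n x p matrix of discretized curves; depth: vector of depth values for the rows of X
depth_median <- function(X, depth) {
  deepest <- which(depth == max(depth))
  colMeans(X[deepest, , drop = FALSE])  # average the deepest curves in case of ties
}
# Squared L2 distance from the zero center, approximated on a grid of p points in [0,1]
sq_l2_error <- function(med) mean(med^2)
# The MSEs in Tables 1-2 average sq_l2_error(depth_median(...)) over Monte Carlo runs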

Table 1 provides the MSE results when a sample of size n=200n=200 is contaminated by 10% magnitude or shape outliers. In these scenarios, since the inliers are generated following (9), which is halfspace symmetric about the center 00\in\mathbb{H}, we use the zero element as the true center. The proposed RHD performs strongly against most outlier types, outperforming our main competitors MBD, ED, and ID1, and yielding results comparable with ID2 and ID3. This demonstrates the robustness of the RHD-median in the presence of various types of contamination.

Table 1: The mean squared errors (MSEs) of the depth-based medians from different depth functions. The sample size is n=200n=200 with a 10% contaminated sample. The first four columns give the RHD at quantile levels u{0.5,0.7,0.9,0.95}u\in\{0.5,0.7,0.9,0.95\}.
Contamination RHD (u=0.5u=0.5) RHD (u=0.7u=0.7) RHD (u=0.9u=0.9) RHD (u=0.95u=0.95) MBD ED ID1 ID2 ID3 X¯\bar{X}
Pure 4.54 3.94 2.65 2.63 3.68 3.80 3.71 2.02 1.43 9.84
Magnitude 21.83 20.55 21.79 21.54 19.30 19.37 19.44 11.01 5.42 345.88
Jump 5.08 4.83 4.35 3.95 7.93 6.70 8.33 2.37 1.74 50.50
Peak 20.34 19.29 7.02 4.96 13.85 13.72 13.76 8.02 4.31 65.20
Wiggle 5.27 3.71 2.96 2.82 3.98 3.99 3.97 2.33 1.83 26.67
Non-diff 5.41 4.00 2.94 2.74 3.87 4.18 3.78 2.40 1.80 31.33
Linear 5.55 5.41 5.05 4.55 13.15 6.42 14.96 3.23 1.85 37.53
Quadratic 5.36 4.41 3.75 3.62 9.46 4.89 13.22 3.13 2.24 48.60
Cubic 5.46 5.43 5.20 5.16 13.56 6.84 15.48 4.05 2.29 50.52

We conducted an additional median simulation study with samples contaminated by phase variation. We again adopted the third simulation set-up from Xie et al. (2017) and introduced a scenario with stronger phase variation. This additional case (denoted "Strong") features stronger phase variation in the outliers than the original set-up of Xie et al. (2017) (denoted here as "Weak"); see Figure S6 in the supplement for an illustration of curves generated under the two settings. This allows us to compare the robustness of the depth-based medians as the strength of the phase contamination changes.

Table 2: The mean squared errors (MSEs) of the depth-based medians from different depth functions. The sample size is n=100n=100 with a 10% contaminated sample exhibiting phase variation; see Section S2.5 of the supplement for the data generation. The first four columns give the RHD at quantile levels u{0.5,0.7,0.9,0.95}u\in\{0.5,0.7,0.9,0.95\}.
Phase variation RHD (u=0.5u=0.5) RHD (u=0.7u=0.7) RHD (u=0.9u=0.9) RHD (u=0.95u=0.95) MBD ED ID1 ID2 ID3 X¯\bar{X}
Weak 18.63 31.96 32.06 31.96 17.58 41.59 17.59 18.11 46.49 23.57
Strong 72.83 45.63 38.89 38.68 16.57 39.09 17.49 109.06 163.81 31.10

Table 2 displays the MSEs of the depth-based medians under phase contamination; the detailed data generation procedure and the true center location are given in the supplement. For weak phase variation, the MBD, ID1, and the RHD with u=0.5u=0.5 perform best, while the RHD with larger regularization parameters shows moderate performance. Under strong phase variation, the MBD and ID1 continue to outperform the others, while the RHD with larger regularization parameters maintains moderate performance. Interestingly, the MSEs of the medians from ID2 and ID3 are extremely large under strong phase contamination, suggesting that ID2 and ID3 are not robust against it. In contrast, the RHD-based medians with high quantile levels yield similarly moderate MSEs regardless of the strength of the phase variation, although they are not the best performer in either setting. Overall, we conclude that each depth-based median has its own relative merits and no depth outperforms all the others in every scenario; however, the RHD-based median performs well in general and is less sensitive to the type of outlier.

5 Data Applications

We apply the RHD-based outlier detection method to real datasets with very different characteristics: the medfly, world population growth (WPG), and sea surface temperature (SST) datasets. The curves in the medfly example are very dynamic and complex, while the other two datasets consist of smoother and relatively simpler curves that are well represented by the first few eigenfunctions. For each data example, four regularization parameters λ1<λ2<λ3<λ4\lambda_{1}<\lambda_{2}<\lambda_{3}<\lambda_{4} based on the quantile levels u{0.4,0.6,0.8,0.95}u\in\{0.4,0.6,0.8,0.95\} and five adjustment factors {1.5,2.0,2.5,3.0,3.5}\{1.5,2.0,2.5,3.0,3.5\} are considered. In all figures, outliers are marked in red, while inliers are drawn as dashed grey curves; the deepest curve is marked in blue for reference, and if there are multiple deepest curves, we display their average. In this section we report only the analysis of the medfly dataset; the results for the WPG and SST datasets are deferred to the supplement.

The medfly dataset contains daily counts of eggs laid by female Mediterranean fruit flies (medflies, Ceratitis capitata), as described in Carey et al. (1998). We analyze a subset of n=200n=200 egg-laying trajectories over the first 25 days after hatching (see Figure 7), randomly selected from a preprocessed dataset of 789 medflies available in the R package fdapace.

Due to the complexity of the medfly trajectories, the functional boxplots constructed with either the MBD (López-Pintado and Romo, 2009) or the ED (Narisetty and Nair, 2016) flag all 200 curves as outliers. Note that the MBD and ED were originally defined for smooth functions and might not be appropriate for analyzing very irregular, non-smooth curves. The functional boxplot flags a curve as an outlier if it exits the whisker band at any time point, but this criterion is unsuitable for such wiggly data because medflies tend to attain very high or low productivity at some random time in their life. In addition, the functional boxplot is hard to read when the curves oscillate frequently.

Hence, for this type of data, alternative outlier detection methods are needed to accommodate the high variability in the sample. The proposed RHD is defined on the space of square-integrable functions, a much larger class than the space of smooth functions. Furthermore, our outlier detection method is data-adaptive and measures the outlyingness of each curve according to the extreme patterns inherent in the sample. The proposed RHD-based outlier detection is therefore well suited to such complex curves and indeed performs well, as shown next.

Figure 7: The medfly dataset, showing the outlying curves (red) with IDs 47, 111, and 135 detected by the proposed RHD outlier detection method with truncation J=8J=8, adjustment factor f=3.5f=3.5, and regularization λ=λ4=0.0507\lambda=\lambda_{4}=0.0507 (from the quantile level u=0.95u=0.95). All 200 curves are shown in the left panel, with the outliers in red and the deepest trajectory in blue. The three flagged outliers are displayed separately in the three sub-panels on the right.

Figure 7 displays the outlier detection results from the proposed method with truncation level J=8J=8 and adjustment factor f=3.5f=3.5; the first eight FPCs (chosen by the truncation J=8J=8) explain 87.61% of the total variability in the observed curves, and the adjustment factor f=3.5f=3.5 is determined by Algorithm S1 in the supplement. Only the results for λ=λ4=0.0507\lambda=\lambda_{4}=0.0507 are reported here since the method detects no outliers when λ{λ1,λ2,λ3}\lambda\in\{\lambda_{1},\lambda_{2},\lambda_{3}\}. The proposed method flags three outliers, denoted by their IDs in the subsample: 47, 111, and 135. To see their shapes more clearly, we display one outlier per panel (along with the deepest curve) on the right side of Figure 7. The trajectory of the outlying Medfly 47 is more wiggly than the inliers; Medfly 111 appears outlying due to the late start of its otherwise high fertility; Medfly 135 has a relatively narrow peak of fertility from day 7 to 12 and stays lower than most inliers thereafter. These results demonstrate that the RHD and the associated outlier detection method can handle non-smooth trajectories without presmoothing and reveal shape outliers with distinct patterns.
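For readers who wish to reproduce the data setup, the trajectories can be assembled from the medfly25 dataset in the R package fdapace roughly as follows; the random subsample means the selected IDs will differ from ours, and rhd_outliers() is a placeholder name for our detection routine rather than an actual exported function:

library(fdapace)
data(medfly25)                          # daily egg counts for 789 medflies over 25 days
Y <- with(medfly25, split(nEggs, ID))   # one egg-laying trajectory per fly
set.seed(1)
Y200 <- Y[sample(length(Y), 200)]       # random subsample of n = 200 flies
X <- do.call(rbind, Y200)               # 200 x 25 matrix of daily counts
# out <- rhd_outliers(X, J = 8, u = 0.95, f = 3.5)  # placeholder for the proposed method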

Supplementary material

The supplement includes technical details, extra simulation studies, and additional results regarding data applications.

R package

The R package consists of two functions: one that computes the RHD values and implements the proposed outlier detection procedure described in Algorithms 1-2, and one that selects the adjustment factor as described in Algorithm S1 in the supplement.

References

  • Arribas-Gil and Romo (2012) Arribas-Gil, A. and Romo, J. (2012). Robust depth-based estimation in the time warping model, Biostatistics 13(3): 398–414.
  • Arribas-Gil and Romo (2014) Arribas-Gil, A. and Romo, J. (2014). Shape outlier detection and visualization for functional data: the outliergram, Biostatistics 15(4): 603–619.
  • Caramanis (2000) Caramanis, C. (2000). Uniform Glivenko–Cantelli classes, Technical Report. https://users.ece.utexas.edu/~cmcaram/pubs/uniform_gl_cant.pdf.
  • Cardot et al. (2013) Cardot, H., Degras, D. and Josserand, E. (2013). Confidence bands for Horvitz–Thompson estimators using sampled noisy functional data, Bernoulli 19(5A): 2067–2097.
  • Cardot et al. (2007) Cardot, H., Mas, A. and Sarda, P. (2007). CLT in functional linear regression models, Probability Theory and Related Fields 138(3-4): 325–361.
  • Carey et al. (1998) Carey, J. R., Liedo, P., Müller, H.-G., Wang, J.-L. and Chiou, J.-M. (1998). Relationship of age patterns of fecundity to mortality, longevity, and lifetime reproduction in a large cohort of Mediterranean fruit fly females, The Journals of Gerontology Series A: Biological Sciences and Medical Sciences 53(4): B245–B251.
  • Chakraborty and Chaudhuri (2014a) Chakraborty, A. and Chaudhuri, P. (2014a). On data depth in infinite dimensional spaces, Annals of the Institute of Statistical Mathematics 66(2): 303–324.
  • Chakraborty and Chaudhuri (2014b) Chakraborty, A. and Chaudhuri, P. (2014b). The spatial distribution in infinite dimensional spaces and related quantiles and depths, The Annals of Statistics 42(3): 1203–1231.
  • Claeskens et al. (2014) Claeskens, G., Hubert, M., Slaets, L. and Vakili, K. (2014). Multivariate functional halfspace depth, Journal of the American Statistical Association 109(505): 411–423.
  • Cuesta-Albertos and Nieto-Reyes (2008) Cuesta-Albertos, J. A. and Nieto-Reyes, A. (2008). The random Tukey depth, Computational Statistics & Data Analysis 52(11): 4979–4988.
  • Dai and Genton (2018) Dai, W. and Genton, M. G. (2018). Multivariate functional data visualization and outlier detection, Journal of Computational and Graphical Statistics 27(4): 923–934.
  • Dai et al. (2020) Dai, W., Mrkvička, T., Sun, Y. and Genton, M. G. (2020). Functional outlier detection and taxonomy by sequential transformations, Computational Statistics & Data Analysis 149: 106960.
  • Dai and Lopez-Pintado (2023) Dai, X. and Lopez-Pintado, S. (2023). Tukey’s depth for object data, Journal of the American Statistical Association 118(543): 1760–1772.
  • Delaigle and Hall (2010) Delaigle, A. and Hall, P. (2010). Defining probability density for a distribution of random functions, The Annals of Statistics pp. 1171–1193.
  • Donoho and Gasko (1992) Donoho, D. L. and Gasko, M. (1992). Breakdown properties of location estimates based on halfspace depth and projected outlyingness, The Annals of Statistics 20: 1803–1827.
  • Dutta et al. (2011) Dutta, S., Ghosh, A. K. and Chaudhuri, P. (2011). Some intriguing properties of Tukey’s half-space depth, Bernoulli 17(4): 1420–1434.
  • Dyckerhoff and Mozharovskyi (2016) Dyckerhoff, R. and Mozharovskyi, P. (2016). Exact computation of the halfspace depth, Computational Statistics & Data Analysis 98: 19–30.
  • Ferraty and Nagy (2022) Ferraty, F. and Nagy, S. (2022). Scalar-on-function local linear regression and beyond, Biometrika 109(2): 439–455.
  • Gabrys et al. (2010) Gabrys, R., Horváth, L. and Kokoszka, P. (2010). Tests for error correlation in the functional linear model, Journal of the American Statistical Association 105(491): 1113–1125.
  • Gabrys and Kokoszka (2007) Gabrys, R. and Kokoszka, P. (2007). Portmanteau test of independence for functional observations, Journal of the American Statistical Association 102(480): 1338–1348.
  • Ghosh and Chaudhuri (2005) Ghosh, A. K. and Chaudhuri, P. (2005). On data depth and distribution-free discriminant analysis using separating surfaces, Bernoulli 11(1): 1–27.
  • Gijbels and Nagy (2017) Gijbels, I. and Nagy, S. (2017). On a general definition of depth for functional data, Statistical Science 32(4): 630–639.
  • Hall et al. (2007) Hall, P., Horowitz, J. L. et al. (2007). Methodology and convergence rates for functional linear regression, Annals of Statistics 35(1): 70–91.
  • Hsing and Eubank (2015) Hsing, T. and Eubank, R. (2015). Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators, Vol. 997, John Wiley & Sons.
  • Hubert et al. (2015) Hubert, M., Rousseeuw, P. J. and Segaert, P. (2015). Multivariate functional outlier detection, Statistical Methods & Applications 24(2): 177–202.
  • Hubert et al. (2017) Hubert, M., Rousseeuw, P. and Segaert, P. (2017). Multivariate and functional classification using depth and distance, Advances in Data Analysis and Classification 11(3): 445–466.
  • Hyndman and Shang (2010) Hyndman, R. J. and Shang, H. L. (2010). Rainbow plots, bagplots, and boxplots for functional data, Journal of Computational and Graphical Statistics 19(1): 29–45.
  • Kokoszka and Reimherr (2017) Kokoszka, P. and Reimherr, M. (2017). Introduction to functional data analysis, Chapman and Hall/CRC.
  • Kuelbs and Zinn (2013) Kuelbs, J. and Zinn, J. (2013). Concerns with functional depth, Latin American Journal of Probability and Mathematical Statistics.
  • Li et al. (2012) Li, J., Cuesta-Albertos, J. A. and Liu, R. Y. (2012). DD-classifier: nonparametric classification procedure based on DD-plot, Journal of the American Statistical Association 107(498): 737–753.
  • Liebl and Reimherr (2023) Liebl, D. and Reimherr, M. (2023). Fast and fair simultaneous confidence bands for functional parameters, Journal of the Royal Statistical Society Series B: Statistical Methodology 85(3): 842–868.
  • Lin and Lin (2021) Lin, Y. and Lin, Z. (2021). A unified approach to hypothesis testing for functional linear models, arXiv preprint arXiv:2109.02309.
  • Liu (1990) Liu, R. Y. (1990). On a notion of data depth based on random simplices, The Annals of Statistics pp. 405–414.
  • Liu (1992) Liu, R. Y. (1992). Data depth and multivariate rank tests, L1-statistical analysis and related methods pp. 279–294.
  • Liu and Singh (1993) Liu, R. Y. and Singh, K. (1993). A quality index based on data depth and multivariate rank tests, Journal of the American Statistical Association 88(421): 252–260.
  • López-Pintado and Romo (2006) López-Pintado, S. and Romo, J. (2006). Depth-based classification for functional data, DIMACS Series in Discrete Mathematics and Theoretical Computer Science 72: 103.
  • López-Pintado and Romo (2009) López-Pintado, S. and Romo, J. (2009). On the concept of depth for functional data, Journal of the American Statistical Association 104(486): 718–734.
  • López-Pintado et al. (2014) López-Pintado, S., Sun, Y., Lin, J. K. and Genton, M. G. (2014). Simplicial band depth for multivariate functional data, Advances in Data Analysis and Classification 8(3): 321–338.
  • MacCluer (2009) MacCluer, B. (2009). Elementary Functional Analysis, Vol. 253, Springer Science & Business Media.
  • Marron et al. (2015) Marron, J. S., Ramsay, J. O., Sangalli, L. M. and Srivastava, A. (2015). Functional data analysis of amplitude and phase variation, Statistical Science pp. 468–484.
  • Nagy et al. (2017) Nagy, S., Gijbels, I. and Hlubinka, D. (2017). Depth-based recognition of shape outlying functions, Journal of Computational and Graphical Statistics 26(4): 883–893.
  • Narisetty and Nair (2016) Narisetty, N. N. and Nair, V. N. (2016). Extremal depth for functional data and applications, Journal of the American Statistical Association 111(516): 1705–1714.
  • Nieto-Reyes and Battey (2016) Nieto-Reyes, A. and Battey, H. (2016). A topologically valid definition of depth for functional data, Statistical Science 31(1): 61–79.
  • Paindaveine and Van Bever (2013) Paindaveine, D. and Van Bever, G. (2013). From depth to local depth: a focus on centrality, Journal of the American Statistical Association 108(503): 1105–1119.
  • Rousseeuw et al. (1999) Rousseeuw, P. J., Ruts, I. and Tukey, J. W. (1999). The bagplot: a bivariate boxplot, The American Statistician 53(4): 382–387.
  • Sguera and López-Pintado (2021) Sguera, C. and López-Pintado, S. (2021). A notion of depth for sparse functional data, Test 30(3): 630–649.
  • Sun and Genton (2011) Sun, Y. and Genton, M. G. (2011). Functional boxplots, Journal of Computational and Graphical Statistics 20(2): 316–334.
  • Sun and Genton (2012) Sun, Y. and Genton, M. G. (2012). Adjusted functional boxplots for spatio-temporal data visualization and outlier detection, Environmetrics 23(1): 54–64.
  • Tukey (1975) Tukey, J. W. (1975). Mathematics and the picturing of data, Proceedings of the International Congress of Mathematicians, Vancouver, 1975, Vol. 2, pp. 523–531.
  • Tukey (1977) Tukey, J. W. (1977). Exploratory Data Analysis, Addison-Wesley: Reading, MA.
  • Väisälä (2003) Väisälä, J. (2003). A proof of the Mazur–Ulam theorem, The American Mathematical Monthly 110(7): 633–635.
  • van der Vaart and Wellner (1996) van der Vaart, A. and Wellner, J. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics, Springer Series in Statistics, Springer.
  • Wahba (1973) Wahba, G. (1973). Convergence rates of certain approximate solutions to Fredholm integral equations of the first kind, Journal of Approximation Theory 7(2): 167–185.
  • Xie et al. (2017) Xie, W., Kurtek, S., Bharath, K. and Sun, Y. (2017). A geometric approach to visualization of variability in functional data, Journal of the American Statistical Association 112(519): 979–993.
  • Yao and Müller (2010) Yao, F. and Müller, H.-G. (2010). Functional quadratic regression, Biometrika 97(1): 49–64.
  • Yeh and Singh (1997) Yeh, A. B. and Singh, K. (1997). Balanced confidence regions based on Tukey’s depth and the bootstrap, Journal of the Royal Statistical Society: Series B 59(3): 639–652.
  • Zuo and Serfling (2000) Zuo, Y. and Serfling, R. (2000). General notions of statistical depth function, The Annals of Statistics 28: 461–482.