
Identifiability and inference for copula-based semiparametric models for random vectors with arbitrary marginal distributions

Bouchra R. Nasri Département de médecine sociale et préventive, École de santé publique, Université de Montréal, C.P. 6128, succursale Centre-ville Montréal (Québec) H3C 3J7 [email protected]  and  Bruno N. Rémillard GERAD and Department of Decision Sciences, HEC Montréal
3000, chemin de la Côte-Sainte-Catherine, Montréal (Québec), Canada H3T 2A7
[email protected]
Abstract.

In this paper, we study the identifiability and the estimation of the parameters of a copula-based multivariate model when the margins are unknown and arbitrary, meaning that they can be continuous, discrete, or mixtures of continuous and discrete. When at least one margin is not continuous, the range of values determining the copula is not the entire unit square, and this situation can lead to identifiability issues that are discussed here. Next, we propose estimation methods when the margins are unknown and arbitrary, using a pseudo log-likelihood adapted to the case of discontinuities. In view of applications to large data sets, we also propose a pairwise composite pseudo log-likelihood. These methodologies can also be easily modified to cover the case of parametric margins. One of the main theoretical results is an extension to arbitrary distributions of known convergence results for rank-based statistics when the margins are continuous. As a by-product, under smoothness assumptions, we obtain that the asymptotic distributions of the estimation errors of our estimators are Gaussian. Finally, numerical experiments are presented to assess the finite sample performance of the estimators, and the usefulness of the proposed methodologies is illustrated with a copula-based regression model for hydrological data. The proposed estimation methods are implemented in the R package CopulaInference (Nasri and Rémillard, 2023), together with a function for checking identifiability.

Key words and phrases:
Copula; Pseudo-observations; Estimation; Identifiability; Arbitrary distributions
Funding in partial support of this work was provided by the Fonds québécois de la recherche en santé and the Natural Sciences and Engineering Research Council of Canada.

1. Introduction

Copula-based models have been widely used in many applied fields such as finance (Embrechts et al., 2002; Nasri et al., 2020), hydrology (Genest and Favre, 2007; Zhang and Singh, 2019) and medicine (Clayton, 1978; de Leon and Wu, 2011), to cite a few. According to Sklar's representation (Sklar, 1959), for a random vector $\mathbf{X}=(X_1,\ldots,X_d)$ with joint distribution function $H$ and margins $F_1,\ldots,F_d$, there exists a non-empty set $\mathcal{S}_H$ of copula functions $C$ so that for any $x_1,\ldots,x_d$,

\[
H(x_1,\ldots,x_d)=C\{F_1(x_1),\ldots,F_d(x_d)\},\quad\text{for any }C\in\mathcal{S}_H.
\]

If all margins are continuous, then $\mathcal{S}_H$ contains a unique copula, which is the distribution function of the random vector $\mathbf{U}=(U_1,\ldots,U_d)$, with $U_j=F_j(X_j)$, $j\in\{1,\ldots,d\}$. However, in many applications, discontinuities are often present in one or several margins. Whenever at least one margin is discontinuous, $\mathcal{S}_H$ is infinite, and in this case, any $C\in\mathcal{S}_H$ is only uniquely defined on the closure of the range $\mathcal{R}_{\mathbf{F}}=\mathcal{R}_{F_1}\times\cdots\times\mathcal{R}_{F_d}$ of $\mathbf{F}=(F_1,\ldots,F_d)$, where $\mathcal{R}_{F_j}=\{F_j(y):\ y\in\mathbb{R}\}$ is the range of $F_j$, $j\in\{1,\ldots,d\}$. This can lead to the identifiability issues raised in Faugeras (2017) and Geenens (2020, 2021), creating also estimation problems that need to be addressed. However, even if the copula is not unique, it still makes sense to use a copula family $\{C_{\boldsymbol{\theta}}:{\boldsymbol{\theta}}\in\mathcal{P}\}$ to define multivariate models, provided one is aware of the possible limitations. Indeed, the copula-based model $\mathcal{K}_{\boldsymbol{\theta}}(\mathbf{x})=C_{\boldsymbol{\theta}}\{F_1(x_1),\ldots,F_d(x_d)\}$, ${\boldsymbol{\theta}}\in\mathcal{P}$, is a well-defined family of distributions for which estimating ${\boldsymbol{\theta}}$ is a challenge.

In the literature, the case of discontinuous margins is not always treated properly. It is either ignored or, in some cases, continuous margins are fitted to the data (Chen et al., 2013). This procedure does not solve the problem, since there will still be ties. An explicit example that underlines the problem with ignoring ties is given in Section 3 and Remark 2. A solution proposed in the literature is to use jittering, where the data are perturbed by independent random variables, introducing extra variability. Our first aim is to address the identifiability issues for the model $\{\mathcal{K}_{\boldsymbol{\theta}}:{\boldsymbol{\theta}}\in\mathcal{P}\}$ when the margins are unknown and arbitrary. Our second aim is to present formal inference methods for the estimation of ${\boldsymbol{\theta}}$. More precisely, we consider a semiparametric approach for the estimation of ${\boldsymbol{\theta}}$ when the margins are arbitrary, i.e., each margin can be continuous, discrete, or even a mixture of a discrete and a continuous distribution. The estimation approach is based on a pseudo log-likelihood taking the discontinuities into account. We also propose a pairwise composite log-likelihood. In the literature, few articles have focused on the estimation of copula parameters in the case of noncontinuous margins (Song et al., 2009; de Leon and Wu, 2011; Zilko and Kurowicka, 2016; Ery, 2016; Li et al., 2020). Most of them considered only the case where the components are either all discrete or all continuous, together with a fully parametric model, the exceptions being Ery (2016) and Li et al. (2020). In Ery (2016), in the bivariate discrete case, the author considered a semiparametric approach and studied its asymptotic properties. In Li et al. (2020), the authors proposed a semiparametric approach for the estimation of the copula parameters for bivariate data having arbitrary distributions, without presenting any asymptotic results or discussing identifiability issues.

The article is organized as follows. In Section 2, we discuss the important topic of identifiability, as well as the limitations of the copula-based approach for multivariate data with arbitrary margins. Conditions are stated in order to have identifiability, so that the estimation problem is well-posed. Next, in Section 3, we describe the estimation methodology, using the pseudo log-likelihood approach as well as the composite pairwise approach, while the estimation error is studied in Section 3.3. By using an extension of the asymptotic behavior of rank-based statistics (Ruymgaart et al., 1972) to data with arbitrary distributions (Theorem 2), we show that the limiting distribution of the pseudo log-likelihood estimator is Gaussian. Similar results are obtained for the pairwise composite pseudo log-likelihood. In addition, we can obtain similar asymptotic results if the margins are estimated from parametric families instead of using the empirical margins, i.e., we consider the multivariate model $\mathcal{K}_{{\boldsymbol{\theta}},{\boldsymbol{\gamma}}_1,\ldots,{\boldsymbol{\gamma}}_d}(x_1,\ldots,x_d)=C_{\boldsymbol{\theta}}\{F_{1,{\boldsymbol{\gamma}}_1}(x_1),\ldots,F_{d,{\boldsymbol{\gamma}}_d}(x_d)\}$, where the margins are estimated first, and then the copula parameter ${\boldsymbol{\theta}}$. This is discussed in Remark 5. Finally, in Section 4, numerical experiments are performed to assess the convergence and precision of the proposed estimators in bivariate and trivariate settings, while in Section 5, as an example of application, a copula-based regression approach is proposed to investigate the relationship between drought duration and severity, using hydrological data (Shiau, 2006).

2. Identifiability and limitations

As exemplified in Faugeras (2017) and Geenens (2020), there are several problems and limitations when applying copula-based models to data with arbitrary distributions, the most important being identifiability. This is discussed first. Then, we will examine some limitations of these models.

2.1. Identifiability

For the discussion of identifiability, we consider two cases: copula family identifiability and copula parameter identifiability. For copula family identifiability, it may happen that two copulas from different families, say $C_{\boldsymbol{\theta}}$ and $D_{\boldsymbol{\psi}}$, belong to $\mathcal{S}_H$. For example, in Faugeras (2017), the author considered the bivariate Bernoulli case, whose distribution is completely determined by $h(0,0)=P(X_1=0,X_2=0)$, $p_1=P(X_1=0)$, and $p_2=P(X_2=0)$. Provided that the copula families $C_\theta$ and $D_\psi$ are rich enough, there exist unique parameters $\theta_0$ and $\psi_0$ so that $h(0,0)=C_{\theta_0}(p_1,p_2)=D_{\psi_0}(p_1,p_2)$. For calculation purposes, the choice of the copula in this case is immaterial, and there is no possible interpretation of the copula, its parameter, or the type of dependence induced by each copula. All that matters is $h(0,0)$, or the associated odds ratio (Geenens, 2020), or Kendall's tau of $C^{\maltese}$, given in this case by $\tau^{\maltese}=2\{h(0,0)-p_1p_2\}$. Here, $C^{\maltese}$ is the so-called multilinear copula, depending on the margins and belonging to $\mathcal{S}_H$; see, e.g., Genest et al. (2017). To illustrate the fact that the computations are the same for any $C\in\mathcal{S}_H$, consider the conditional distribution $P(X_2\le x_2\,|\,X_1=x_1)$ of $X_2$ given $X_1=x_1$, given by $\left.\partial_u C(u,v)\right|_{u=F_1(x_1),\,v=F_2(x_2)}$ if $F_1$ is continuous at $x_1$, and by $\dfrac{C\{F_1(x_1),F_2(x_2)\}-C\{F_1(x_1-),F_2(x_2)\}}{F_1(x_1)-F_1(x_1-)}$ if $F_1$ is not continuous at $x_1$, where $F_1(x_1-)=P(X_1<x_1)$. The value of $P(X_2\le x_2\,|\,X_1=x_1)$ is independent of the choice of $C\in\mathcal{S}_H$, since all copulas in $\mathcal{S}_H$ have the same value on the closure of the range $\mathcal{R}_{\mathbf{F}}$. So, apart from the lack of interpretation of the type of dependence, there is no problem for calculations, as long as, for the chosen copula family, the parameter is identifiable, as defined next.
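To make this concrete, here is a minimal R sketch of the computation of $P(X_2\le x_2\,|\,X_1=x_1)$ when $F_1$ has an atom at $x_1=0$. The Gaussian copula with $\rho=0.5$ and the values $F_1(0-)=0.3$, $F_1(0)=0.5$ are illustrative assumptions, not taken from the paper; any $C\in\mathcal{S}_H$ would give the same answer.

```r
## Minimal sketch: P(X2 <= x2 | X1 = x1) at an atom x1 = 0 of F1.
## Assumed setup: Gaussian copula, rho = 0.5; F1(0-) = 0.3, F1(0) = 0.5;
## F2 standard Gaussian. Any copula C in S_H yields the same value.
library(copula)

C  <- normalCopula(0.5)
x2 <- 1
a  <- 0.3  # F1(0-)
b  <- 0.5  # F1(0); the atom at 0 has mass b - a = 0.2

## Difference quotient of C over the jump of F1 (discontinuous case)
p <- (pCopula(c(b, pnorm(x2)), C) - pCopula(c(a, pnorm(x2)), C)) / (b - a)
p  # P(X2 <= 1 | X1 = 0)
```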

We now consider the identifiability of the copula parameter for a given copula family $\{C_{\boldsymbol{\theta}}:{\boldsymbol{\theta}}\in\mathcal{P}\}$. Since one of the aims is to estimate the parameter of the family $\{\mathcal{K}_{\boldsymbol{\theta}}=C_{\boldsymbol{\theta}}\circ\mathbf{F}:{\boldsymbol{\theta}}\in\mathcal{P}\}$, the following definition of identifiability is needed.

Definition 1.

For a copula family $\{C_{\boldsymbol{\theta}}:{\boldsymbol{\theta}}\in\mathcal{P}\}$ and a vector of margins $\mathbf{F}=(F_1,\ldots,F_d)$, the parameter ${\boldsymbol{\theta}}$ is identifiable with respect to $\mathbf{F}$ if the mapping ${\boldsymbol{\theta}}\mapsto\mathcal{K}_{\boldsymbol{\theta}}=C_{\boldsymbol{\theta}}\circ\mathbf{F}$ is injective, i.e., if for ${\boldsymbol{\theta}}_1,{\boldsymbol{\theta}}_2\in\mathcal{P}$, ${\boldsymbol{\theta}}_1\neq{\boldsymbol{\theta}}_2$ implies $\mathcal{K}_{{\boldsymbol{\theta}}_1}\neq\mathcal{K}_{{\boldsymbol{\theta}}_2}$. This is equivalent to the existence of $\mathbf{u}\in\mathcal{R}_{\mathbf{F}}^{*}$ such that $C_{{\boldsymbol{\theta}}_1}(\mathbf{u})\neq C_{{\boldsymbol{\theta}}_2}(\mathbf{u})$, where for a vector of margins $\mathbf{G}$,

\[
\mathcal{R}_{\mathbf{G}}^{*}=\left\{\mathbf{u}\in\mathcal{R}_{\mathbf{G}}\cap(0,1]^d:\ u_j<1\text{ for at least two indices }j\right\}.
\]

The following result, proven in the Supplementary Material, is essential for checking that a given mapping is injective whenever $\mathcal{R}_{\mathbf{F}}^{*}$ is finite.

Proposition 1.

Suppose $\mathbf{T}$ is a continuously differentiable mapping from a convex open set $O\subset\mathbb{R}^p$ to $\mathbb{R}^q$. If $p>q$, then $\mathbf{T}$ is not injective. Also, $\mathbf{T}$ is injective if the rank of the derivative $\mathbf{T}'$ is $p\le q$. Furthermore, if the rank of $\mathbf{T}'({\boldsymbol{\theta}}_0)$ is $p$ for some ${\boldsymbol{\theta}}_0\in O$, then the rank of $\mathbf{T}'({\boldsymbol{\theta}})$ is $p$ for any ${\boldsymbol{\theta}}$ in some neighborhood of ${\boldsymbol{\theta}}_0$. Finally, if the maximal rank is $r<p$, attained at some ${\boldsymbol{\theta}}_0\in O$, then the rank of $\mathbf{T}'$ is $r$ in some neighborhood of ${\boldsymbol{\theta}}_0$, and $\mathbf{T}$ is not injective.

Example 1.

A 1-dimensional parameter is identifiable for any $\mathbf{F}$ for the following copula families and their rotations: the bivariate Gaussian copula, the Plackett copula, as well as the one-parameter multivariate Archimedean copulas such as the Clayton, Frank, Gumbel, and Joe families. This is because, for any fixed $\mathbf{u}\in(0,1)^d$, ${\boldsymbol{\theta}}\mapsto C_{\boldsymbol{\theta}}(\mathbf{u})$ is strictly monotonic. Note also that by (Joe, 1990, Theorem A2), every meta-elliptic copula $C_{\boldsymbol{\rho}}$, with ${\boldsymbol{\rho}}$ the correlation matrix parameter, is ordered with respect to $\rho_{ij}$.

In practice, for copula models with bivariate or multivariate parameters, it might be very difficult to verify injectivity, since the margins are never known. In this case, since $\mathbf{F}_n=(F_{n1},\ldots,F_{nd})$ converges to $\mathbf{F}$, where $F_{nj}(y)=\dfrac{1}{n+1}\sum_{i=1}^n\mathbb{I}(X_{ij}\le y)$, $y\in\mathbb{R}$, $j\in\{1,\ldots,d\}$, one could verify that the parameter is identifiable with respect to $\mathbf{F}_n$, i.e., that the mapping ${\boldsymbol{\theta}}\mapsto C_{\boldsymbol{\theta}}\circ\mathbf{F}_n$ is injective. The latter does not necessarily imply identifiability with respect to $\mathbf{F}$, but if the parameter is not identifiable with respect to $\mathbf{F}_n$, one should choose another parametric family $C_{\boldsymbol{\theta}}$. Next, the cardinality of $\mathcal{R}_{\mathbf{F}_n}^{*}$ is $q_n=m_{n1}\times\cdots\times m_{nd}-\sum_{j=1}^d m_{nj}+d-1$, where $m_{nj}$ is the size of the support of $F_{nj}$, $j\in\{1,\ldots,d\}$. Let $\{\mathbf{u}_i:1\le i\le q_n\}$ be an enumeration of $\mathcal{R}_{\mathbf{F}_n}^{*}$. As a result, ${\boldsymbol{\theta}}\mapsto C_{\boldsymbol{\theta}}\circ\mathbf{F}_n$ is injective iff the mapping ${\boldsymbol{\theta}}\mapsto\mathbf{T}_n({\boldsymbol{\theta}})=\{C_{\boldsymbol{\theta}}(\mathbf{u}_1),\ldots,C_{\boldsymbol{\theta}}(\mathbf{u}_{q_n})\}^\top$ is injective. There are two cases to consider: $p>q_n$ or $p\le q_n$.

  • First, one should not choose a copula family with $p>q_n$, since in this case, according to Proposition 1, the mapping $\mathbf{T}_n$ cannot be injective, so the parameter is not identifiable.

  • Next, if $p\le q_n$, it follows from Proposition 1 that $\mathbf{T}_n$ is injective if $\mathbf{T}_n'$ has rank $p$. Also, it follows from Proposition 1 that if the rank of $\mathbf{T}_n'({\boldsymbol{\theta}})$ is $p$, then there is a neighborhood $\mathcal{N}\subset\mathcal{P}$ of ${\boldsymbol{\theta}}$ for which the rank of $\mathbf{T}_n'(\tilde{\boldsymbol{\theta}})$ is $p$, for any $\tilde{\boldsymbol{\theta}}\in\mathcal{N}$. One could then restrict the estimation to this neighborhood if necessary. Note that the matrix $\mathbf{T}_n'({\boldsymbol{\theta}})$ is composed of the (column) gradient vectors $\dot{C}_{\boldsymbol{\theta}}(\mathbf{u}_j)$, $j\in\{1,\ldots,q_n\}$. The latter can be calculated explicitly or approximated numerically. To check that the rank of $\mathbf{T}_n'$ is $p$, one needs to choose a bounded neighborhood $\mathcal{N}$ for ${\boldsymbol{\theta}}$, choose an appropriate covering of $\mathcal{N}$ by balls of radius $\delta$, with respect to an appropriate metric, corresponding to a given accuracy $\delta$ needed for the estimation, and then compute the rank of $\mathbf{T}_n'({\boldsymbol{\theta}})$ at all the centers ${\boldsymbol{\theta}}$ of the covering balls; see the sketch below. This procedure is implemented in the R package CopulaInference (Nasri and Rémillard, 2023).
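The following R sketch illustrates this rank check under simplifying assumptions: a one-parameter Clayton family (so $p=1$), a toy set of $q_n=3$ points standing in for $\mathcal{R}_{\mathbf{F}_n}^{*}$, and numerical differentiation via the numDeriv package. It mimics the covering procedure but is not the CopulaInference implementation.

```r
## Sketch: numerical check that theta -> T_n(theta) has a full-rank derivative.
## Assumptions: Clayton family (p = 1); toy points u_1, u_2, u_3 in R*_{F_n}.
library(copula)
library(numDeriv)

U  <- rbind(c(0.3, 0.4), c(0.5, 0.2), c(0.7, 0.6))  # q_n = 3 points
Tn <- function(theta) apply(U, 1, function(u) pCopula(u, claytonCopula(theta)))

## Rank of T_n'(theta) at the centers of a delta-covering of N = [0.25, 5.25]
centers <- seq(0.5, 5, by = 0.5)
ranks   <- sapply(centers, function(th) qr(jacobian(Tn, th))$rank)
all(ranks == 1)  # TRUE: rank p = 1 everywhere, so theta is identifiable w.r.t. F_n
```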

Example 2.

In the bivariate Bernoulli case, $q=q_n=1$, so a copula parameter is identifiable if, for any fixed $\mathbf{u}\in(0,1)^2$, the mapping ${\boldsymbol{\theta}}\mapsto C_{\boldsymbol{\theta}}(\mathbf{u})$ is injective. As a result of the previous discussion, $p=1$, i.e., the parameter should be 1-dimensional, and for any fixed $\mathbf{u}\in(0,1)^2$, the mapping $\theta\mapsto C_\theta(\mathbf{u})$ must be strictly monotonic, since $\partial_\theta C_\theta(\mathbf{u})$ must be always positive or always negative.

Remark 1 (Student copula).

There are cases where $p<q_n$ is needed. For example, for the bivariate Student copula, $q_n\ge 3>p=2$ is sometimes necessary. To see why, note that the point $\mathbf{u}=(1/2,1/2)$ determines $\rho$ uniquely, since for any $\nu>0$, $C_{\rho,\nu}\left(\frac12,\frac12\right)=\frac14+\frac{1}{2\pi}\arcsin\rho$. Take for example $\rho=0.3$. Then the equation $C_{0.3,\nu}(0.75,0.55)=0.452$ has two solutions, namely $\nu=0.224965$ and $\nu=0.79944$, so the rank of $T'(\rho,\nu)$ is $1<p$ based on the points $\mathbf{u}_1=(0.5,0.5)$ and $\mathbf{u}_2=(0.75,0.55)$. However, if $\nu$ is restricted to a finite set like $\{k:\ 1\le k\le 50\}$, then the mapping is injective. This restriction of the values of $\nu$ makes sense if one wants to use the Student copula. In fact, exact values of $C_{\rho,\nu}$ can be computed explicitly only when $\nu$ is an integer (Dunnett and Sobel, 1954); this is the case, for example, in the R package copula. Otherwise, one needs to use numerical integration (Genz, 2004), which might lead to numerical inconsistencies when differentiating the copula with respect to $\nu$ in a given open interval.

2.2. Limitations

Having discussed identifiability issues, we are now in a position to discuss the limitations of the copula-based approach for modeling multivariate data. Two main issues arise when one or several margins have discontinuities: interpretation and dependence on margins.

2.2.1. Interpretation

As exemplified by the extreme case of the bivariate Bernoulli distribution discussed in Example 2, interpretation of the copula or its parameter can be hopeless. Recall that the object of interest is the family $\{\mathcal{K}_{\boldsymbol{\theta}}:{\boldsymbol{\theta}}\in\mathcal{P}\}$, not the copula family. Even if $X_1$ has only one atom at $0$ and $X_2$ is continuous, then one only “knows” $C_{\boldsymbol{\theta}}$ on $\bar{\mathcal{R}}=\{[0,F_1(0-)]\cup[F_1(0),1]\}\times[0,1]$. For example, how can we interpret the form of dependence induced by a Gaussian copula or a Student copula restricted to such an $\bar{\mathcal{R}}$? Apart from tail dependence, there is not much one can say. As is done in the case of bivariate copulas, one could plot the graph of a large sample, say $n=1000$. The scatter plot is not very informative. This is even worse if the support of $H$ is finite. As an illustration, we generated 1000 pairs of observations from the Gaussian, Student(2.5), Clayton, and Gumbel copulas with $\tau=0.7$, where the margin $F_1$ is a mixture of a Gaussian and an atom at $0$ (with probability $0.1$), and $F_2$ is Gaussian. The scatter plots are given in Figure 1, together with the scatter plots of the associated pseudo-observations $\left(F_{n1}(X_{i1}),F_{n2}(X_{i2})\right)$. The raw datasets in panel a) of Figure 1 do not say much, apart from the fact that $X_1$ has an atom at $0$, while panel b) of Figure 1 illustrates the fact that the copula is only known on $\bar{\mathcal{R}}$. All the graphs are similar, except for the Clayton copula. So one can see that even if there is only one atom, one cannot really interpret the type of dependence.


Figure 1. Panel a): scatter plots of 1000 pairs from the Gaussian, Student(2.5), Clayton, and Gumbel copulas with $\tau=0.7$, where $F_1$ is a mixture of a Gaussian (with probability $0.9$) and a Dirac at $0$, and $F_2$ is Gaussian. Panel b): scatter plots of the associated pseudo-observations.

2.2.2. Dependence on margins

In the bivariate Bernoulli case, Geenens (2020) proposed the odds ratio $\omega$ as a “margin-free” measure of dependence. In fact, Geenens (2020) quotes Item 3 of (Rudas, 2018, Theorem 6.3, p. 116), which says that if ${\boldsymbol{\theta}}_1=(p_1,p_2)$ are the margins' parameters and $\theta$ is a parameter whose range does not depend on ${\boldsymbol{\theta}}_1$ (called a variation independent parametrization in Rudas (2018)), and if $({\boldsymbol{\theta}}_1,\theta)$ determines the full distribution $H$, then $\theta$ is a one-to-one function of the odds ratio $\omega$. However, as a proof of this statement, Rudas (2018) says that it is because $({\boldsymbol{\theta}}_1,\omega)$ determines $H$. This only proves that $\theta$ is a one-to-one function of $\omega$ for a fixed ${\boldsymbol{\theta}}_1$, i.e., for fixed margins, not that it is a one-to-one function of $\omega$ alone. In fact, according to Rudas' definition, for the Gaussian copula $C_\rho$, $\rho\in[-1,1]$, it follows that $({\boldsymbol{\theta}}_1,\rho)$ is also a variation independent parametrization of $H$, by simply setting $H(p_1,p_2)=C_\rho(p_1,p_2)$. Therefore, $\rho$ qualifies as a “margin-free” association measure as well, if one agrees that margin-free means “variation independent in the sense of Rudas (2018)”. Using the same argument as in Rudas (2018), it follows that any variation independent parameter is a function of $\rho$. This construction works for any rich enough copula family such as the Clayton, Frank, or Plackett (the latter yielding the odds ratio), and it also applies to all margins, not only Bernoulli margins. In fact, the property of being one-to-one is the same as what we defined as “parameter identifiability” in Section 2.1.

3. Estimation of parameters

In this section, we will show how to estimate the parameter ${\boldsymbol{\theta}}$ of the semiparametric model $\{\mathcal{K}_{\boldsymbol{\theta}}=C_{\boldsymbol{\theta}}\circ\mathbf{F}:{\boldsymbol{\theta}}\in\mathcal{P}\}$, where $\mathcal{P}\subset\mathbb{R}^p$ is convex. It is assumed that the given copula family $\{C_{\boldsymbol{\theta}}:{\boldsymbol{\theta}}\in\mathcal{P}\}$ satisfies the following assumption.

Assumption 1.

The mapping ${\boldsymbol{\theta}}\mapsto C_{\boldsymbol{\theta}}$ is thrice continuously differentiable with respect to ${\boldsymbol{\theta}}$ on $\mathcal{P}$; for any ${\boldsymbol{\theta}}\in\mathcal{P}$, the density $c_{\boldsymbol{\theta}}$ exists, is thrice continuously differentiable, and is strictly positive on $(0,1)^d$. Furthermore, for a given vector of margins $\mathbf{F}$, ${\boldsymbol{\theta}}\mapsto C_{\boldsymbol{\theta}}\circ\mathbf{F}$ is injective on $\mathcal{P}$.

Suppose that $\mathbf{X}_1,\ldots,\mathbf{X}_n$ are iid with $\mathbf{X}_i\sim\mathcal{K}_{{\boldsymbol{\theta}}_0}$, for some ${\boldsymbol{\theta}}_0\in\mathcal{P}$. Without loss of generality, we may suppose that $\mathbf{X}_i=\mathbf{F}^{-1}(\mathbf{U}_i)$, where $\mathbf{U}_1,\ldots,\mathbf{U}_n$ are iid with $\mathbf{U}_i\sim C_{{\boldsymbol{\theta}}_0}$. Estimating parameters for arbitrary distributions is more challenging than in the continuous case. For continuous (unknown) margins, copula parameters can be estimated using different approaches, the most popular ones based on pseudo log-likelihood being the normalized ranks method (Genest et al., 1995; Shih and Louis, 1995) and the IFM method (Inference Functions for Margins) (Joe and Xu, 1996; Joe, 2005). Since the margins are unknown, it is tempting to ignore that some margins are discontinuous and to maximize the (averaged) pseudo log-likelihood $L_n({\boldsymbol{\theta}})=\dfrac1n\sum_{i=1}^n\log c_{\boldsymbol{\theta}}(\mathbf{U}_{n,i})$, $\mathbf{U}_{n,i}=\mathbf{F}_n(\mathbf{X}_i)$, $i\in\{1,\ldots,n\}$. Note that one could also replace the nonparametric margins with parametric ones, as in the IFM approach. However, in either case, there is a problem with using $L_n$ when there are atoms. To see this, consider a simple bivariate model with Bernoulli margins, i.e., $P(X_{ij}=0)=p_j$, $P(X_{ij}=1)=1-p_j$, $j\in\{1,2\}$. Then, the full (averaged) log-likelihood is

\[
\begin{aligned}
\ell_n^{\star}({\boldsymbol{\theta}},p_1,p_2)&=h_n(0,0)\log\{C_{\boldsymbol{\theta}}(p_1,p_2)\}+h_n(0,1)\log\{p_1-C_{\boldsymbol{\theta}}(p_1,p_2)\}\\
&\quad+h_n(1,0)\log\{p_2-C_{\boldsymbol{\theta}}(p_1,p_2)\}+h_n(1,1)\log\{1-p_1-p_2+C_{\boldsymbol{\theta}}(p_1,p_2)\},
\end{aligned}
\]

where $h_n(x_1,x_2)=\dfrac1n\sum_{i=1}^n\mathbb{I}\{X_{i1}=x_1,X_{i2}=x_2\}$. Hence, the pseudo log-likelihood $\ell_n$ for ${\boldsymbol{\theta}}$, when $p_1$ and $p_2$ are estimated by $p_{n1}=h_n(0,0)+h_n(0,1)$ and $p_{n2}=h_n(0,0)+h_n(1,0)$, is

\[
\begin{aligned}
\ell_n({\boldsymbol{\theta}})&=h_n(0,0)\log\{C_{\boldsymbol{\theta}}(p_{n1},p_{n2})\}+h_n(0,1)\log\{p_{n1}-C_{\boldsymbol{\theta}}(p_{n1},p_{n2})\}\\
&\quad+h_n(1,0)\log\{p_{n2}-C_{\boldsymbol{\theta}}(p_{n1},p_{n2})\}+h_n(1,1)\log\{1-p_{n1}-p_{n2}+C_{\boldsymbol{\theta}}(p_{n1},p_{n2})\}.
\end{aligned}
\]

It is clear that, under general conditions, when the parameter is 1-dimensional and the copula family is rich enough, maximizing $\ell_n$ produces a consistent estimator of ${\boldsymbol{\theta}}$, while maximizing $L_n$ does not. In fact, for $\ell_n$, the estimator $\theta_n$ of $\theta$ is the solution of $C_\theta(p_{n1},p_{n2})=h_n(0,0)$ (Genest and Nešlehová, 2007, Example 13). Note also that $\ell_n$ converges almost surely to

\[
\begin{aligned}
\ell_\infty({\boldsymbol{\theta}})&=h(0,0)\log\{C_{\boldsymbol{\theta}}(p_1,p_2)\}+h(0,1)\log\{p_1-C_{\boldsymbol{\theta}}(p_1,p_2)\}\\
&\quad+h(1,0)\log\{p_2-C_{\boldsymbol{\theta}}(p_1,p_2)\}+h(1,1)\log\{1-p_1-p_2+C_{\boldsymbol{\theta}}(p_1,p_2)\},
\end{aligned}
\]

where $h(i,j)=P(X_1=i,X_2=j)$, $i,j\in\{0,1\}$. However, the estimator obtained by maximizing $L_n$ is not consistent in general. In fact, for any copula $C_\theta$ with a density $c_\theta$ which is continuous and non-vanishing on $(0,1]^2$,

\[
\begin{aligned}
L_n({\boldsymbol{\theta}})&=h_n(0,0)\log c_\theta\!\left(\tfrac{n}{n+1}p_{n1},\tfrac{n}{n+1}p_{n2}\right)+h_n(1,0)\log c_\theta\!\left(\tfrac{n}{n+1},\tfrac{n}{n+1}p_{n2}\right)\\
&\quad+h_n(0,1)\log c_\theta\!\left(\tfrac{n}{n+1}p_{n1},\tfrac{n}{n+1}\right)+h_n(1,1)\log c_\theta\!\left(\tfrac{n}{n+1},\tfrac{n}{n+1}\right)
\end{aligned}
\]

converges almost surely to $L_\infty({\boldsymbol{\theta}})=h(0,0)\log c_\theta(p_1,p_2)+h(1,0)\log c_\theta(1,p_2)+h(0,1)\log c_\theta(p_1,1)+h(1,1)\log c_\theta(1,1)$. In particular, for a Clayton copula with $\theta_0=2$, one gets $L_\infty({\boldsymbol{\theta}})=\log(1+\theta)+\theta(p_1\log p_1+p_2\log p_2)+h(0,0)(1+2\theta)\left\{\log C_\theta(p_1,p_2)-\log(p_1p_2)\right\}$, with $h(0,0)=C_{\theta_0}(1/2,1/2)=1/\sqrt7$. As displayed in Figure 2, for the limit $L_\infty$ of $L_n$, the supremum is attained at $\theta=4.9439$, while for the limit $\ell_\infty$ of $\ell_n$, the supremum is attained at the correct value $\theta=\theta_0=2$.
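This comparison is easy to reproduce numerically. The following R sketch evaluates the two limits above for the Clayton family with $p_1=p_2=1/2$ and locates their maximizers; the bracketing interval $(0.1,20)$ is an arbitrary choice.

```r
## Sketch: maximizers of the limits ell_inf and L_inf (Clayton, theta0 = 2).
Ctheta <- function(th, u, v) (u^(-th) + v^(-th) - 1)^(-1/th)         # Clayton cdf
ctheta <- function(th, u, v)                                         # Clayton density
  (1 + th) * (u * v)^(-th - 1) * (u^(-th) + v^(-th) - 1)^(-1/th - 2)

p1 <- p2 <- 0.5
h00 <- Ctheta(2, p1, p2)  # = 1/sqrt(7)
h01 <- p1 - h00; h10 <- p2 - h00; h11 <- 1 - p1 - p2 + h00

ell_inf <- function(th) h00 * log(Ctheta(th, p1, p2)) + h01 * log(p1 - Ctheta(th, p1, p2)) +
  h10 * log(p2 - Ctheta(th, p1, p2)) + h11 * log(1 - p1 - p2 + Ctheta(th, p1, p2))
L_inf <- function(th) h00 * log(ctheta(th, p1, p2)) + h10 * log(ctheta(th, 1, p2)) +
  h01 * log(ctheta(th, p1, 1)) + h11 * log(ctheta(th, 1, 1))

optimize(ell_inf, c(0.1, 20), maximum = TRUE)$maximum  # ~ 2.00: consistent
optimize(L_inf,  c(0.1, 20), maximum = TRUE)$maximum   # ~ 4.94: biased
```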


Figure 2. Graphs of $L_\infty(\theta)$ (a) and $\ell_\infty(\theta)$ (b) for the bivariate Bernoulli case with the Clayton copula ($\theta_0=2$).
Remark 2.

This simple example shows that one must be very careful when estimating copula parameters if the margins are not continuous. In particular, one should not use the usual pseudo-MLE method based on $L_n$, nor the IFM method with continuous margins fitted to data with ties.

3.1. Pseudo log-likelihoods

For any $j\in\{1,\ldots,d\}$, let $\mathcal{A}_j=\{x\in\mathbb{R}:\Delta F_j(x)>0\}$ be the (countable) set of atoms of $F_j$, where for a general univariate distribution function $\mathcal{G}$, $\Delta\mathcal{G}(x)=\mathcal{G}(x)-\mathcal{G}(x-)$, with $\mathcal{G}(x-)=\lim_{n\to\infty}\mathcal{G}(x-1/n)$. Throughout this paper, we assume that $\mathcal{A}_j$ is closed. For $j\in\{1,\ldots,d\}$, $\mu_{cj}$ denotes the counting measure on $\mathcal{A}_j$, and $\mathcal{L}$ denotes Lebesgue's measure; both measures are defined on $(\mathbb{R},\mathcal{B}_{\mathbb{R}})$, and $\mu_{cj}+\mathcal{L}$, also defined on $(\mathbb{R},\mathcal{B}_{\mathbb{R}})$, is $\sigma$-finite. Further, let $\mu$ be the product measure on $\mathbb{R}^d$ defined by $\mu=(\mu_{c1}+\mathcal{L})\times\cdots\times(\mu_{cd}+\mathcal{L})$, which is also $\sigma$-finite. In what follows, $\mathcal{G}^{-1}(u)=\inf\{x:\mathcal{G}(x)\ge u\}$, $u\in(0,1)$. Since our aim is to estimate the parameter of the copula family without knowing the margins, the following assumption is necessary.

Assumption 2.

The margins $F_1,\ldots,F_d$ do not depend on the parameter ${\boldsymbol{\theta}}\in\mathcal{P}\subset\mathbb{R}^p$. In addition, for any $j\in\{1,\ldots,d\}$, $F_j$ has a density $f_j$ with respect to the measure $\mu_{cj}+\mathcal{L}$.

What is meant in Assumption 2 is that if the margins were parametric (e.g., Poisson), their parameters would not be related to the parameter of the copula. This assumption is needed when one wants to estimate the margins first, and then estimate ${\boldsymbol{\theta}}$. Assumption 2 really means that $\nabla_{\boldsymbol{\theta}}F_j(x_j)\equiv0$, which is a natural assumption; it is also implicit in the continuous case.

To estimate the copula parameters, one could use at least two different pseudo log-likelihoods: an informed one, if $\mathcal{A}_1,\ldots,\mathcal{A}_d$ are known, and a non-informed one, if some atoms are not known. The latter is the approach proposed by Li et al. (2020) in the bivariate case. From a practical point of view, it is easier to implement, since there is no need to specify the atoms. However, it requires more assumptions, and one could argue that in practice, atoms should be known. Before writing these pseudo log-likelihoods, we need to introduce some notation. Suppose that $C$ is a copula having a continuous density $c$ on $(0,1)^d$. For any $B\subset\{1,\ldots,d\}$ and any $\mathbf{u}=(u_1,\ldots,u_d)\in(0,1)^d$, set $\partial_B C(\mathbf{u})=\left\{\prod_{j\in B}\partial_{u_j}\right\}C(\mathbf{u})$, where $\partial_{\{1,\ldots,d\}}C(\mathbf{u})=c(\mathbf{u})$ is the density of $C$ and $\partial_\emptyset C(\mathbf{u})=C(\mathbf{u})$. Finally, for any vector $\mathbf{G}=(\mathcal{G}_1,\ldots,\mathcal{G}_d)$ of margins and any $B\subset\{1,\ldots,d\}$, set

\[
\left(\tilde{\mathbf{G}}^{(B)}(\mathbf{x})\right)_j=\begin{cases}\mathcal{G}_j(x_j-), & \text{if }j\in B;\\ \mathcal{G}_j(x_j), & \text{if }j\in B^\complement=\{1,\ldots,d\}\setminus B.\end{cases}\tag{1}
\]

Under Assumptions 1–2, it follows from Equation (3) in the Supplementary Material that the full (averaged) log-likelihood is given by

\[
\ell_n^{\star}({\boldsymbol{\theta}})=\dfrac1n\sum_{A\subset\{1,\ldots,d\}}\sum_{i=1}^n J_A(\mathbf{X}_i)\log K_{A,{\boldsymbol{\theta}}}(\mathbf{X}_i)+\dfrac1n\sum_{A\subset\{1,\ldots,d\}}\sum_{i=1}^n\sum_{j\in A^\complement}\log f_j(X_{ij}),
\]

where, for any $A\subset\{1,\ldots,d\}$, $J_A(\mathbf{x})=\left[\prod_{j\in A}\mathbb{I}\{x_j\in\mathcal{A}_j\}\right]\left[\prod_{j\in A^\complement}\mathbb{I}\{x_j\notin\mathcal{A}_j\}\right]$ and

\[
K_{A,{\boldsymbol{\theta}}}(\mathbf{x})=\sum_{B\subset A}(-1)^{|B|}\partial_{A^\complement}C_{\boldsymbol{\theta}}\left\{\tilde{\mathbf{F}}^{(B)}(\mathbf{x})\right\},\tag{2}
\]

with the usual convention that a product over the empty set is $1$. If the margins were known or estimated first, then maximizing $\ell_n^{\star}({\boldsymbol{\theta}})$ or maximizing $\sum_{A\subset\{1,\ldots,d\}}\sum_{i=1}^n J_A(\mathbf{X}_i)\log K_{A,{\boldsymbol{\theta}}}(\mathbf{X}_i)$ with respect to ${\boldsymbol{\theta}}$ would be equivalent, since by Assumption 2, the margins do not depend on ${\boldsymbol{\theta}}$. As a result, one gets the “informed” (averaged) pseudo log-likelihood $\ell_n$, defined by

\[
\ell_n({\boldsymbol{\theta}})=\dfrac1n\sum_{A\subset\{1,\ldots,d\}}\sum_{i=1}^n J_A(\mathbf{X}_i)\log K_{n,A,{\boldsymbol{\theta}}}(\mathbf{X}_i),\tag{3}
\]

where for any $A\subset\{1,\ldots,d\}$, $K_{n,A,{\boldsymbol{\theta}}}(\mathbf{x})=\sum_{B\subset A}(-1)^{|B|}\partial_{A^\complement}C_{\boldsymbol{\theta}}\left\{\tilde{\mathbf{F}}_n^{(B)}(\mathbf{x})\right\}$, and $\tilde{\mathbf{F}}_n^{(B)}(\mathbf{x})$ is defined by (1), with $\mathbf{G}=\mathbf{F}_n$. Setting ${\boldsymbol{\theta}}_n=\arg\max_{{\boldsymbol{\theta}}\in\mathcal{P}}\ell_n({\boldsymbol{\theta}})$, we define ${\boldsymbol{\Theta}}_n=n^{1/2}({\boldsymbol{\theta}}_n-{\boldsymbol{\theta}}_0)$. As Li et al. (2020) did in the bivariate case, we can also define the “non-informed” (averaged) pseudo log-likelihood $\tilde\ell_n({\boldsymbol{\theta}})$, where $J_{n,A}(\mathbf{x})=\left[\prod_{j\in A}\mathbb{I}\{\Delta F_{nj}(x_j)>1/(n+1)\}\right]\left[\prod_{j\in A^\complement}\mathbb{I}\{\Delta F_{nj}(x_j)=1/(n+1)\}\right]$ replaces $J_A$ in (3). Of course, for a given $i$, if $X_{ij}$ is an atom, then, when $n$ is large enough, $\Delta F_{nj}(X_{ij})>1/(n+1)$. However, for a given $n$, it is possible that $\Delta F_{nj}(X_{ij})=1/(n+1)$ even when $X_{ij}$ is an atom. If the real value of the parameter is ${\boldsymbol{\theta}}_0$ and $\tilde{\boldsymbol{\theta}}_n=\arg\max_{{\boldsymbol{\theta}}\in\mathcal{P}}\tilde\ell_n({\boldsymbol{\theta}})$, we set $\tilde{\boldsymbol{\Theta}}_n=n^{1/2}\left(\tilde{\boldsymbol{\theta}}_n-{\boldsymbol{\theta}}_0\right)$.

Example 3.

If $d=2$, then $K_{n,\emptyset,{\boldsymbol{\theta}}}(x_1,x_2)=c_{\boldsymbol{\theta}}\{F_{n1}(x_1),F_{n2}(x_2)\}$, $K_{n,\{1\},{\boldsymbol{\theta}}}(x_1,x_2)=\partial_{u_2}C_{\boldsymbol{\theta}}\{F_{n1}(x_1),F_{n2}(x_2)\}-\partial_{u_2}C_{\boldsymbol{\theta}}\{F_{n1}(x_1-),F_{n2}(x_2)\}$, $K_{n,\{2\},{\boldsymbol{\theta}}}(x_1,x_2)=\partial_{u_1}C_{\boldsymbol{\theta}}\{F_{n1}(x_1),F_{n2}(x_2)\}-\partial_{u_1}C_{\boldsymbol{\theta}}\{F_{n1}(x_1),F_{n2}(x_2-)\}$, and $K_{n,\{1,2\},{\boldsymbol{\theta}}}(x_1,x_2)=C_{\boldsymbol{\theta}}\{F_{n1}(x_1),F_{n2}(x_2)\}-C_{\boldsymbol{\theta}}\{F_{n1}(x_1-),F_{n2}(x_2)\}-C_{\boldsymbol{\theta}}\{F_{n1}(x_1),F_{n2}(x_2-)\}+C_{\boldsymbol{\theta}}\{F_{n1}(x_1-),F_{n2}(x_2-)\}$.
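For concreteness, here is a minimal R sketch of the informed pseudo log-likelihood (3) for $d=2$, built from the four terms of Example 3. It assumes an exchangeable family from the copula package, so that $\partial_{u_2}C_{\boldsymbol{\theta}}$ can be obtained from the conditional distributions returned by cCopula, and known atom sets A1 and A2; boundary and numerical-safety issues that a production implementation (such as CopulaInference) must handle are ignored.

```r
## Sketch of the informed pseudo log-likelihood (3) for d = 2.
## Assumptions: exchangeable copula family (Frank by default); atoms A1, A2 known.
library(copula)

pseudo_loglik <- function(theta, X, A1, A2, family = frankCopula) {
  Cth <- family(theta)
  n   <- nrow(X)
  Fn  <- function(t, col) sapply(t, function(s) sum(X[, col] <= s)) / (n + 1)
  Fm  <- function(t, col) sapply(t, function(s) sum(X[, col] <  s)) / (n + 1)
  u  <- Fn(X[, 1], 1); um <- Fm(X[, 1], 1)    # F_{n1}(x1), F_{n1}(x1-)
  v  <- Fn(X[, 2], 2); vm <- Fm(X[, 2], 2)    # F_{n2}(x2), F_{n2}(x2-)
  a1 <- X[, 1] %in% A1; a2 <- X[, 2] %in% A2  # which coordinates sit on atoms
  ## For exchangeable C, d/du2 C(u1,u2) = C(u1 | u2), computable via cCopula
  dC <- function(s, t) cCopula(cbind(t, s), Cth)[, 2]
  i0 <- !a1 & !a2; i1 <- a1 & !a2; i2 <- !a1 & a2; i12 <- a1 & a2
  ll <- numeric(n)
  if (any(i0))  ll[i0]  <- dCopula(cbind(u, v)[i0, , drop = FALSE], Cth, log = TRUE)
  if (any(i1))  ll[i1]  <- log(dC(u[i1], v[i1]) - dC(um[i1], v[i1]))   # K_{n,{1}}
  if (any(i2))  ll[i2]  <- log(dC(v[i2], u[i2]) - dC(vm[i2], u[i2]))   # K_{n,{2}}
  if (any(i12)) ll[i12] <- log(pCopula(cbind(u, v)[i12, , drop = FALSE], Cth) -
                               pCopula(cbind(um, v)[i12, , drop = FALSE], Cth) -
                               pCopula(cbind(u, vm)[i12, , drop = FALSE], Cth) +
                               pCopula(cbind(um, vm)[i12, , drop = FALSE], Cth))
  mean(ll)
}
```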

Remark 3.

In Li et al. (2020), instead of using $F_{nj}(X_{ij}-)$, the authors used $F_{nj}(X_{ij}-)+\dfrac{1}{n+1}$. The difference between our pseudo log-likelihood and theirs is negligible. However, our choice seems more natural and simplifies the notation in the multivariate case, which was not considered in Li et al. (2020).

One can see that computing the pseudo log-likelihood $\ell_n$ might be cumbersome when $d$ is large. To overcome this problem, we propose to use a pairwise composite pseudo log-likelihood. For a review of this approach in other settings, see, e.g., Varin et al. (2011); see also Oh and Patton (2016) for the composite method in a particular copula context. Here, the pairwise composite pseudo log-likelihood is simply defined as $\check\ell_n=\sum_{1\le k<l\le d}\ell_n^{(k,l)}$, where $\ell_n^{(k,l)}$ is the pseudo log-likelihood defined by (3) for the pairs $(X_{ik},X_{il})$, $i\in\{1,\ldots,n\}$. One can also replace $\ell_n$ by $\tilde\ell_n$ in the previous expression. If $\check{\boldsymbol{\theta}}_n=\arg\max_{{\boldsymbol{\theta}}\in\mathcal{P}}\check\ell_n({\boldsymbol{\theta}})$, set $\check{\boldsymbol{\Theta}}_n=n^{1/2}(\check{\boldsymbol{\theta}}_n-{\boldsymbol{\theta}}_0)$. The asymptotic behavior of the estimators ${\boldsymbol{\theta}}_n$, $\tilde{\boldsymbol{\theta}}_n$, and $\check{\boldsymbol{\theta}}_n$ is studied next.
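In code, the composite criterion is just a sum of the bivariate criterion over all pairs; a sketch reusing the pseudo_loglik function above, with atoms a hypothetical list of atom sets, one per margin:

```r
## Sketch: pairwise composite pseudo log-likelihood, summing the bivariate
## criterion over all pairs (k, l); `atoms` is a list of atom sets per margin.
composite_loglik <- function(theta, X, atoms, family = frankCopula) {
  d <- ncol(X); val <- 0
  for (k in 1:(d - 1)) for (l in (k + 1):d)
    val <- val + pseudo_loglik(theta, X[, c(k, l)], atoms[[k]], atoms[[l]], family)
  val
}
## For p = 1, theta_check_n can then be obtained with optimize(..., maximum = TRUE)
```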

3.2. Notations and other assumptions

For the sake of simplicity, the gradient (column) vector of a function $g$ with respect to ${\boldsymbol{\theta}}$ is often denoted $\dot g$ or $\nabla_{\boldsymbol{\theta}}g$, while the associated Hessian matrix is denoted $\ddot g$ or $\nabla^2_{\boldsymbol{\theta}}g$. Before stating the main convergence results, for any $A\subset\{1,\ldots,d\}$, $0\le a_j<b_j\le1$, $j\in A$, $u_j\in(0,1)$, $j\in A^\complement$, define

\[
{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\left(\mathbf{a},\mathbf{b},\mathbf{u}\right)=\frac{\sum_{B\subset A}(-1)^{|B|}\partial_{A^\complement}\dot{C}_{\boldsymbol{\theta}}\left((\mathbf{a},\mathbf{b},\mathbf{u})^{(B,A)}\right)}{\sum_{B\subset A}(-1)^{|B|}\partial_{A^\complement}C_{\boldsymbol{\theta}}\left((\mathbf{a},\mathbf{b},\mathbf{u})^{(B,A)}\right)},\tag{4}
\]

where for $B\subset A$, $\left((\mathbf{a},\mathbf{b},\mathbf{u})^{(B,A)}\right)_j=\begin{cases}a_j,&j\in B,\\ b_j,&j\in A\setminus B,\\ u_j,&j\in A^\complement.\end{cases}$ In particular, ${\boldsymbol{\varphi}}_{\emptyset,{\boldsymbol{\theta}}}(\mathbf{u})=\dfrac{\dot c_{\boldsymbol{\theta}}(\mathbf{u})}{c_{\boldsymbol{\theta}}(\mathbf{u})}$, and if $A=\{1,\ldots,k\}$, $k\in\{1,\ldots,d\}$, one has

\[
{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}(\mathbf{a},\mathbf{b},\mathbf{u})={\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}(a_1,\ldots,a_k,b_1,\ldots,b_k,u_{k+1},\ldots,u_d)=\frac{\int_{a_1}^{b_1}\cdots\int_{a_k}^{b_k}\dot c_{\boldsymbol{\theta}}(s_1,\ldots,s_k,u_{k+1},\ldots,u_d)\,ds_1\cdots ds_k}{\int_{a_1}^{b_1}\cdots\int_{a_k}^{b_k}c_{\boldsymbol{\theta}}(s_1,\ldots,s_k,u_{k+1},\ldots,u_d)\,ds_1\cdots ds_k}.
\]

Next, set $\mathbf{H}_{A,{\boldsymbol{\theta}}}(\mathbf{x})=J_A(\mathbf{x})\,{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\left\{\mathbf{F}_A(\mathbf{x}-),\mathbf{F}_A(\mathbf{x}),\mathbf{F}_{A^\complement}(\mathbf{x})\right\}$. Finally, set $\mathcal{K}=\mathcal{K}_{{\boldsymbol{\theta}}_0}$ and $\mathbf{H}_{\boldsymbol{\theta}}=\sum_{A\subset\{1,\ldots,d\}}\mathbf{H}_{A,{\boldsymbol{\theta}}}$.

Assumption 3.

The functions ${\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}$, $A\subset\{1,\ldots,d\}$, satisfy Assumptions 6–7.

Next, to be able to deal with $\tilde\ell_n$ and the associated composite pseudo log-likelihood, one needs extra assumptions. First, one needs to control the gradient of the likelihood.

Assumption 4.

There is a neighborhood $\mathcal{N}$ of ${\boldsymbol{\theta}}_0$ such that for any $A\subset\{1,\ldots,d\}$,

\[
n^{-1/2}\max_{{\boldsymbol{\theta}}\in\mathcal{N}}\max_{1\le i\le n}\left|\frac{\sum_{B\subset A}(-1)^{|B|}\partial_{A^\complement}\dot{C}_{\boldsymbol{\theta}}\left\{\tilde{\mathbf{F}}_n^{(B)}(\mathbf{X}_i)\right\}}{\sum_{B\subset A}(-1)^{|B|}\partial_{A^\complement}C_{\boldsymbol{\theta}}\left\{\tilde{\mathbf{F}}_n^{(B)}(\mathbf{X}_i)\right\}}\right|\stackrel{Pr}{\to}0.
\]

Second, in order to measure the errors one makes by considering $J_{n,A}$ instead of $J_A$, set

\[
\mathcal{E}_{nj}=\sum_{i=1}^n\mathbb{I}\{\Delta F_{nj}(X_{ij})=1/(n+1)\}\,\mathbb{I}(X_{ij}\in\mathcal{A}_j).
\]

Then, $E(\mathcal{E}_{nj})=V_n(F_j)=n\sum_{x\in\mathcal{A}_j}\Delta F_j(x)\{1-\Delta F_j(x)\}^{n-1}$, $j\in\{1,\ldots,d\}$.

Assumption 5.

For any $j\in\{1,\ldots,d\}$, $\limsup_{n\to\infty}V_n(F_j)<\infty$.

Assumption 5 means that the average number of indices $i$ such that $\Delta F_{nj}(X_{ij})=1/(n+1)$ when $X_{ij}\in\mathcal{A}_j$ is bounded. This assumption holds true when the discrete part of the margin is a finite discrete distribution, a geometric distribution, a negative binomial distribution, or a Poisson distribution, using Remark 1 in the Supplementary Material.
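As a quick numerical illustration (not part of the formal verification), $V_n(F_j)$ can be evaluated directly for a Poisson margin; the truncation point below is an arbitrary choice that makes the tail contribution negligible.

```r
## Sketch: V_n(F_j) for a Poisson(5) margin, truncating the atom sum at 200.
Vn <- function(n, lambda = 5, xmax = 200) {
  dF <- dpois(0:xmax, lambda)       # jumps Delta F_j(x) at the atoms
  n * sum(dF * (1 - dF)^(n - 1))
}
sapply(c(1e2, 1e3, 1e4, 1e5), Vn)   # remains bounded as n grows
```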

3.3. Convergence of estimators

Recall that for any $j\in\{1,\ldots,d\}$ and any $i\in\{1,\ldots,n\}$, $X_{ij}=F_j^{-1}(U_{ij})$, where $\mathbf{U}_1=(U_{11},\ldots,U_{1d}),\ldots,\mathbf{U}_n=(U_{n1},\ldots,U_{nd})$ are iid observations from the copula $C_{{\boldsymbol{\theta}}_0}$. Also, let $P_{{\boldsymbol{\theta}}_0}$ be the associated probability distribution of $\mathbf{X}_i$, for any $i\in\{1,\ldots,n\}$.
Now, for any $j\in\{1,\ldots,d\}$ and any $y\in\mathbb{R}$, $F_{nj}(y)=B_{nj}\circ F_j(y)$, where $B_{nj}(v)=\dfrac{1}{n+1}\sum_{i=1}^n\mathbb{I}(U_{ij}\le v)$, $v\in[0,1]$. Note that if, for some $i\in\{1,\ldots,n\}$ and $j\in\{1,\ldots,d\}$, $X_{ij}\notin\mathcal{A}_j$, i.e., $\Delta F_j(X_{ij})=0$, then $F_{nj}(X_{ij})=B_{nj}(U_{ij})$. It is well known that the processes $\mathbb{B}_{nj}(u_j)=n^{1/2}\{B_{nj}(u_j)-u_j\}$, $u_j\in[0,1]$, $j\in\{1,\ldots,d\}$, converge jointly in $C([0,1])$ to $\mathbb{B}_j$, denoted $\mathbb{B}_{nj}\rightsquigarrow\mathbb{B}_j$, where the $\mathbb{B}_j$ are Brownian bridges, i.e., $\mathbb{B}_1,\ldots,\mathbb{B}_d$ are continuous centered Gaussian processes with ${\rm Cov}\{\mathbb{B}_j(s),\mathbb{B}_k(t)\}=P_{{\boldsymbol{\theta}}_0}(U_{1j}\le s,U_{1k}\le t)-st$, $s,t\in[0,1]$. In particular, for any $j\in\{1,\ldots,d\}$, ${\rm Cov}\{\mathbb{B}_j(s),\mathbb{B}_j(t)\}=\min(s,t)-st$. Before stating the main convergence results for the estimation errors ${\boldsymbol{\Theta}}_n=n^{1/2}({\boldsymbol{\theta}}_n-{\boldsymbol{\theta}}_0)$ and $\tilde{\boldsymbol{\Theta}}_n=n^{1/2}\left(\tilde{\boldsymbol{\theta}}_n-{\boldsymbol{\theta}}_0\right)$, for any $A\subset\{1,\ldots,d\}$, define ${\boldsymbol{\zeta}}_{A,i}=\mathbf{H}_{A,{\boldsymbol{\theta}}_0}(\mathbf{X}_i)$ and set ${\boldsymbol{\zeta}}_i=\sum_{A\subset\{1,\ldots,d\}}{\boldsymbol{\zeta}}_{A,i}$. Next, set $\mathcal{W}_{n,1}=n^{-1/2}\sum_{i=1}^n{\boldsymbol{\zeta}}_i$ and let

\[
\begin{aligned}
\mathcal{W}_{n,2}&=-\sum_{j=1}^d\sum_{A\not\ni j}\int c_{{\boldsymbol{\theta}}_0}(\mathbf{u})\,{\boldsymbol{\eta}}_{A,j,2}(\mathbf{u})\,\mathbb{B}_{nj}(u_j)\,d\mathbf{u}\\
&\quad-\sum_{j=1}^d\sum_{x_j\in\mathcal{A}_j}\mathbb{B}_{nj}\{F_j(x_j)\}\sum_{A\ni j}\int c_{{\boldsymbol{\theta}}_0}(\mathbf{u})\,{\boldsymbol{\eta}}_{A,j,2+}(x_j,\mathbf{u})\,d\mathbf{u}\\
&\quad+\sum_{j=1}^d\sum_{x_j\in\mathcal{A}_j}\mathbb{B}_{nj}\{F_j(x_j-)\}\sum_{A\ni j}\int c_{{\boldsymbol{\theta}}_0}(\mathbf{u})\,{\boldsymbol{\eta}}_{A,j,2-}(x_j,\mathbf{u})\,d\mathbf{u},
\end{aligned}
\]

where the functions ${\boldsymbol{\eta}}_{A,j},{\boldsymbol{\eta}}_{A,j,\pm},{\boldsymbol{\eta}}_{A,j,1,\pm},{\boldsymbol{\eta}}_{A,j,2,\pm}$ are defined in Appendix B.1. Finally, set $\mathcal{W}_{n,0}=n^{-1/2}\sum_{i=1}^n\dfrac{\dot c_{{\boldsymbol{\theta}}_0}(\mathbf{U}_i)}{c_{{\boldsymbol{\theta}}_0}(\mathbf{U}_i)}$. Basically, $\mathcal{W}_{n,1}$ is what one would obtain if the margins were known, while $\mathcal{W}_{n,2}$ is the price to pay for not knowing the margins. Finally, $\mathcal{W}_{n,0}$ is needed for obtaining bootstrapping results (Genest and Rémillard, 2008; Nasri and Rémillard, 2019). The following result is a consequence of Theorems 3 and 4, proven in Appendix A and Appendix B.2, respectively.

Theorem 1.

Under Assumptions 1–3, ${\boldsymbol{\Theta}}_n$ converges in law to ${\boldsymbol{\Theta}}=\mathcal{J}_1^{-1}(\mathcal{W}_1+\mathcal{W}_2)$, where $\mathcal{W}_1\sim N(0,\mathcal{J}_1)$, $E\left(\mathcal{W}_1\mathcal{W}_0^\top\right)=\mathcal{J}_1$, and $E\left(\mathcal{W}_2\mathcal{W}_0^\top\right)=0$. If in addition Assumptions 4–5 hold true, then $\tilde{\boldsymbol{\Theta}}_n-{\boldsymbol{\Theta}}_n$ converges in probability to $0$, so $\tilde{\boldsymbol{\Theta}}_n$ converges in law to ${\boldsymbol{\Theta}}$.

Corollary 1.

${\boldsymbol{\theta}}_n$ and $\tilde{\boldsymbol{\theta}}_n$ are regular estimators of ${\boldsymbol{\theta}}_0$, in the sense that ${\boldsymbol{\Theta}}_n$ and $\tilde{\boldsymbol{\Theta}}_n$ converge to ${\boldsymbol{\Theta}}$ and $E\left({\boldsymbol{\Theta}}\mathcal{W}_0^\top\right)=I_d$.

Proof.

Theorem 1 yields $E\left({\boldsymbol{\Theta}}\mathcal{W}_0^\top\right)=\mathcal{J}_1^{-1}E\left\{(\mathcal{W}_1+\mathcal{W}_2)\mathcal{W}_0^\top\right\}=I_d$. ∎

Remark 4.

As shown in Genest and Rémillard (2008), regularity of estimators is necessary for the parametric bootstrap to work, using LeCam's third lemma.

Having obtained the asymptotic behavior of ${\boldsymbol{\theta}}_n$ and $\tilde{\boldsymbol{\theta}}_n$, it is easy to obtain the following result.

Corollary 2.

Under Assumptions 1–3, $\check{\boldsymbol{\Theta}}_n=n^{1/2}(\check{\boldsymbol{\theta}}_n-{\boldsymbol{\theta}}_0)$ converges in law to

\[
\check{\boldsymbol{\Theta}}=\left(\sum_{1\le k<l\le d}\mathcal{J}^{(k,l)}\right)^{-1}\sum_{1\le k<l\le d}\mathcal{W}^{(k,l)},
\]

where $\mathcal{W}^{(k,l)}=\mathcal{W}_1^{(k,l)}+\mathcal{W}_2^{(k,l)}$ are defined as $\mathcal{W}_1$ and $\mathcal{W}_2$, but restricted to the pairs $(X_{ik},X_{il})$, $i\in\{1,\ldots,n\}$. Moreover, $\check{\boldsymbol{\theta}}_n$ is regular. The same result holds if $\ell_n$ is replaced by $\tilde\ell_n$, provided that, in addition, Assumptions 4–5 hold true.

Remark 5.

The previous results hold if one uses parametric margins instead of nonparametric margins, provided the margins are smooth enough and the estimated parameters converge in law. In fact, assume that for $j\in\{1,\ldots,d\}$, $F_{nj}=F_{j,{\boldsymbol{\gamma}}_{nj}}$, $F_j=F_{j,{\boldsymbol{\gamma}}_{0j}}$, and $\mathbb{F}_{nj}=n^{1/2}\left\{F_{nj}-F_j\right\}$ converges in law to $\mathbb{F}_j={\boldsymbol{\Gamma}}_j^\top\dot F_j$, with $\dot F_j=\left.\nabla_{{\boldsymbol{\gamma}}_j}F_{j,{\boldsymbol{\gamma}}_j}\right|_{{\boldsymbol{\gamma}}_j={\boldsymbol{\gamma}}_{0j}}$, where ${\boldsymbol{\Gamma}}_j$ is a centered Gaussian random vector. Then, under Assumptions 1–3, replacing $\mathbb{B}_j(u_j)$, $\mathbb{B}_j\{F_j(x_j)\}$, and $\mathbb{B}_j\{F_j(x_j-)\}$ with $\mathbb{F}_j\left\{F_j^{-1}(u_j)\right\}$, $\mathbb{F}_j(x_j)$, and $\mathbb{F}_j(x_j-)$, respectively, one obtains the analogs of Theorem 1 and Corollaries 1–2. Furthermore, this shows that one could also consider pair copula models.

4. Numerical experiments

In this section, we study the quality and performance of the proposed estimators, for various choices of copula families, margins, and sample sizes. Note that all simulations were done with the more demanding case $\tilde{\boldsymbol{\theta}}_n$. In the first set of experiments, we deal with the bivariate case, and in a second set of experiments, we compute the composite estimator in a trivariate setting. First, in the bivariate case, we consider five copula families and five pairs of margins for each copula family. For the first experiment (Exp1), both margins are standard Gaussian, i.e., $F_1,F_2\sim N(0,1)$. In the second experiment (Exp2), the margins are Poisson with parameters $5$ and $10$ respectively, i.e., $F_1\sim\mathcal{P}(5)$ and $F_2\sim\mathcal{P}(10)$, while in the third experiment (Exp3), $F_1\sim\mathcal{P}(10)$ and $F_2\sim N(0,1)$. In the fourth experiment (Exp4), $F_1$ is a rounded Gaussian, namely $X_1=\lfloor 1000Z_1\rfloor$, with $Z_1\sim N(0,1)$, and $F_2\sim N(0,1)$. Finally, for the fifth experiment (Exp5), $F_1$ is zero-inflated, with $F_1(0-)=0$, $F_1(x)=0.05+0.95\,(2F_2(x)-1)$, $x\ge0$, and $F_2\sim N(0,1)$. To estimate the parameter $\theta$ of the Clayton, Frank, Gumbel, Gaussian, and Student (with $\nu=5$) copula families corresponding to a Kendall's tau $\tau_0=0.5$, samples of size $n\in\{100,250,500\}$ were generated for each copula family, and ${\boldsymbol{\theta}}_n$ was computed. Here, $\tau$ is Kendall's tau of the copula family $C_\theta$, written $\tau(C_\theta)$. For results to be comparable throughout copula families, we computed the relative bias and the relative root mean square error (RMSE) of $\tau(C_{\theta_n})$, instead of $\theta_n$. The results for $1000$ samples are reported in Table 1. As one can see, the estimator performs quite well for the five numerical experiments and the five copula families. Furthermore, the precision depends on the copula family, but for a given copula family, the precision does not vary significantly with the margins. Even the case when both margins are continuous (Exp1) does not yield the best results. When the sample size is 250 or more, the relative bias is always smaller than 2%. Finally, as expected, both the bias and the RMSE decrease when the sample size increases.
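To fix ideas, one replication of Exp2 for the Frank family can be sketched as follows, reusing the pseudo_loglik function from the sketch in Section 3.1. The atoms are detected empirically as tied values, in the spirit of the non-informed criterion; the seed and the search interval are arbitrary.

```r
## Sketch: one replication of Exp2 (Frank copula, tau0 = 0.5, Poisson margins).
library(copula)
set.seed(1)

n   <- 500
C0  <- frankCopula(iTau(frankCopula(), 0.5))  # theta0 matching tau0 = 0.5
U   <- rCopula(n, C0)
X   <- cbind(qpois(U[, 1], 5), qpois(U[, 2], 10))
A1  <- unique(X[duplicated(X[, 1]), 1])       # observed atoms of margin 1
A2  <- unique(X[duplicated(X[, 2]), 2])       # observed atoms of margin 2

fit <- optimize(function(th) pseudo_loglik(th, X, A1, A2, frankCopula),
                interval = c(0.1, 30), maximum = TRUE)
tau(frankCopula(fit$maximum))                 # compare with tau0 = 0.5
```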

Table 1. Relative bias and relative RMSE (in parentheses), in percent, for $\tau(C_{\theta_n})$ vs $\tau_0$ when $n\in\{100,250,500\}$, based on 1000 samples. For the Student copula, $\nu=5$ is assumed known.
Copula
Margins Clayton Frank Gumbel Gaussian Student
$n=100$
Exp1 -0.42 (9.62) 0.40 (9.30) 3.02 (10.9) 2.92 (9.40) 2.18 (11.1)
Exp2 0.52 (10.2) 0.86 (9.64) 3.70 (11.4) 3.16 (9.80) 2.64 (11.5)
Exp3 0.02 (9.84) 0.58 (9.40) 3.34 (11.1) 2.96 (9.52) 2.36 (11.3)
Exp4 -0.44 (9.62) 0.40 (9.30) 3.02 (10.9) 2.88 (9.38) 2.14 (11.1)
Exp5 -1.00 (9.90) 0.36 (9.30) 2.76 (10.8) 2.42 (9.28) 1.68 (11.0)
$n=250$
Exp1 0.18 (6.06) -0.32 (5.76) 0.56 (6.12) 1.30 (5.92) 0.92 (6.52)
Exp2 0.50 (6.44) -0.00 (5.92) 1.52 (6.36) 1.72 (6.18) 1.52 (6.82)
Exp3 0.40 (6.22) -0.20 (5.84) 0.96 (6.16) 1.42 (6.02) 1.16 (6.60)
Exp4 0.12 (6.08) -0.32 (5.76) 0.60 (6.12) 1.30 (5.92) 0.92 (6.52)
Exp5 -0.10 (6.22) -0.34 (5.76) 0.42 (6.12) 1.06 (5.90) 0.62 (6.56)
$n=500$
Exp1 0.08 (4.44) -0.04 (3.88) 0.38 (4.44) 0.64 (4.40) 0.34 (4.48)
Exp2 0.36 (4.60) 0.06 (4.02) 0.72 (4.64) 0.76 (4.30) 0.54 (4.68)
Exp3 0.28 (4.50) -0.00 (3.90) 0.52 (4.52) 0.70 (4.24) 0.42 (4.56)
Exp4 0.04 (4.44) -0.04 (3.88) 0.40 (4.44) 0.62 (4.19) 0.32 (4.48)
Exp5 0.10 (4.48) -0.04 (3.88) 0.34 (4.44) 0.54 (4.20) 0.24 (4.48)

In the second set of experiments, using the pairwise composite estimator $\check{\boldsymbol{\theta}}_n$, we estimated the parameters of a trivariate non-central squared Clayton copula (Nasri, 2020) with Kendall's tau $\tau_0=0.5$ and non-centrality parameters $a_{10}=0.9$, $a_{20}=2.3$, and $a_{30}=1.4$. In this case, for all five experiments, $F_1$ and $F_2$ are defined as before, while $F_3\sim N(0,1)$. The results are displayed in Table 2. As one could have guessed, the estimation of $\tau$ is not as good as in the bivariate case, but the results are good enough. As for the non-centrality parameters, the estimation of $a_2$, which has a large value (the upper bound being $3$), is not as good as for the other values, but this is coherent with the simulations in Nasri (2020). All in all, the composite method yields quite satisfactory results.

Table 2. Relative bias and relative RMSE (in parentheses), in percent, for the estimation errors of $\tau_0$ and $a_j$, $j=1,2,3$, in the case of the trivariate non-central squared Clayton copula when $n\in\{100,250,500\}$, based on 1000 samples.
Parameters
Margins $\tau$ $a_1$ $a_2$ $a_3$
$n=100$
Exp1 4.01 (11.21) -4.92 (24.95) -30.67 (41.09) -5.08 (21.56)
Exp2 4.24 (11.68) -4.70 (25.27) -28.71 (40.58) -4.83 (21.33)
Exp3 4.31 (11.50) -5.74 (23.72) -30.02 (40.95) -5.32 (21.08)
Exp4 3.99 (11.22) -4.93 (25.04) -30.74 (41.10) -5.12 (21.52)
Exp5 3.99 (11.18) -4.98 (24.34) -30.63 (41.12) -5.22 (21.18)
$n=250$
Exp1 1.34 (6.80) -4.10 (13.05) -21.93 (34.13) -3.82 (13.07)
Exp2 1.32 (6.97) -2.82 (13.24) -20.59 (33.63) -3.15 (13.07)
Exp3 1.41 (6.85) -3.47 (13.47) -20.46 (33.45) -3.66 (13.08)
Exp4 1.33 (6.77) -4.01 (12.98) -22.01 (34.03) -3.77 (12.83)
Exp5 1.34 (6.80) -4.10 (13.04) -21.93 (34.14) -3.82 (13.08)
$n=500$
Exp1 0.63 (4.56) -2.07 (8.81) -12.30 (25.44) -2.19 (8.09)
Exp2 0.74 (4.74) -1.98 (9.13) -12.81 (26.67) -2.19 (8.54)
Exp3 0.73 (4.61) -2.07 (8.93) -12.24 (25.45) -2.30 (8.20)
Exp4 0.63 (4.56) -2.06 (8.84) -12.31 (25.44) -2.21 (8.13)
Exp5 0.63 (4.56) -2.05 (8.78) -12.20 (25.33) -2.17 (8.06)

5. Example of application

In this section, we propose a rigorous method to study the relationship between drought duration and severity for the hydrological data used in Shiau, (2006). The data were kindly provided by the author. There are many articles in the hydrology literature about modeling drought duration and severity with copulas; see, e.g., Chen et al., (2013), Shiau, (2006). One of the main tools to compute the drought duration and severity is the so-called Standardized Precipitation Index (SPI) (McKee et al.,, 1993). Basically, McKee et al., (1993) suggest fitting a gamma distribution to a moving average (1-month, 3-month, etc.) of the precipitations and then transforming it into a Gaussian distribution. However, it may happen that there are several zero values in the observations, so fitting a continuous distribution is not possible. Using the data provided by Professor Shiau (daily precipitations in millimeters for the Wushantou gauge station from 1932 to 2001), we see from Figure 3 that even taking a 1-month moving average leads to a zero-inflated distribution. So, instead of fitting a gamma distribution to the moving average, as is often done, we suggest simply applying the Gaussian quantile function (the inverse of the standard Gaussian cdf) to the empirical distribution. One can then compute the duration and severity: a drought is defined as a sequence of consecutive days with negative SPI values, say $SPI_i,\ldots,SPI_j$; the length of the sequence is the duration $D$, i.e., $D=j-i+1$, and the severity is defined by $S=-\sum_{k=i}^{j}SPI_k$. It makes sense to consider the severity $S$ as a continuous random variable, but the duration $D$ is integer-valued. Again, in the literature, a continuous distribution is usually fitted to $D$, which is incorrect. These variables are then divided by $30$ in order to represent months. With the dataset, we obtained 175 drought periods. A non-parametric estimation of the density of the severity per month is displayed in Figure 3; it seems to be a mixture of at least two distributions, and we tried mixtures of up to $4$ gamma distributions without success. A scatter plot of the duration and the severity also appears in Figure 3. With the copula-based methodology developed here, based on a measure of fit, we chose the Frank copula. In contrast, the preferred copula families in Shiau, (2006) were the Galambos and Gumbel families. Using a smoothed distribution for the severity $S$, we can then compute the conditional probability $P(D\leq y|S=s)$ for $y=1$ to $8$ months, in addition to the conditional expectation $E(D|S=s)$. These functions are displayed in Figure 4.
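The construction just described can be sketched in a few lines of R; the series precip below is a synthetic stand-in for the 1-month moving average of the precipitations, and the rescaling by $n/(n+1)$ is one common normalization keeping the empirical probabilities inside $(0,1)$ before applying the Gaussian quantile function.

```r
# Sketch: SPI from the empirical cdf, then drought durations and severities.
set.seed(3)
precip <- pmax(0, rnorm(3650, mean = 1, sd = 2))   # synthetic, zero-inflated
n      <- length(precip)
spi    <- qnorm(ecdf(precip)(precip) * n / (n + 1))
runs   <- rle(spi < 0)                             # consecutive days with SPI < 0
ends   <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1
k      <- which(runs$values)                       # indices of the drought spells
D      <- runs$lengths[k] / 30                     # durations, in months
S      <- sapply(k, function(i) -sum(spi[starts[i]:ends[i]])) / 30  # severities
```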

Figure 3. Estimated zero-inflated density for the 1-month moving average of precipitations (top left) and severity per month (top right), together with a scatter plot of the duration and severity (bottom).
Figure 4. Conditional probability P(Dy)P(D\leq y) for y{1,,8}y\in\{1,\ldots,8\} months and conditional expectation of the duration given severity per month.

6. Conclusion

We presented methods based on pseudo log-likelihoods for estimating the parameters of copula-based models for arbitrary multivariate data. These pseudo log-likelihoods depend on the non-parametric margins and are adapted to take ties into account. We have also shown that the methodology can be extended to the case of parametric margins. According to the numerical experiments, the proposed estimators perform quite well. As an example of application, we estimated the relationship between drought duration and severity in hydrological data, where the problem of ties is often ignored. The proposed methodologies can also be applied to high-dimensional data; to this end, we have shown in Corollary 2 that the pairwise composite method applied to our bivariate pseudo-likelihood is valid. Finally, in future work, we will also develop bootstrapping methods and formal goodness-of-fit tests.

Acknowledgements

The first author is supported by the Fonds de recherche du Québec – Nature et technologies, the Fonds de recherche du Québec – santé, the École de santé publique de l’Université de Montréal, the Centre de recherche en santé publique (CRESP), and the Natural Sciences and Engineering Research Council of Canada. The second author is supported by the Natural Sciences and Engineering Research Council of Canada. We would also like to thank Professor Jenq-Tzong Shiau of National Cheng Kung University for sharing his data with us.

Appendix A Rank-based Z-estimators when margins are arbitrary

Suppose that $\mathbf{X}_i=\mathbf{F}^{-1}(\mathbf{U}_i)$, where $\mathbf{U}_1,\ldots,\mathbf{U}_n$ are iid observations from the copula $C=C_{\boldsymbol{\theta}_0}$. For any $A\subset\{1,\ldots,d\}$, set $J_A(\mathbf{x})=\left[\prod_{j\in A}\mathbb{I}\{x_j\in\mathcal{A}_j\}\right]\left[\prod_{j\in A^{\complement}}\mathbb{I}\{x_j\not\in\mathcal{A}_j\}\right]$. Also, the set of atoms $\mathcal{A}_j$ of $F_j$ is assumed to be closed for any $j\in\{1,\ldots,d\}$. This implies that if $x_n\downarrow x$ and $\Delta F_j(x_n)>0$, then $\Delta F_j(x)>0$. Further set $\mathbf{H}_{n,A,\boldsymbol{\theta}}(\mathbf{x})=J_A(\mathbf{x})\,\boldsymbol{\varphi}_{A,\boldsymbol{\theta}}\left\{\mathbf{F}_{n,A}(\mathbf{x}-),\mathbf{F}_{n,A}(\mathbf{x}),\mathbf{F}_{n,A^{\complement}}(\mathbf{x})\right\}=J_A(\mathbf{x})\,\boldsymbol{\varphi}_{A,\boldsymbol{\theta}}\left\{\mathbf{F}_{n,A}(\mathbf{x}-),\mathbf{F}_n(\mathbf{x})\right\}$, and $\mathbf{H}_{A,\boldsymbol{\theta}}(\mathbf{x})=J_A(\mathbf{x})\,\boldsymbol{\varphi}_{A,\boldsymbol{\theta}}\left\{\mathbf{F}_A(\mathbf{x}-),\mathbf{F}_A(\mathbf{x}),\mathbf{F}_{A^{\complement}}(\mathbf{x})\right\}=J_A(\mathbf{x})\,\boldsymbol{\varphi}_{A,\boldsymbol{\theta}}\left\{\mathbf{F}_A(\mathbf{x}-),\mathbf{F}(\mathbf{x})\right\}$, where $\mathbf{F}_{n,A}(\mathbf{x}-)=\{F_{nj}(x_j-):j\in A\}$ and $\mathbf{F}_A(\mathbf{x}-)=\{F_j(x_j-):j\in A\}$. Here, $\boldsymbol{\varphi}_{A,\boldsymbol{\theta}}$ is an $\mathbb{R}^p$-valued continuously differentiable function defined on $\{(\mathbf{a},\mathbf{b},\mathbf{u}):0\leq a_j<b_j\leq 1,\;j\in A,\;0<u_j<1,\;j\not\in A\}$. To simplify notation, one also writes $\boldsymbol{\varphi}_{A,\boldsymbol{\theta}}(\mathbf{a},\mathbf{u})$ when $b_j=u_j$ for all $j\in A$. Finally, set $\mathcal{K}=\mathcal{K}_{\boldsymbol{\theta}_0}$, $\mathbf{H}_{n,\boldsymbol{\theta}}(\mathbf{x})=\sum_{A\subset\{1,\ldots,d\}}\mathbf{H}_{n,A,\boldsymbol{\theta}}(\mathbf{x})$, and $\mathbf{H}_{\boldsymbol{\theta}}(\mathbf{x})=\sum_{A\subset\{1,\ldots,d\}}\mathbf{H}_{A,\boldsymbol{\theta}}(\mathbf{x})$.

Example 4.

Take φA(𝐚,𝐛,𝐮)={jA(aj+bj212)}{jA(uj12)}\varphi_{A}(\mathbf{a},\mathbf{b},\mathbf{u})=\left\{\prod_{j\in A}\left(\frac{a_{j}+b_{j}}{2}-\dfrac{1}{2}\right)\right\}\left\{\prod_{j\in A^{\complement}}\left(u_{j}-\dfrac{1}{2}\right)\right\}. This can be used to define Spearman’s rho.
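In the bivariate case, the statistic built from this $\varphi_A$ is, up to the factor $12$, a mid-rank version of Spearman's rho; a minimal sketch, with hypothetical data containing ties:

```r
# Mid-distribution version of Spearman's rho: at atoms, (F_n(x-) + F_n(x))/2
# replaces F_n(x), matching the (a + b)/2 terms of Example 4.
set.seed(4)
x1 <- rpois(200, 3); x2 <- x1 + rpois(200, 2)       # discrete data with ties
mid_cdf <- function(v)
  (sapply(v, function(x) mean(v <= x)) + sapply(v, function(x) mean(v < x))) / 2
rho_n <- 12 * mean((mid_cdf(x1) - 1/2) * (mid_cdf(x2) - 1/2))
rho_n
```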

Example 5 (Estimation of copula parameter).

In this case, φA,𝜽\varphi_{A,{\boldsymbol{\theta}}} is defined by (4).
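A minimal sketch of the resulting Z-estimator in the continuous bivariate case, where $T_n(\theta)$ is the average copula score at the rescaled ranks and $\theta_n$ solves $T_n(\theta)=0$; the Frank copula and the numerical score below are illustrative choices, not the paper's general construction.

```r
# Rank-based Z-estimator: solve T_n(theta) = 0, with T_n the mean score
# (numerical theta-derivative of the mean log copula density).
library(copula)
set.seed(5)
x1 <- rnorm(200); x2 <- x1 + rnorm(200)
n  <- length(x1)
U  <- cbind(rank(x1), rank(x2)) / (n + 1)           # rescaled ranks, no ties here
Tn <- function(theta, h = 1e-4)
  (mean(dCopula(U, frankCopula(theta + h), log = TRUE)) -
   mean(dCopula(U, frankCopula(theta - h), log = TRUE))) / (2 * h)
uniroot(Tn, c(0.1, 30))$root                        # theta_n
```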

Next, define $\mathbf{T}_n(\boldsymbol{\theta})=\int\mathbf{H}_{n,\boldsymbol{\theta}}\,d\mathcal{K}_n$ and $\boldsymbol{\mu}(\boldsymbol{\theta})=\int\mathbf{H}_{\boldsymbol{\theta}}\,d\mathcal{K}$. We are interested in the convergence in law of $\mathbb{T}_n(\boldsymbol{\theta})=n^{1/2}\{\mathbf{T}_n(\boldsymbol{\theta})-\boldsymbol{\mu}(\boldsymbol{\theta})\}$, which is a multivariate extension to arbitrary distributions of the results of Ruymgaart et al., (1972), who considered the bivariate case with continuous margins and a function $\varphi$ that is a product and does not depend on $\boldsymbol{\theta}$. Before stating the result, one needs to define the following class of functions.

Definition 2.

Let $\mathcal{Q}_p$ be the set of all positive continuous functions $q$ on $(0,1)^p$ such that, for any $\omega\in(0,1)$, there exists a constant $c(\omega)$, independent of $\mathbf{u}\in(0,1)^p$, so that

(5) supωujtjω(1uj)1tjj{1,,p}q(𝐭)c(ω)q(𝐮).\sup_{\begin{subarray}{c}\omega u_{j}\leq t_{j}\\ \omega(1-u_{j})\leq 1-t_{j}\\ j\in\{1,\ldots,p\}\end{subarray}}q(\mathbf{t})\leq c(\omega)q(\mathbf{u}).

Note that (5) is an extension of reproducing u-shaped functions defined in Tsukahara, (2005). One can see that 𝒬p\mathcal{Q}_{p} is closed under finite sums and finite products. In particular, if qj𝒬1q_{j}\in\mathcal{Q}_{1} for all j{1,,p}j\in\{1,\ldots,p\}, then q(𝐭)=j=1pqj(tj)𝒬p\displaystyle q(\mathbf{t})=\sum_{j=1}^{p}q_{j}(t_{j})\in\mathcal{Q}_{p} and q(𝐭)=j=1pqj(tj)𝒬p\displaystyle q(\mathbf{t})=\prod_{j=1}^{p}q_{j}(t_{j})\in\mathcal{Q}_{p}.

Remark 6.

Note that r(t)=r0(t)r1(1t)𝒬1r(t)=r_{0}(t)r_{1}(1-t)\in\mathcal{Q}_{1}, if r0,r1r_{0},r_{1} are positive, continuous, non-increasing functions such that ri(ωt)ci(ω)ri(t)r_{i}(\omega t)\leq c_{i}(\omega)r_{i}(t), ω,t(0,1)\omega,t\in(0,1), i{0,1}i\in\{0,1\}. For example, q(t)=1logtq(t)=1-\log{t} and q(t)=ta(1t)bq(t)=t^{-a}(1-t)^{-b} belong to 𝒬1\mathcal{Q}_{1} if a,b0a,b\geq 0. In Ruymgaart et al., (1972), it was assumed that a=ba=b.
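For instance, for $q(t)=t^{-a}(1-t)^{-b}$, the constant $c(\omega)=\omega^{-(a+b)}$ works, since $\omega u\leq t$ and $\omega(1-u)\leq 1-t$ give $t^{-a}(1-t)^{-b}\leq \omega^{-(a+b)}u^{-a}(1-u)^{-b}$. A quick numerical spot-check of this in R:

```r
# Spot-check of (5) for q(t) = t^(-a) * (1 - t)^(-b) with c(omega) = omega^(-(a+b)).
a <- 0.3; b <- 0.2; omega <- 0.5
q <- function(t) t^(-a) * (1 - t)^(-b)
set.seed(6)
u  <- runif(1e5); t <- runif(1e5)
ok <- (omega * u <= t) & (omega * (1 - u) <= 1 - t)  # pairs in the admissible region
max(q(t[ok]) / q(u[ok])) <= omega^(-(a + b))         # TRUE
```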

For any γ(0,0.5)\gamma\in(0,0.5), set 𝒰γ={(0,b):γ<b<1γ}{(a,b):γ<a<b<1γ}{(a,1):γ<a<1γ}\displaystyle\mathcal{U}_{\gamma}=\{(0,b):\gamma<b<1-\gamma\}\cup\{(a,b):\gamma<a<b<1-\gamma\}\cup\{(a,1):\gamma<a<1-\gamma\}, and define rϵ(t)=tϵ(1t)ϵr_{\epsilon}(t)=t^{\epsilon}(1-t)^{\epsilon}, t[0,1]t\in[0,1]. The following hypotheses are needed in the proof.

Assumption 6.

For any A{1,,d}A\subset\{1,\ldots,d\}, γ(0,1/2)\gamma\in(0,1/2), there exists a neighborhood 𝒩\mathcal{N} of 𝛉0{\boldsymbol{\theta}}_{0} and a constant κγ,𝒩\kappa_{\gamma,\mathcal{N}} such that for any jAj\in A, lAl\in A^{\complement}, and 𝐮(0,1)A\mathbf{u}\in(0,1)^{A^{\complement}},

(6) sup𝜽𝒩sup(𝐚,𝐛)𝒰γ|A||𝝋A,𝜽(𝐚,𝐛,𝐮)|\displaystyle\sup_{{\boldsymbol{\theta}}\in\mathcal{N}}\sup_{(\mathbf{a},\mathbf{b})\in\mathcal{U}_{\gamma}^{|A|}}|{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}(\mathbf{a},\mathbf{b},\mathbf{u})| \displaystyle\leq κγ,𝒩qA(𝐮),\displaystyle\kappa_{\gamma,\mathcal{N}}\;q_{A^{\complement}}(\mathbf{u}),
(7) sup𝜽𝒩sup(𝐚,𝐛)𝒰γ|A||aj𝝋A,𝜽(𝐚,𝐛,𝐮)|\displaystyle\sup_{{\boldsymbol{\theta}}\in\mathcal{N}}\sup_{(\mathbf{a},\mathbf{b})\in\mathcal{U}_{\gamma}^{|A|}}|\partial_{a_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}(\mathbf{a},\mathbf{b},\mathbf{u})| \displaystyle\leq κγ,𝒩qA(𝐮),\displaystyle\kappa_{\gamma,\mathcal{N}}\;q_{A^{\complement}}(\mathbf{u}),
(8) sup𝜽𝒩sup(𝐚,𝐛)𝒰γ|A||bj𝝋A,𝜽(𝐚,𝐛,𝐮)|\displaystyle\sup_{{\boldsymbol{\theta}}\in\mathcal{N}}\sup_{(\mathbf{a},\mathbf{b})\in\mathcal{U}_{\gamma}^{|A|}}|\partial_{b_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}(\mathbf{a},\mathbf{b},\mathbf{u})| \displaystyle\leq κγ,𝒩qA(𝐮),\displaystyle\kappa_{\gamma,\mathcal{N}}\;q_{A^{\complement}}(\mathbf{u}),
(9) sup𝜽𝒩sup(𝐚,𝐛)𝒰γ|A||ul𝝋A,𝜽(𝐚,𝐛,𝐮)|\displaystyle\sup_{{\boldsymbol{\theta}}\in\mathcal{N}}\sup_{(\mathbf{a},\mathbf{b})\in\mathcal{U}_{\gamma}^{|A|}}|\partial_{u_{l}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}(\mathbf{a},\mathbf{b},\mathbf{u})| \displaystyle\leq κγ,𝒩ql,A(𝐮),\displaystyle\kappa_{\gamma,\mathcal{N}}\;q_{l,A^{\complement}}(\mathbf{u}),

where $q_{A^{\complement}}\in\mathcal{Q}_{A^{\complement}}$ is integrable with respect to $C=C_{\boldsymbol{\theta}_0}$, and for some $0<\epsilon<0.5$, $r_\epsilon(u_l)q_{l,A^{\complement}}$ is integrable with respect to $C$ and $r_\epsilon(u_l)q_{l,A^{\complement}}\in\mathcal{Q}_{A^{\complement}}$. Also, $\sup_{\boldsymbol{\theta}\in\mathcal{N}}|\mathbf{H}_{\boldsymbol{\theta}}|$ is square integrable with respect to $\mathcal{K}$.

Remark 7.

If $q_A(\mathbf{u})=\sum_{j\in A}r_j(u_j)$, with $r_j$ integrable for any $j\in\{1,\ldots,d\}$, then $q_A$ is integrable with respect to any copula, since the margins of a copula are uniform. In the copula literature, when the margins are continuous, as well as in Ruymgaart et al., (1972) for $\mathcal{N}=\{\boldsymbol{\theta}_0\}$, all convergence results are proven under the assumption that $q(\mathbf{u})=\prod_{j=1}^d r_j(u_j)$, with $r_j$ symmetric (Genest et al.,, 1995, Fermanian et al.,, 2004, Tsukahara,, 2005). In some cases, these hypotheses are too restrictive, for example for the Clayton, Gaussian, and Gumbel families.

Before stating the main result, define $\mathbb{T}_{1,n}(\boldsymbol{\theta})=n^{1/2}\int\mathbf{H}_{\boldsymbol{\theta}}\,(d\mathcal{K}_n-d\mathcal{K})$, and set $\mathbb{T}_{2,n}(\boldsymbol{\theta})=\sum_{A\subset\{1,\ldots,d\}}\int\mathbb{H}_{n,A,\boldsymbol{\theta}}\,d\mathcal{K}$, where

n,A,𝜽(𝐱)\displaystyle\mathbb{H}_{n,A,{\boldsymbol{\theta}}}(\mathbf{x}) =\displaystyle= JA(𝐱)jA𝔽nj(xj)bj𝝋A,𝜽{𝐅A(𝐱),𝐅(𝐱)}𝕀{Fnj(xj)<1}\displaystyle J_{A}(\mathbf{x})\sum_{j\in A}\mathbb{F}_{nj}(x_{j})\partial_{b_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\{\mathbf{F}_{A}(\mathbf{x}-),\mathbf{F}(\mathbf{x})\}\mathbb{I}\{F_{nj}(x_{j})<1\}
+JA(𝐱)jA𝔽nj(xj)aj𝝋A,𝜽{𝐅A(𝐱),𝐅(𝐱)}𝕀{Fnj(xj)>0}\displaystyle\qquad+J_{A}(\mathbf{x})\sum_{j\in A}\mathbb{F}_{nj}(x_{j}-)\partial_{a_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\{\mathbf{F}_{A}(\mathbf{x}-),\mathbf{F}(\mathbf{x})\}\mathbb{I}\{F_{nj}(x_{j}-)>0\}
+JA(𝐱)jA𝔽nj(xj)uj𝝋A,𝜽{𝐅A(𝐱),𝐅(𝐱)}.\displaystyle\qquad\qquad+J_{A}(\mathbf{x})\sum_{j\in A^{\complement}}\mathbb{F}_{nj}(x_{j})\partial_{u_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\{\mathbf{F}_{A}(\mathbf{x}-),\mathbf{F}(\mathbf{x})\}.
Theorem 2.

Under Assumption 6, there exists a neighborhood 𝒩\mathcal{N} of 𝛉0{\boldsymbol{\theta}}_{0} such that as nn\to\infty,

sup𝜽𝒩|𝕋n(𝜽)𝕋1,n(𝜽)𝕋2,n(𝜽)|Pr0.\sup_{{\boldsymbol{\theta}}\in\mathcal{N}}\left|\mathbb{T}_{n}({\boldsymbol{\theta}})-\mathbb{T}_{1,n}({\boldsymbol{\theta}})-\mathbb{T}_{2,n}({\boldsymbol{\theta}})\right|\stackrel{{\scriptstyle Pr}}{{\to}}0.

Furthermore, 𝕋1,n\mathbb{T}_{1,n}, 𝕋2,n\mathbb{T}_{2,n}, and 𝕋n\mathbb{T}_{n} converge jointly in C(𝒩)C(\mathcal{N}) to continuous centered Gaussian processes 𝕋1\mathbb{T}_{1}, 𝕋2\mathbb{T}_{2}, and 𝕋\mathbb{T}, where 𝕋=𝕋1+𝕋2\mathbb{T}=\mathbb{T}_{1}+\mathbb{T}_{2}, and

𝕋2(𝜽)=A{1,,d}jAJA(𝐱)𝔽j(xj)bj𝝋A,𝜽{𝐅A(𝐱),𝐅(𝐱)}d𝒦(𝐱)+A{1,,d}jAJA(𝐱)𝔽j(xj)aj𝝋A,𝜽{𝐅A(𝐱),𝐅(𝐱)}d𝒦(𝐱)+A{1,,d}jAJA(𝐱)𝔽j(xj)uj𝝋A,𝜽{𝐅A(𝐱),𝐅(𝐱)}d𝒦(𝐱).\mathbb{T}_{2}({\boldsymbol{\theta}})=\sum_{A\subset\{1,\ldots,d\}}\sum_{j\in A}\int J_{A}(\mathbf{x})\mathbb{F}_{j}(x_{j})\partial_{b_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\{\mathbf{F}_{A}(\mathbf{x}-),\mathbf{F}(\mathbf{x})\}d\mathcal{K}(\mathbf{x})\\ +\sum_{A\subset\{1,\ldots,d\}}\sum_{j\in A}\int J_{A}(\mathbf{x})\mathbb{F}_{j}(x_{j}-)\partial_{a_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\{\mathbf{F}_{A}(\mathbf{x}-),\mathbf{F}(\mathbf{x})\}d\mathcal{K}(\mathbf{x})\\ +\sum_{A\subset\{1,\ldots,d\}}\sum_{j\in A^{\complement}}\int J_{A}(\mathbf{x})\mathbb{F}_{j}(x_{j})\partial_{u_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\{\mathbf{F}_{A}(\mathbf{x}-),\mathbf{F}(\mathbf{x})\}d\mathcal{K}(\mathbf{x}).
Proof.

Set $S_\gamma=\left\{\mathbf{x}=(x_1,\ldots,x_d)\in\mathbb{R}^d:\gamma\leq F_j(x_j),\,F_j(x_j-)\leq 1-\gamma,\ j\in\{1,\ldots,d\}\right\}$. On $S_\gamma$, uniformly in $\boldsymbol{\theta}\in\mathcal{N}$ for some neighborhood $\mathcal{N}$ of $\boldsymbol{\theta}_0$, $n^{1/2}(\mathbf{H}_{n,\boldsymbol{\theta}}-\mathbf{H}_{\boldsymbol{\theta}})-\mathbb{H}_{n,\boldsymbol{\theta}}$ converges in probability to $0$. As a result,

𝕋n(𝜽)\displaystyle\mathbb{T}_{n}({\boldsymbol{\theta}}) =\displaystyle= n1/2{𝐓n(𝜽)𝝁(𝜽)}=𝕋1,n(𝜽)+n1/2(𝐇n,𝜽𝐇𝜽)𝑑𝒦n\displaystyle n^{1/2}\{\mathbf{T}_{n}({\boldsymbol{\theta}})-{\boldsymbol{\mu}}({\boldsymbol{\theta}})\}=\mathbb{T}_{1,n}({\boldsymbol{\theta}})+n^{1/2}\int(\mathbf{H}_{n,{\boldsymbol{\theta}}}-\mathbf{H}_{\boldsymbol{\theta}})d\mathcal{K}_{n}
=\displaystyle= 𝕋1,n(𝜽)+𝕋2,n(𝜽)+Sγ{n1/2(𝐇n,𝜽𝐇𝜽)n,𝜽}𝑑𝒦n+n1/2Sγ(𝐇n,𝜽𝐇𝜽)𝑑𝒦n\displaystyle\mathbb{T}_{1,n}({\boldsymbol{\theta}})+\mathbb{T}_{2,n}({\boldsymbol{\theta}})+\int_{S_{\gamma}}\left\{n^{1/2}(\mathbf{H}_{n,{\boldsymbol{\theta}}}-\mathbf{H}_{\boldsymbol{\theta}})-\mathbb{H}_{n,{\boldsymbol{\theta}}}\right\}d\mathcal{K}_{n}+n^{1/2}\int_{S_{\gamma}^{\complement}}(\mathbf{H}_{n,{\boldsymbol{\theta}}}-\mathbf{H}_{\boldsymbol{\theta}})d\mathcal{K}_{n}
+Sγn,𝜽(d𝒦nd𝒦)Sγn,𝜽𝑑𝒦\displaystyle\qquad+\int_{S_{\gamma}}\mathbb{H}_{n,{\boldsymbol{\theta}}}(d\mathcal{K}_{n}-d\mathcal{K})-\int_{S_{\gamma}^{\complement}}\mathbb{H}_{n,{\boldsymbol{\theta}}}d\mathcal{K}
=\displaystyle= 𝕋1,n(𝜽)+𝕋2,n(𝜽)+n1/2Sγ(𝐇n,𝜽𝐇𝜽)𝑑𝒦nSγn,𝜽𝑑𝒦+oP(1).\displaystyle\mathbb{T}_{1,n}({\boldsymbol{\theta}})+\mathbb{T}_{2,n}({\boldsymbol{\theta}})+n^{1/2}\int_{S_{\gamma}^{\complement}}(\mathbf{H}_{n,{\boldsymbol{\theta}}}-\mathbf{H}_{\boldsymbol{\theta}})d\mathcal{K}_{n}-\int_{S_{\gamma}^{\complement}}\mathbb{H}_{n,{\boldsymbol{\theta}}}d\mathcal{K}+o_{P}(1).

For ϵ(0,1/2)\epsilon\in\left(0,1/2\right) fixed, set n,M,ϵ=j=1d{sup0<u<1|βn,j(u)|r0ϵ(u)M}{sup0<u<1|βn,j(u)|M}\mathcal{B}_{n,M,\epsilon}=\bigcap_{j=1}^{d}\left\{\sup_{0<u<1}\dfrac{|\beta_{n,j}(u)|}{r_{0}^{\epsilon}(u)}\leq M\right\}\cap\left\{\sup_{0<u<1}|\beta_{n,j}(u)|\leq M\right\}, where r0(u)=u(1u)r_{0}(u)=u(1-u), u[0,1]u\in[0,1]. Then, for δ>0\delta>0 given, one can find MM so that for any n1n\geq 1, P(n,M,ϵ)>1δ/2P(\mathcal{B}_{n,M,\epsilon})>1-\delta/2. Then, setting D1,n=sup𝜽𝒩Sγ|n,𝜽|𝑑𝒦D_{1,n}=\sup_{{\boldsymbol{\theta}}\in\mathcal{N}}\displaystyle\int_{S_{\gamma}^{\complement}}\left|\mathbb{H}_{n,{\boldsymbol{\theta}}}\right|d\mathcal{K}, one gets

$P\left(\left|D_{1,n}\right|>\delta\right)\leq\delta/2+P\left(\mathcal{B}_{n,M,\epsilon}\cap\left\{\left|D_{1,n}\right|>\delta\right\}\right)\leq\delta/2+\sum_{A\subset\{1,\ldots,d\}}\left\{\sum_{j\in A}\left(|B_{A,j1}|+|B_{A,j2}|\right)+\sum_{j\in A^{\complement}}|B_{A,j3}|\right\},$

where, for any j{1,,d}j\in\{1,\ldots,d\},

BA,j1=MSγJA(𝐱)sup𝜽𝒩|bj𝝋A,𝜽{𝐅A(𝐱),𝐅(𝐱)}|𝕀{Fj(xj)<1}d𝒦(𝐱),B_{A,j1}=M\int_{S_{\gamma}^{\complement}}J_{A}(\mathbf{x})\sup_{{\boldsymbol{\theta}}\in\mathcal{N}}\left|\partial_{b_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\{\mathbf{F}_{A}(\mathbf{x}-),\mathbf{F}(\mathbf{x})\}\right|\mathbb{I}\{F_{j}(x_{j})<1\}d\mathcal{K}(\mathbf{x}),
$B_{A,j2}=M\int_{S_\gamma^{\complement}}J_A(\mathbf{x})\sup_{\boldsymbol{\theta}\in\mathcal{N}}\left|\partial_{a_j}\boldsymbol{\varphi}_{A,\boldsymbol{\theta}}\{\mathbf{F}_A(\mathbf{x}-),\mathbf{F}(\mathbf{x})\}\right|\mathbb{I}\{F_j(x_j-)>0\}\,d\mathcal{K}(\mathbf{x}),$
BA,j3=SγJA(𝐱)r0ϵ{Fj(xj)}sup𝜽𝒩|uj𝝋A,𝜽{𝐅A(𝐱),𝐅(𝐱)}|d𝒦(𝐱),B_{A,j3}=\int_{S_{\gamma}^{\complement}}J_{A}(\mathbf{x})r_{0}^{\epsilon}\{F_{j}(x_{j})\}\sup_{{\boldsymbol{\theta}}\in\mathcal{N}}\left|\partial_{u_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\{\mathbf{F}_{A}(\mathbf{x}-),\mathbf{F}(\mathbf{x})\}\right|d\mathcal{K}(\mathbf{x}),

since 𝔽nj(xj±)=βnjFj(xj±)\mathbb{F}_{nj}(x_{j}\pm)=\beta_{nj}\circ F_{j}(x_{j}\pm) a.s., j{1,,d}j\in\{1,\ldots,d\}. By hypothesis (7)–(8), it follows that for i{1,2}i\in\{1,2\}, 𝐔C=C𝜽0\mathbf{U}\sim C=C_{{\boldsymbol{\theta}}_{0}}, BA,jiMκγ,𝒩E{𝕀(𝐔[γ,1γ])qA(𝐔)}\displaystyle B_{A,ji}\leq M\kappa_{\gamma,\mathcal{N}}E\left\{\mathbb{I}\left(\mathbf{U}\in[\gamma,1-\gamma]^{\complement}\right)\;q_{A^{\complement}}(\mathbf{U})\right\}. By hypothesis, qAq_{A^{\complement}} is integrable with respect to CC. Also, by hypothesis (9), for 𝐔C\mathbf{U}\sim C, BA,j3Mκγ,𝒩E{𝕀(𝐔[γ,1γ])r0ϵ(Uj)qj,A(𝐔)}\displaystyle B_{A,j3}\leq M\kappa_{\gamma,\mathcal{N}}E\left\{\mathbb{I}\left(\mathbf{U}\in[\gamma,1-\gamma]^{\complement}\right)r_{0}^{\epsilon}(U_{j})q_{j,A^{\complement}}(\mathbf{U})\right\}, and by hypothesis, one can find ϵ(0,0.5)\epsilon\in(0,0.5) so that for any A{1,,d}A\subset\{1,\ldots,d\} and jAj\in A^{\complement}, r0ϵ(Uj)qj,A(𝐔)r_{0}^{\epsilon}(U_{j})q_{j,A^{\complement}}(\mathbf{U}) is integrable with respect to CC. As a result, for i{1,2,3}i\in\{1,2,3\}, BA,j,i<2d2δB_{A,j,i}<2^{-d-2}\delta if γ\gamma is small enough. Therefore, P(|D1,n|>δ)<δP\left(\left|D_{1,n}\right|>\delta\right)<\delta by choosing ϵ\epsilon, MM, and γ\gamma appropriately. Next, set

D2n=sup𝜽𝒩n1/2|Sγ(Hn,𝜽H𝜽)𝑑𝒦n|A{1,,d}j=15D2,A,nj,D_{2n}=\sup_{{\boldsymbol{\theta}}\in\mathcal{N}}n^{1/2}\left|\int_{S_{\gamma}^{\complement}}(H_{n,{\boldsymbol{\theta}}}-H_{\boldsymbol{\theta}})d\mathcal{K}_{n}\right|\leq\sum_{A\subset\{1,\ldots,d\}}\sum_{j=1}^{5}D_{2,A,nj},

with

D2,A,n1\displaystyle D_{2,A,n1} =\displaystyle= 1njAi=1nJA(𝐗i)𝕀(𝐗iSγ)|𝔽nj(Xij)|sup𝜽𝒩|bj𝝋A,𝜽(𝐯~n,i,𝐮~n,i)|𝕀{Fj(Xij)=0},\displaystyle\dfrac{1}{n}\sum_{j\in A}\sum_{i=1}^{n}J_{A}(\mathbf{X}_{i})\mathbb{I}(\mathbf{X}_{i}\not\in S_{\gamma})|\mathbb{F}_{nj}(X_{ij})|\sup_{{\boldsymbol{\theta}}\in\mathcal{N}}\left|\partial_{b_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\left(\tilde{\mathbf{v}}_{n,i},\tilde{\mathbf{u}}_{n,i}\right)\right|\mathbb{I}\{F_{j}(X_{ij}-)=0\},
D2,A,n2\displaystyle D_{2,A,n2} =\displaystyle= 1njAi=1nJA(𝐗i)𝕀(𝐗iSγ)|𝔽nj(Xij)|sup𝜽𝒩|bj𝝋A,𝜽(𝐯~n,i,𝐮~n,i)|𝕀{Fj(Xij)<1}\displaystyle\dfrac{1}{n}\sum_{j\in A}\sum_{i=1}^{n}J_{A}(\mathbf{X}_{i})\mathbb{I}(\mathbf{X}_{i}\not\in S_{\gamma})|\mathbb{F}_{nj}(X_{ij})|\sup_{{\boldsymbol{\theta}}\in\mathcal{N}}\left|\partial_{b_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\left(\tilde{\mathbf{v}}_{n,i},\tilde{\mathbf{u}}_{n,i}\right)\right|\mathbb{I}\{F_{j}(X_{ij})<1\}
D2,A,n3\displaystyle D_{2,A,n3} =\displaystyle= 1njAi=1nJA(𝐗i)𝕀(𝐗iSγ)|𝔽nj(Xij)|sup𝜽𝒩|aj𝝋A,𝜽(𝐯~n,i,𝐮~n,i)|𝕀{Fj(Xij)=1}\displaystyle\dfrac{1}{n}\sum_{j\in A}\sum_{i=1}^{n}J_{A}(\mathbf{X}_{i})\mathbb{I}(\mathbf{X}_{i}\not\in S_{\gamma})|\mathbb{F}_{nj}(X_{ij}-)|\sup_{{\boldsymbol{\theta}}\in\mathcal{N}}\left|\partial_{a_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\left(\tilde{\mathbf{v}}_{n,i},\tilde{\mathbf{u}}_{n,i}\right)\right|\mathbb{I}\{F_{j}(X_{ij})=1\}
D2,A,n4\displaystyle D_{2,A,n4} =\displaystyle= 1njAi=1nJA(𝐗i)𝕀(𝐗iSγ)|𝔽nj(Xij)|sup𝜽𝒩|aj𝝋A,𝜽(𝐯~n,i,𝐮~n,i)|𝕀{Fj(Xij)>0},\displaystyle\dfrac{1}{n}\sum_{j\in A}\sum_{i=1}^{n}J_{A}(\mathbf{X}_{i})\mathbb{I}(\mathbf{X}_{i}\not\in S_{\gamma})|\mathbb{F}_{nj}(X_{ij}-)|\sup_{{\boldsymbol{\theta}}\in\mathcal{N}}\left|\partial_{a_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\left(\tilde{\mathbf{v}}_{n,i},\tilde{\mathbf{u}}_{n,i}\right)\right|\mathbb{I}\{F_{j}(X_{ij}-)>0\},
D2,A,n5\displaystyle D_{2,A,n5} =\displaystyle= 1njAi=1nJA(𝐗i)𝕀(𝐗iSγ)|𝔽nj(Xij)|sup𝜽𝒩|uj𝝋A,𝜽(𝐯~n,i,𝐮~n,i)|,\displaystyle\dfrac{1}{n}\sum_{j\in A^{\complement}}\sum_{i=1}^{n}J_{A}(\mathbf{X}_{i})\mathbb{I}(\mathbf{X}_{i}\not\in S_{\gamma})|\mathbb{F}_{nj}(X_{ij})|\sup_{{\boldsymbol{\theta}}\in\mathcal{N}}\left|\partial_{u_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\left(\tilde{\mathbf{v}}_{n,i},\tilde{\mathbf{u}}_{n,i}\right)\right|,

where, for any $i\in\{1,\ldots,n\}$, there exists $\lambda_i\in(0,1)$ such that $\tilde{\mathbf{u}}_{n,i}=\lambda_i\mathbf{F}(\mathbf{X}_i)+(1-\lambda_i)\mathbf{F}_n(\mathbf{X}_i)$ and $\tilde{\mathbf{v}}_{n,i}=\lambda_i\mathbf{F}(\mathbf{X}_i-)+(1-\lambda_i)\mathbf{F}_n(\mathbf{X}_i-)$. Now, for $\delta>0$, one can find $\omega\in(0,1)$ such that $P(\mathcal{D}_n)>1-\delta/2$, where $\mathcal{D}_n=\bigcap_{j=1}^d\bigcap_{i=1}^n\left\{\omega F_j(X_{ij})\leq F_{nj}(X_{ij})\leq 1-\omega\bar F_j(X_{ij})\right\}$ and $\bar F_j(x_j)=1-F_j(x_j)$, $j\in\{1,\ldots,d\}$. As a result, on $\mathcal{D}_n$, $\omega\mathbf{F}(\mathbf{X}_i)\leq\tilde{\mathbf{u}}_{n,i}\leq 1-\omega\bar{\mathbf{F}}(\mathbf{X}_i)$, and it follows that $\omega\mathbf{F}(\mathbf{X}_i-)\leq\tilde{\mathbf{v}}_{n,i}\leq 1-\omega\bar{\mathbf{F}}(\mathbf{X}_i-)$. Hence, since $q_{j,A^{\complement}}\in\mathcal{Q}_{A^{\complement}}$, for any $j\in\{1,\ldots,d\}$ with $X_{ij}\not\in\mathcal{A}_j$, one has $q_{j,A^{\complement}}(\tilde{\mathbf{u}}_{n,i})\leq c(\omega)q_{j,A^{\complement}}(\mathbf{U}_i)$. It then follows that on $\mathcal{D}_n\cap\mathcal{B}_{n,M,\epsilon}$, $\sum_{j=1}^4 D_{2,A,nj}\leq c(\omega)M\kappa_{\gamma,\mathcal{N}}\,n^{-1}\sum_{i=1}^n\mathbb{I}\left(\mathbf{U}_i\in[\gamma,1-\gamma]^{\complement}\right)q_{A^{\complement}}(\mathbf{U}_i)$, which can be made arbitrarily small by the strong law of large numbers, since $q_{A^{\complement}}(\mathbf{U})$ is integrable with respect to $C$. Finally, $D_{2,A,n5}\leq M\kappa_{\gamma,\mathcal{N}}\,n^{-1}\sum_{i=1}^n\mathbb{I}\left(\mathbf{U}_i\in[\gamma,1-\gamma]^{\complement}\right)\sum_{j\in A^{\complement}}r_0^\epsilon(U_{ij})q_{j,A^{\complement}}(\mathbf{U}_i)$, and by hypothesis, one can find $\epsilon\in(0,0.5)$ so that for any $A\subset\{1,\ldots,d\}$ and $j\in A^{\complement}$, $r_0^\epsilon(U_j)q_{j,A^{\complement}}(\mathbf{U})$ is integrable with respect to $C$. One may then conclude that for $\delta>0$, one can find $\omega,\gamma,M,\mathcal{N}$ so that $P(|D_{2,n}|>\delta)<\delta$. The rest of the proof follows easily. ∎

The following assumption is needed for proving convergence of estimators.

Assumption 7.

There exists a neighborhood 𝒩\mathcal{N} of 𝛉0{\boldsymbol{\theta}}_{0} such that 𝛍(𝛉0)=0{\boldsymbol{\mu}}({\boldsymbol{\theta}}_{0})=0 and the Jacobian 𝛍˙(𝛉)\dot{\boldsymbol{\mu}}({\boldsymbol{\theta}}) exists, is continuous, and is positive definite or negative definite at 𝛉0{\boldsymbol{\theta}}_{0}.

Remark 8.

From Proposition 1, there is a neighborhood $\mathcal{N}$ of $\boldsymbol{\theta}_0$ on which $\boldsymbol{\mu}$ is injective, and $\dot{\boldsymbol{\mu}}(\boldsymbol{\theta})$ is either positive definite or negative definite for all $\boldsymbol{\theta}\in\mathcal{N}$. In particular, $\boldsymbol{\mu}$ has a unique zero in $\mathcal{N}$.

We can now prove the following general convergence result for rank-based Z-estimators.

Theorem 3.

Under Assumptions 6 and 7, if 𝛉n{\boldsymbol{\theta}}_{n} satisfies 𝐓n(𝛉n)=0\mathbf{T}_{n}({\boldsymbol{\theta}}_{n})=0, then 𝚯n=n1/2(𝛉n𝛉0){\boldsymbol{\Theta}}_{n}=n^{1/2}({\boldsymbol{\theta}}_{n}-{\boldsymbol{\theta}}_{0}) converges in law to 𝚯={𝛍˙(𝛉0)}1𝕋(𝛉0){\boldsymbol{\Theta}}=-\{\dot{\boldsymbol{\mu}}({\boldsymbol{\theta}}_{0})\}^{-1}\mathbb{T}({\boldsymbol{\theta}}_{0}).

Proof.

Suppose that $\boldsymbol{\theta}_n$ satisfies $\mathbf{T}_n(\boldsymbol{\theta}_n)=0$, and let $\boldsymbol{\theta}_{n_k}$ be a subsequence converging to $\boldsymbol{\theta}^\star$; such a subsequence exists since $\mathcal{N}$ may be chosen to be a bounded neighborhood of $\boldsymbol{\theta}_0$. Then, on the one hand, $\boldsymbol{\mu}(\boldsymbol{\theta}_{n_k})\to\boldsymbol{\mu}(\boldsymbol{\theta}^\star)$. On the other hand, from Theorem 2, $\boldsymbol{\mu}(\boldsymbol{\theta}_{n_k})=-\mathbb{T}_{n_k}(\boldsymbol{\theta}_{n_k})/\sqrt{n_k}\stackrel{Pr}{\to}0$. As a result, $\boldsymbol{\mu}(\boldsymbol{\theta}^\star)=0$, and by Remark 8, $\boldsymbol{\theta}^\star=\boldsymbol{\theta}_0$. Since every convergent subsequence has the same limit, it follows that $\boldsymbol{\theta}_n$ converges in probability to $\boldsymbol{\theta}_0$. Using Theorem 2 again, on the one hand, one may conclude that $-n^{1/2}\boldsymbol{\mu}(\boldsymbol{\theta}_n)=\mathbb{T}_n(\boldsymbol{\theta}_n)\rightsquigarrow\mathbb{T}(\boldsymbol{\theta}_0)$. On the other hand, for some $\lambda_n\in[0,1]$, one gets that

n1/2𝝁(𝜽n)=n1/2{𝝁(𝜽n)𝝁(𝜽0)}=𝝁˙(𝜽~n)𝚯n,n^{1/2}{\boldsymbol{\mu}}({\boldsymbol{\theta}}_{n})=n^{1/2}\left\{{\boldsymbol{\mu}}({\boldsymbol{\theta}}_{n})-{\boldsymbol{\mu}}({\boldsymbol{\theta}}_{0})\right\}=\dot{\boldsymbol{\mu}}(\tilde{\boldsymbol{\theta}}_{n}){\boldsymbol{\Theta}}_{n},

where $\tilde{\boldsymbol{\theta}}_n=\boldsymbol{\theta}_0+\lambda_n(\boldsymbol{\theta}_n-\boldsymbol{\theta}_0)\stackrel{Pr}{\to}\boldsymbol{\theta}_0$. By Assumption 7, $\dot{\boldsymbol{\mu}}(\tilde{\boldsymbol{\theta}}_n)\stackrel{Pr}{\to}\dot{\boldsymbol{\mu}}(\boldsymbol{\theta}_0)$, which is either positive definite or negative definite, hence invertible. As a result, $\boldsymbol{\Theta}_n\rightsquigarrow-\{\dot{\boldsymbol{\mu}}(\boldsymbol{\theta}_0)\}^{-1}\mathbb{T}(\boldsymbol{\theta}_0)$. ∎

Appendix B Application to copula families

In the case of the estimation of copula parameters, i.e., when $\boldsymbol{\varphi}_{A,\boldsymbol{\theta}}$ is defined by formula (4), set $\mathcal{W}_1=\mathbb{T}_1(\boldsymbol{\theta}_0)$ and $\mathcal{W}_2=\mathbb{T}_2(\boldsymbol{\theta}_0)$. For any $j\in\{1,\ldots,d\}$, define $\mathcal{I}_j=\cup_{y\in\mathcal{A}_j}\mathcal{I}_j(y)$, where $\mathcal{I}_j(y)=(F_j(y-),F_j(y)]$. Next, for $x_j\in\mathcal{A}_j$, $j\in A$, set $G_{A,\boldsymbol{\theta}}(\mathbf{x},\mathbf{u})=\int_{(0,1)^d}\left[\prod_{j\in A}\mathbb{I}\{v_j\in\mathcal{I}_j(x_j)\}\right]c_{\boldsymbol{\theta}}\left((\mathbf{v},\mathbf{u})^A\right)d\mathbf{v}$, where $(\mathbf{v},\mathbf{u})^A_j=v_j$ if $j\in A$ and $(\mathbf{v},\mathbf{u})^A_j=u_j$ if $j\in A^{\complement}$. Then

(10) GA,𝜽(𝐱,𝐮)=BA(1)|B|AC𝜽{(𝐱,𝐮)(B,A)},G_{A,{\boldsymbol{\theta}}}(\mathbf{x},\mathbf{u})=\sum_{B\subset A}(-1)^{|B|}\partial_{A^{\complement}}C_{\boldsymbol{\theta}}\left\{(\mathbf{x},\mathbf{u})^{(B,A)}\right\},

where $\left((\mathbf{x},\mathbf{u})^{(B,A)}\right)_j=\left\{\begin{array}{ll}F_j(x_j-),& j\in B;\\ F_j(x_j),& j\in A\setminus B;\\ u_j,& j\in A^{\complement}.\end{array}\right.$
Note that if $\tilde{G}_{A,\boldsymbol{\theta}_0}(\mathbf{x},\mathbf{u})=G_{A,\boldsymbol{\theta}_0}(\mathbf{x},\mathbf{u})\left[\prod_{k\in A^{\complement}}\mathbb{I}\{u_k\not\in\mathcal{I}_k\}\right]$, then, using the previous notation, one gets

A,𝜽𝑑𝒦=jA𝐱×kA𝒜k𝔹jFj(xj)bj𝝋A,𝜽{𝐅A(𝐱),𝐅A(𝐱),𝐮}G~A,𝜽0(𝐱,𝐮)d𝐮+jA𝐱×kA𝒜k𝔹jFj(xj)aj𝝋A,𝜽{𝐅A(𝐱),𝐅A(𝐱),𝐮}G~A,𝜽0(𝐱,𝐮)d𝐮+jA𝐱×kA𝒜k𝔹j(uj)uj𝝋A,𝜽{𝐅A(𝐱),𝐅A(𝐱),𝐮}G~A,𝜽0(𝐱,𝐮)d𝐮.\int\mathbb{H}_{A,{\boldsymbol{\theta}}}d\mathcal{K}=\sum_{j\in A}\sum_{\displaystyle\mathbf{x}\in\bigtimes_{k\in A}\mathcal{A}_{k}}\mathbb{B}_{j}\circ F_{j}(x_{j})\int\partial_{b_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\{\mathbf{F}_{A}(\mathbf{x}-),\mathbf{F}_{A}(\mathbf{x}),\mathbf{u}\}\tilde{G}_{A,{\boldsymbol{\theta}}_{0}}(\mathbf{x},\mathbf{u})d\mathbf{u}\\ +\sum_{j\in A}\sum_{\displaystyle\mathbf{x}\in\bigtimes_{k\in A}\mathcal{A}_{k}}\mathbb{B}_{j}\circ F_{j}(x_{j}-)\int\partial_{a_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\{\mathbf{F}_{A}(\mathbf{x}-),\mathbf{F}_{A}(\mathbf{x}),\mathbf{u}\}\tilde{G}_{A,{\boldsymbol{\theta}}_{0}}(\mathbf{x},\mathbf{u})d\mathbf{u}\\ +\sum_{j\in A^{\complement}}\sum_{\displaystyle\mathbf{x}\in\bigtimes_{k\in A}\mathcal{A}_{k}}\int\mathbb{B}_{j}(u_{j})\partial_{u_{j}}{\boldsymbol{\varphi}}_{A,{\boldsymbol{\theta}}}\{\mathbf{F}_{A}(\mathbf{x}-),\mathbf{F}_{A}(\mathbf{x}),\mathbf{u}\}\tilde{G}_{A,{\boldsymbol{\theta}}_{0}}(\mathbf{x},\mathbf{u})d\mathbf{u}.

Before stating the next result, we define additional functions that will appear in the expression of 𝕋2\mathbb{T}_{2}.

B.1. Auxiliary functions

For 𝐮=(u1,,ud)\mathbf{u}=(u_{1},\ldots,u_{d}), 𝐱=(x1,,xd)\mathbf{x}=(x_{1},\ldots,x_{d}), and jAj\in A^{\complement}, define JA(𝐮,𝐱)=[kA𝕀{ukk(xk)}][kA𝕀{ukk}]J_{A}(\mathbf{u},\mathbf{x})=\left[\prod_{k\in A}\mathbb{I}\{u_{k}\in\mathcal{I}_{k}(x_{k})\}\right]\left[\prod_{k\in A^{\complement}}\mathbb{I}\{u_{k}\not\in\mathcal{I}_{k}\}\right], and set 𝜼A,j(𝐮)=𝜼A,j,1(𝐮)𝜼A,j,2(𝐮){\boldsymbol{\eta}}_{A,j}(\mathbf{u})={\boldsymbol{\eta}}_{A,j,1}(\mathbf{u})-{\boldsymbol{\eta}}_{A,j,2}(\mathbf{u}), where

𝜼A,j,1(𝐮)\displaystyle{\boldsymbol{\eta}}_{A,j,1}(\mathbf{u}) =\displaystyle= 𝐱×kA𝒜kujG˙A,𝜽0(𝐱,𝐮)GA,𝜽0(𝐱,𝐮)JA(𝐮,𝐱),\displaystyle\sum_{\displaystyle\mathbf{x}\in\bigtimes_{k\in A}\mathcal{A}_{k}}\dfrac{\partial_{u_{j}}\dot{G}_{A,{\boldsymbol{\theta}}_{0}}(\mathbf{x},\mathbf{u})}{G_{A,{\boldsymbol{\theta}}_{0}}(\mathbf{x},\mathbf{u})}J_{A}(\mathbf{u},\mathbf{x}),
𝜼A,j,2(𝐮)\displaystyle{\boldsymbol{\eta}}_{A,j,2}(\mathbf{u}) =\displaystyle= 𝐱×kA𝒜kG˙A,𝜽0(𝐱,𝐮)ujGA,𝜽0(𝐱,𝐮)GA,𝜽02(𝐱,𝐮)JA(𝐮,𝐱).\displaystyle\sum_{\displaystyle\mathbf{x}\in\bigtimes_{k\in A}\mathcal{A}_{k}}\dfrac{\dot{G}_{A,{\boldsymbol{\theta}}_{0}}(\mathbf{x},\mathbf{u})\partial_{u_{j}}G_{A,{\boldsymbol{\theta}}_{0}}(\mathbf{x},\mathbf{u})}{G_{A,{\boldsymbol{\theta}}_{0}}^{2}(\mathbf{x},\mathbf{u})}J_{A}(\mathbf{u},\mathbf{x}).

Next, for jAj\in A and xj𝒜jx_{j}\in\mathcal{A}_{j}, define 𝜼A,j,+(xj,𝐮)=𝜼A,j,1+(xj,𝐮)𝜼A,j,2+(xj,𝐮){\boldsymbol{\eta}}_{A,j,+}(x_{j},\mathbf{u})={\boldsymbol{\eta}}_{A,j,1+}(x_{j},\mathbf{u})-{\boldsymbol{\eta}}_{A,j,2+}(x_{j},\mathbf{u}) and 𝜼A,j,(xj,𝐮)=𝜼A,j,1(xj,𝐮)𝜼A,j,2(xj,𝐮){\boldsymbol{\eta}}_{A,j,-}(x_{j},\mathbf{u})={\boldsymbol{\eta}}_{A,j,1-}(x_{j},\mathbf{u})-{\boldsymbol{\eta}}_{A,j,2-}(x_{j},\mathbf{u}), where

𝜼A,j,1+(xj,𝐮)\displaystyle{\boldsymbol{\eta}}_{A,j,1+}(x_{j},\mathbf{u}) =\displaystyle= 𝐱×kA{j}𝒜kBA{j}(1)|B|ujAC˙𝜽0((𝐱,𝐮)(B,A))GA,𝜽0(𝐱,𝐮)JA(𝐮,𝐱),\displaystyle\sum_{\displaystyle\mathbf{x}\in\bigtimes_{k\in A\setminus\{j\}}\mathcal{A}_{k}}\dfrac{\displaystyle\sum_{B\subset A\setminus\{j\}}(-1)^{|B|}\partial_{u_{j}}\partial_{A^{\complement}}\dot{C}_{{\boldsymbol{\theta}}_{0}}\left((\mathbf{x},\mathbf{u})^{(B,A)}\right)}{G_{A,{\boldsymbol{\theta}}_{0}}(\mathbf{x},\mathbf{u})}J_{A}(\mathbf{u},\mathbf{x}),
𝜼A,j,2+(xj,𝐮)\displaystyle{\boldsymbol{\eta}}_{A,j,2+}(x_{j},\mathbf{u}) =\displaystyle= 𝐱×kA{j}𝒜kLA(𝐱,𝐮)GA,𝜽0(𝐱,𝐮)BA{j}(1)|B|ujAC𝜽0((𝐱,𝐮)(B,A))\displaystyle\sum_{\displaystyle\mathbf{x}\in\bigtimes_{k\in A\setminus\{j\}}\mathcal{A}_{k}}\dfrac{L_{A}(\mathbf{x},\mathbf{u})}{G_{A,{\boldsymbol{\theta}}_{0}}(\mathbf{x},\mathbf{u})}\sum_{B\subset A\setminus\{j\}}(-1)^{|B|}\partial_{u_{j}}\partial_{A^{\complement}}C_{{\boldsymbol{\theta}}_{0}}\left((\mathbf{x},\mathbf{u})^{(B,A)}\right)
×JA(𝐮,𝐱),\displaystyle\qquad\qquad\times J_{A}(\mathbf{u},\mathbf{x}),
𝜼A,j,1(xj,𝐮)\displaystyle{\boldsymbol{\eta}}_{A,j,1-}(x_{j},\mathbf{u}) =\displaystyle= 𝐱×kA{j}𝒜kBA{j}(1)|B|ujAC˙𝜽0((𝐱,𝐮)(B{j},A))GA,𝜽0(𝐱,𝐮)JA(𝐮,𝐱),\displaystyle\sum_{\displaystyle\mathbf{x}\in\bigtimes_{k\in A\setminus\{j\}}\mathcal{A}_{k}}\dfrac{\displaystyle\sum_{B\subset A\setminus\{j\}}(-1)^{|B|}\partial_{u_{j}}\partial_{A^{\complement}}\dot{C}_{{\boldsymbol{\theta}}_{0}}\left((\mathbf{x},\mathbf{u})^{(B\cup\{j\},A)}\right)}{G_{A,{\boldsymbol{\theta}}_{0}}(\mathbf{x},\mathbf{u})}J_{A}(\mathbf{u},\mathbf{x}),
𝜼A,j,2(xj,𝐮)\displaystyle{\boldsymbol{\eta}}_{A,j,2-}(x_{j},\mathbf{u}) =\displaystyle= 𝐱×kA{j}𝒜kLA(𝐱,𝐮)GA,𝜽0(𝐱,𝐮)BA{j}(1)|B|ujAC𝜽0((𝐱,𝐮)(B{j},A))\displaystyle\sum_{\displaystyle\mathbf{x}\in\bigtimes_{k\in A\setminus\{j\}}\mathcal{A}_{k}}\dfrac{L_{A}(\mathbf{x},\mathbf{u})}{G_{A,{\boldsymbol{\theta}}_{0}}(\mathbf{x},\mathbf{u})}\sum_{B\subset A\setminus\{j\}}(-1)^{|B|}\partial_{u_{j}}\partial_{A^{\complement}}C_{{\boldsymbol{\theta}}_{0}}\left((\mathbf{x},\mathbf{u})^{(B\cup\{j\},A)}\right)
×JA(𝐮,𝐱),\displaystyle\qquad\qquad\times J_{A}(\mathbf{u},\mathbf{x}),

with 𝐋A=G˙A,𝜽0GA,𝜽0\mathbf{L}_{A}=\dfrac{\dot{G}_{A,{\boldsymbol{\theta}}_{0}}}{G_{A,{\boldsymbol{\theta}}_{0}}}.

B.2. Asymptotic behavior of the estimator

Recall that 𝜻A,i=HA,𝜽0(𝐗i){\boldsymbol{\zeta}}_{A,i}=H_{A,{\boldsymbol{\theta}}_{0}}(\mathbf{X}_{i}) and 𝜻i=H𝜽0(𝐗i){\boldsymbol{\zeta}}_{i}=H_{{\boldsymbol{\theta}}_{0}}(\mathbf{X}_{i}), i{1,,n}i\in\{1,\ldots,n\}.

Theorem 4.

Set $\mathcal{W}_{n,0}=n^{-1/2}\sum_{i=1}^n \dot c_{\boldsymbol{\theta}_0}(\mathbf{U}_i)/c_{\boldsymbol{\theta}_0}(\mathbf{U}_i)$. Under Assumptions 6 and 7, the iid random vectors $\boldsymbol{\zeta}_i$ have mean $0$ and covariance $\mathcal{J}_1=E\left(\boldsymbol{\zeta}_i\boldsymbol{\zeta}_i^\top\right)$. In particular, $\boldsymbol{\mu}(\boldsymbol{\theta}_0)=0$. Furthermore, $(\mathcal{W}_{n,0},\mathcal{W}_{n,1},\mathcal{W}_{n,2})$ converges in law to a centered Gaussian vector $(\mathcal{W}_0,\mathcal{W}_1,\mathcal{W}_2)$ with $\mathcal{W}_1\sim N(0,\mathcal{J}_1)$, where $\mathcal{W}_2$ is given by

𝒲2\displaystyle\mathcal{W}_{2} =\displaystyle= j=1dA∌j(0,1)dc𝜽0(𝐮)𝜼A,j,2(𝐮)𝔹j(uj)𝑑𝐮\displaystyle-\sum_{j=1}^{d}\sum_{A\not\ni j}\int_{(0,1)^{d}}c_{{\boldsymbol{\theta}}_{0}}(\mathbf{u}){\boldsymbol{\eta}}_{A,j,2}(\mathbf{u})\mathbb{B}_{j}(u_{j})d\mathbf{u}
j=1dxj𝒜j𝔹j{Fj(xj)}Aj(0,1)dc𝜽0(𝐮)𝜼A,j,2+(xj,𝐮)𝑑𝐮\displaystyle\quad-\sum_{j=1}^{d}\sum_{x_{j}\in\mathcal{A}_{j}}\mathbb{B}_{j}\{F_{j}(x_{j})\}\sum_{A\ni j}\int_{(0,1)^{d}}c_{{\boldsymbol{\theta}}_{0}}(\mathbf{u}){\boldsymbol{\eta}}_{A,j,2+}(x_{j},\mathbf{u})d\mathbf{u}
+j=1dxj𝒜j𝔹j{Fj(xj)}Aj(0,1)dc𝜽0(𝐮)𝜼A,j,2(xj,𝐮)𝑑𝐮.\displaystyle\qquad+\sum_{j=1}^{d}\sum_{x_{j}\in\mathcal{A}_{j}}\mathbb{B}_{j}\{F_{j}(x_{j}-)\}\sum_{A\ni j}\int_{(0,1)^{d}}c_{{\boldsymbol{\theta}}_{0}}(\mathbf{u}){\boldsymbol{\eta}}_{A,j,2-}(x_{j},\mathbf{u})d\mathbf{u}.

Finally, E(𝒲0𝒲1)=𝒥1=𝛍˙(𝛉0)E\left(\mathcal{W}_{0}\mathcal{W}_{1}^{\top}\right)=\mathcal{J}_{1}=-\dot{\boldsymbol{\mu}}({\boldsymbol{\theta}}_{0}), and E(𝒲0𝒲2)=0E\left(\mathcal{W}_{0}\mathcal{W}_{2}^{\top}\right)=0. In particular, Assumption 7 holds if 𝒥1\mathcal{J}_{1} is invertible.

Proof.

For any A{1,,d}A\subset\{1,\ldots,d\}, set A(𝐮)=[jA𝕀{ujj}][kA𝕀{ukk}]\mathcal{I}_{A}(\mathbf{u})=\left[\prod_{j\in A}\mathbb{I}\{u_{j}\in\mathcal{I}_{j}\}\right]\left[\prod_{k\in A^{\complement}}\mathbb{I}\{u_{k}\not\in\mathcal{I}_{k}\}\right]. It follows from formula (10) that

E(𝜻A,1)\displaystyle E({\boldsymbol{\zeta}}_{A,1}) =\displaystyle= 𝐱×jA𝒜jc𝜽0(𝐮)[kA𝕀{ukk(xk)}][kA𝕀(ukk)]G˙A,𝜽0(𝐱,𝐮)GA,𝜽0(𝐱,𝐮)𝑑𝐮\displaystyle\sum_{\displaystyle\mathbf{x}\in\bigtimes_{j\in A}\mathcal{A}_{j}}\int c_{{\boldsymbol{\theta}}_{0}}(\mathbf{u})\left[\prod_{k\in A}\mathbb{I}\{u_{k}\in\mathcal{I}_{k}(x_{k})\}\right]\left[\prod_{k\in A^{\complement}}\mathbb{I}(u_{k}\notin\mathcal{I}_{k})\right]\dfrac{\dot{G}_{A,{\boldsymbol{\theta}}_{0}}(\mathbf{x},\mathbf{u})}{G_{A,{\boldsymbol{\theta}}_{0}}(\mathbf{x},\mathbf{u})}d\mathbf{u}
=\displaystyle= 𝐱×jA𝒜j[kA𝕀(ukk)]G˙A,𝜽0(𝐱,𝐮)𝑑𝐮\displaystyle\sum_{\displaystyle\mathbf{x}\in\bigtimes_{j\in A}\mathcal{A}_{j}}\int\left[\prod_{k\in A^{\complement}}\mathbb{I}(u_{k}\notin\mathcal{I}_{k})\right]\dot{G}_{A,{\boldsymbol{\theta}}_{0}}(\mathbf{x},\mathbf{u})d\mathbf{u}
=\displaystyle= c˙𝜽0(𝐮)A(𝐮)𝑑𝐮=𝜽E𝜽{A(𝐔)}|𝜽=𝜽0.\displaystyle\int\dot{c}_{{\boldsymbol{\theta}}_{0}}(\mathbf{u})\mathcal{I}_{A}(\mathbf{u})d\mathbf{u}=\left.\nabla_{\boldsymbol{\theta}}E_{\boldsymbol{\theta}}\{\mathcal{I}_{A}(\mathbf{U})\}\right|_{{\boldsymbol{\theta}}={\boldsymbol{\theta}}_{0}}.

Thus, $\boldsymbol{\mu}(\boldsymbol{\theta}_0)=E(\boldsymbol{\zeta}_1)=\sum_{A\subset\{1,\ldots,d\}}E(\boldsymbol{\zeta}_{A,1})=\left.\nabla_{\boldsymbol{\theta}}E_{\boldsymbol{\theta}}\left\{\sum_{A\subset\{1,\ldots,d\}}\mathcal{I}_A(\mathbf{U})\right\}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_0}=\left.\nabla_{\boldsymbol{\theta}}\left[1\right]\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_0}=0$. As a result, the independent random variables $\boldsymbol{\zeta}_i$ are centered, and we may conclude that $\mathbb{T}_{1,n}(\boldsymbol{\theta}_0)=n^{-1/2}\sum_{i=1}^n\boldsymbol{\zeta}_i\rightsquigarrow\mathbb{T}_1(\boldsymbol{\theta}_0)\sim N(0,\mathcal{J}_1)$, where $\mathcal{J}_1=E\left(\boldsymbol{\zeta}_1\boldsymbol{\zeta}_1^\top\right)=\sum_{A\subset\{1,\ldots,d\}}E\left(\boldsymbol{\zeta}_{A,1}\boldsymbol{\zeta}_{A,1}^\top\right)$. Also, $\mathcal{W}_{n,2}=\mathbb{T}_{2,n}(\boldsymbol{\theta}_0)\rightsquigarrow\mathcal{W}_2=\mathbb{T}_2(\boldsymbol{\theta}_0)$, since $\mathbb{B}_{n1},\ldots,\mathbb{B}_{nd}$ converge jointly to the Brownian bridges $\mathbb{B}_1,\ldots,\mathbb{B}_d$. Now

A,𝜽0𝑑𝒦=jA𝐱×kA𝒜k𝔹jFj(xj)c𝜽0(𝐮){𝜼A,j,1+(xj,𝐮)𝜼A,j,2+(xj,𝐮)}𝑑𝐮jA𝐱×kA𝒜k𝔹jFj(xj)c𝜽0(𝐮){𝜼A,j,1(xj,𝐮)𝜼A,j,2(xj,𝐮)}𝑑𝐮+jA𝐱×kA𝒜k𝔹j(uj)c𝜽0(𝐮){𝜼A,j,1(𝐮)𝜼A,j,2(𝐮)}𝑑𝐮.\int\mathbb{H}_{A,{\boldsymbol{\theta}}_{0}}d\mathcal{K}=\sum_{j\in A}\sum_{\displaystyle\mathbf{x}\in\bigtimes_{k\in A}\mathcal{A}_{k}}\mathbb{B}_{j}\circ F_{j}(x_{j})\int c_{{\boldsymbol{\theta}}_{0}}(\mathbf{u})\{{\boldsymbol{\eta}}_{A,j,1+}(x_{j},\mathbf{u})-{\boldsymbol{\eta}}_{A,j,2+}(x_{j},\mathbf{u})\}d\mathbf{u}\\ -\sum_{j\in A}\sum_{\displaystyle\mathbf{x}\in\bigtimes_{k\in A}\mathcal{A}_{k}}\mathbb{B}_{j}\circ F_{j}(x_{j}-)\int c_{{\boldsymbol{\theta}}_{0}}(\mathbf{u})\{{\boldsymbol{\eta}}_{A,j,1-}(x_{j},\mathbf{u})-{\boldsymbol{\eta}}_{A,j,2-}(x_{j},\mathbf{u})\}d\mathbf{u}\\ +\sum_{j\in A^{\complement}}\sum_{\displaystyle\mathbf{x}\in\bigtimes_{k\in A}\mathcal{A}_{k}}\int\mathbb{B}_{j}(u_{j})c_{{\boldsymbol{\theta}}_{0}}(\mathbf{u})\{{\boldsymbol{\eta}}_{A,j,1}(\mathbf{u})-{\boldsymbol{\eta}}_{A,j,2}(\mathbf{u})\}d\mathbf{u}.

Using the same trick as in the proof that E(𝜻i)=0E({\boldsymbol{\zeta}}_{i})=0, one gets, for any j{1,,d}j\in\{1,\ldots,d\}, and xj𝒜jx_{j}\in\mathcal{A}_{j}

A∌j(0,1)dc𝜽0(𝐮)𝜼A,j,1(𝐮)𝔹j(uj)𝑑𝐮=0=Aj(0,1)dc𝜽0(𝐮)𝜼A,j,1±(xj,𝐮)𝑑𝐮,\sum_{A\not\ni j}\int_{(0,1)^{d}}c_{{\boldsymbol{\theta}}_{0}}(\mathbf{u}){\boldsymbol{\eta}}_{A,j,1}(\mathbf{u})\mathbb{B}_{j}(u_{j})d\mathbf{u}=0=\sum_{A\ni j}\int_{(0,1)^{d}}c_{{\boldsymbol{\theta}}_{0}}(\mathbf{u}){\boldsymbol{\eta}}_{A,j,1\pm}(x_{j},\mathbf{u})d\mathbf{u},

yielding the representation for $\mathcal{W}_2$. It is then easy to show that $E\{\mathcal{W}_0\mathbb{B}_j(t)\}=0$ for any $j\in\{1,\ldots,d\}$, so $E\left(\mathcal{W}_0\mathcal{W}_2^\top\right)=0$. Next, for any $A\subset\{1,\ldots,d\}$,

$E\left\{\boldsymbol{\zeta}_{A,i}\,\dfrac{\dot c_{\boldsymbol{\theta}_0}(\mathbf{U}_i)^\top}{c_{\boldsymbol{\theta}_0}(\mathbf{U}_i)}\right\}=\sum_{\mathbf{x}\in\bigtimes_{j\in A}\mathcal{A}_j}\int\prod_{k\notin A}\mathbb{I}(u_k\notin\mathcal{I}_k)\,\dfrac{\dot G_{A,\boldsymbol{\theta}_0}(\mathbf{x},\mathbf{u})\dot G_{A,\boldsymbol{\theta}_0}(\mathbf{x},\mathbf{u})^\top}{G_{A,\boldsymbol{\theta}_0}(\mathbf{x},\mathbf{u})}\,d\mathbf{u}=E\left(\boldsymbol{\zeta}_{A,i}\boldsymbol{\zeta}_{A,i}^\top\right).$

Therefore, $E\left(\mathcal{W}_1\mathcal{W}_0^\top\right)=\mathcal{J}_1$. Next, for any $A\subset\{1,\ldots,d\}$,

$\boldsymbol{\mu}_A(\boldsymbol{\theta})=\sum_{\mathbf{x}\in\bigtimes_{j\in A}\mathcal{A}_j}\int\prod_{k\notin A}\mathbb{I}(u_k\notin\mathcal{I}_k)\,\dfrac{\dot G_{A,\boldsymbol{\theta}}(\mathbf{x},\mathbf{u})}{G_{A,\boldsymbol{\theta}}(\mathbf{x},\mathbf{u})}\,G_{A,\boldsymbol{\theta}_0}(\mathbf{x},\mathbf{u})\,d\mathbf{u}.$

Hence, for any A{1,,d}A\subset\{1,\ldots,d\},

$\dot{\boldsymbol{\mu}}_A(\boldsymbol{\theta})=\sum_{\mathbf{x}\in\bigtimes_{j\in A}\mathcal{A}_j}\int\prod_{k\notin A}\mathbb{I}(u_k\notin\mathcal{I}_k)\,\dfrac{\ddot G_{A,\boldsymbol{\theta}}(\mathbf{x},\mathbf{u})}{G_{A,\boldsymbol{\theta}}(\mathbf{x},\mathbf{u})}\,G_{A,\boldsymbol{\theta}_0}(\mathbf{x},\mathbf{u})\,d\mathbf{u}-\sum_{\mathbf{x}\in\bigtimes_{j\in A}\mathcal{A}_j}\int\prod_{k\notin A}\mathbb{I}(u_k\notin\mathcal{I}_k)\,\dfrac{\dot G_{A,\boldsymbol{\theta}}(\mathbf{x},\mathbf{u})\dot G_{A,\boldsymbol{\theta}}(\mathbf{x},\mathbf{u})^\top}{G^2_{A,\boldsymbol{\theta}}(\mathbf{x},\mathbf{u})}\,G_{A,\boldsymbol{\theta}_0}(\mathbf{x},\mathbf{u})\,d\mathbf{u},$

so

$\dot{\boldsymbol{\mu}}_A(\boldsymbol{\theta}_0)=\sum_{\mathbf{x}\in\bigtimes_{j\in A}\mathcal{A}_j}\int\prod_{k\notin A}\mathbb{I}(u_k\notin\mathcal{I}_k)\,\ddot G_{A,\boldsymbol{\theta}_0}(\mathbf{x},\mathbf{u})\,d\mathbf{u}-\sum_{\mathbf{x}\in\bigtimes_{j\in A}\mathcal{A}_j}\int\prod_{k\notin A}\mathbb{I}(u_k\notin\mathcal{I}_k)\,\dfrac{\dot G_{A,\boldsymbol{\theta}_0}(\mathbf{x},\mathbf{u})\dot G_{A,\boldsymbol{\theta}_0}(\mathbf{x},\mathbf{u})^\top}{G_{A,\boldsymbol{\theta}_0}(\mathbf{x},\mathbf{u})}\,d\mathbf{u}=\left.\nabla^2_{\boldsymbol{\theta}}\left\{\int c_{\boldsymbol{\theta}}(\mathbf{u})\mathcal{I}_A(\mathbf{u})\,d\mathbf{u}\right\}\right|_{\boldsymbol{\theta}=\boldsymbol{\theta}_0}-E\left(\boldsymbol{\zeta}_{A,1}\boldsymbol{\zeta}_{A,1}^\top\right).$

As a result, 𝝁˙(𝜽0)=A{1,,d}𝝁˙A(𝜽0)=0𝒥1=𝒥1\displaystyle\dot{\boldsymbol{\mu}}({\boldsymbol{\theta}}_{0})=\sum_{A\subset\{1,\ldots,d\}}\dot{\boldsymbol{\mu}}_{A}({\boldsymbol{\theta}}_{0})=0-\mathcal{J}_{1}=-\mathcal{J}_{1}. ∎

Appendix C Supplementary material

Here, some results needed for identifiability are proven, together with multilinear representations and results on densities with respect to products of mixtures of the Lebesgue measure and the counting measure, as well as a result on the verification of Assumption 5.

C.1. Proof of Proposition 2.1

Suppose that the maximal rank of $\mathbf{T}'$ is $r$ and let $\boldsymbol{\theta}_0\in O$ be such that the rank of $\mathbf{T}'(\boldsymbol{\theta}_0)$ is $r$. Set $\mathbf{A}=\mathbf{T}'(\boldsymbol{\theta}_0)$. Then there exists an $r\times r$ submatrix $M_r(\boldsymbol{\theta}_0)$ with non-zero determinant. By continuity, there is a neighborhood $\mathcal{N}$ of $\boldsymbol{\theta}_0$ on which $M_r(\boldsymbol{\theta})$ has non-zero determinant. Since the maximal rank is $r$, it follows that the rank of $\mathbf{T}'(\boldsymbol{\theta})$ is $r$ for all $\boldsymbol{\theta}\in\mathcal{N}$. One can now apply the Rank Theorem (Rudin,, 1976, Theorem 9.32) to deduce that there is a diffeomorphism $\mathbf{H}$ from an open set $V\subset\mathbb{R}^p$ onto a neighborhood $U\subset\mathcal{N}$ of $\boldsymbol{\theta}_0$ such that $\mathbf{T}\{\mathbf{H}(\mathbf{x})\}=\mathbf{A}\mathbf{x}+\boldsymbol{\varphi}(\mathbf{A}\mathbf{x})$, where $\boldsymbol{\varphi}$ is a continuously differentiable mapping into the null space of a projection with the same range as $\mathbf{A}$. Thus, choose $\mathbf{x}_1$ in the null space of $\mathbf{A}$ and $\mathbf{x}_0\in V$ so that $\mathbf{H}(\mathbf{x}_0)=\boldsymbol{\theta}_0$. Then, for some $\delta>0$, $\mathbf{H}(\mathbf{x}_0+t\mathbf{x}_1)\in U$ for any $|t|<\delta$. As a result, for all $|t|<\delta$, $\mathbf{T}\{\mathbf{H}(\mathbf{x}_0+t\mathbf{x}_1)\}=\mathbf{A}(\mathbf{x}_0+t\mathbf{x}_1)+\boldsymbol{\varphi}(\mathbf{A}\mathbf{x}_0+t\mathbf{A}\mathbf{x}_1)=\mathbf{A}\mathbf{x}_0+\boldsymbol{\varphi}(\mathbf{A}\mathbf{x}_0)$, showing that $\mathbf{T}$ is not injective. Next, suppose that $\mathbf{J}$ has rank $p$ and that $\mathbf{T}$ is not injective, i.e., one can find $\boldsymbol{\theta}_0\neq\boldsymbol{\theta}_1\in O$ with $\mathbf{T}(\boldsymbol{\theta}_0)=\mathbf{T}(\boldsymbol{\theta}_1)$. Set $g(t)=\mathbf{T}\{\boldsymbol{\theta}_0+t(\boldsymbol{\theta}_1-\boldsymbol{\theta}_0)\}$, which is well defined on $[0,1]$ since $O$ is convex. Since $g(0)=g(1)$ and $g$ is continuous, $g([0,1])$ is a closed curve, and there exist $t_0,t_1\in[0,1]$ with $g_j(t_0)\leq g_j(t)\leq g_j(t_1)$ for all $j$ and all $t\in[0,1]$. It then follows that either $t_0$ or $t_1$ lies in $(0,1)$; otherwise $g$ is constant and $g'(t)=0$ for all $t\in[0,1]$. So suppose that $t_0\in(0,1)$; the case $t_1\in(0,1)$ is similar. Then, for any $j$ and $\delta>0$ small enough, $g_j(t_0-\delta)-g_j(t_0)\geq 0$ and $g_j(t_0+\delta)-g_j(t_0)\geq 0$, since $g_j(t_0)$ is minimal; dividing by $\delta$ and letting $\delta\downarrow 0$ yields $g_j'(t_0)\leq 0$ and $g_j'(t_0)\geq 0$, so $g_j'(t_0)=0$ for all $j$. Therefore, from this extension of Rolle's Theorem, $0=g'(t_0)=\mathbf{J}\{\boldsymbol{\theta}_0+t_0(\boldsymbol{\theta}_1-\boldsymbol{\theta}_0)\}(\boldsymbol{\theta}_1-\boldsymbol{\theta}_0)$, so the Jacobian does not have rank $p$ at $\boldsymbol{\theta}=\boldsymbol{\theta}_0+t_0(\boldsymbol{\theta}_1-\boldsymbol{\theta}_0)\in O$, which is a contradiction. Hence $\mathbf{T}$ is injective. Suppose now that the rank of $\mathbf{J}(\boldsymbol{\theta}_0)$ is $p$.
Set $g(\boldsymbol{\theta})=\det\left\{\mathbf{J}(\boldsymbol{\theta})^\top\mathbf{J}(\boldsymbol{\theta})\right\}\geq 0$. Then the rank of $\mathbf{J}(\boldsymbol{\theta})$ is $p$ if and only if $g(\boldsymbol{\theta})>0$. Since $g(\boldsymbol{\theta}_0)>0$ and $g$ is continuous on $O$, one can find a neighborhood $\mathcal{N}$ of $\boldsymbol{\theta}_0$ such that $g(\boldsymbol{\theta})>0$ for every $\boldsymbol{\theta}\in\mathcal{N}$. Finally, if the maximal rank is $r<p$, then it follows from the proof of the first statement that the rank of $\mathbf{T}'$ is also $r$ in a neighborhood of $\boldsymbol{\theta}_0$, and from the Rank Theorem (Rudin,, 1976, Theorem 9.32), $\mathbf{T}$ is not injective in a neighborhood of $\boldsymbol{\theta}_0$. ∎

C.2. Density with respect to μ\mu

First, for a given cdf 𝒢\mathcal{G} having density gg with respect to μ𝒜+\mu_{\mathcal{A}}+\mathcal{L}, where 𝒜\mathcal{A} is a countable set, one has g(x)=Δ𝒢(x)g(x)=\Delta\mathcal{G}(x), for x𝒜x\in\mathcal{A}, and

(11) $\tilde{\mathcal{G}}(x)=P(X\leq x,\,X\not\in\mathcal{A})=\mathcal{G}(x)-\sum_{y\in\mathcal{A},\;y\leq x}\Delta\mathcal{G}(y)=\int_{-\infty}^x g(y)\,dy.$

Then, by the fundamental theorem of calculus, $\tilde{\mathcal{G}}'(x)=g(x)$, $\mathcal{L}$-a.e. If in addition $\mathcal{A}$ is closed, then for any $x\not\in\mathcal{A}$ and $\epsilon>0$ small enough, $\mathcal{G}(x+\epsilon)-\mathcal{G}(x)=\int_x^{x+\epsilon}g(y)\,dy$, so $\mathcal{G}'(x)=g(x)$ a.e. Next, to find the expression of the density $h$ of the distribution function $H$ with respect to $\mu$, suppose that $x_1,\ldots,x_k$ are atoms of $F_1,\ldots,F_k$, while $x_{k+1},\ldots,x_d$ are not atoms of $F_{k+1},\ldots,F_d$. On one hand, if $\mathbf{X}\sim H$, for $\delta>0$ small enough, one gets

G(𝐱,δ)\displaystyle G(\mathbf{x},\delta) =\displaystyle= P[j=1k{Xj=xj}j=k+1d{xjδ<Xjxj}{Xj𝒜j}]\displaystyle P\left[\cap_{j=1}^{k}\{X_{j}=x_{j}\}\cap\cap_{j=k+1}^{d}\{x_{j}-\delta<X_{j}\leq x_{j}\}\cap\{X_{j}\not\in\mathcal{A}_{j}\}\right]
=\displaystyle= h(x1,,xk,yk+1,,yd){j=k+1d𝕀(xjδ<yjxj)}𝑑yk+1𝑑yd.\displaystyle\int h(x_{1},\ldots,x_{k},y_{k+1},\ldots,y_{d})\left\{\prod_{j=k+1}^{d}\mathbb{I}(x_{j}-\delta<y_{j}\leq x_{j})\right\}dy_{k+1}\cdots dy_{d}.

On the other hand, if $\mathbf{U}\sim C$, with $C\in\mathcal{S}_H$ having a continuous density over $(0,1)^d$, using the fact that for any $j>k$, the Lebesgue measure of $(F_j(x_j-\delta),F_j(x_j)]\setminus\mathcal{I}_j$ is $\int_{x_j-\delta}^{x_j}f_j(y_j)\,dy_j$ by (11), one gets

$G(\mathbf{x},\delta)=P\left[\cap_{j=1}^k\{F_j(x_j-)<U_j\leq F_j(x_j)\}\cap\cap_{j=k+1}^d\{F_j(x_j-\delta)<U_j\leq F_j(x_j)\}\cap\{U_j\not\in\mathcal{I}_j\}\right]$
$=\int_{F_1(x_1-)}^{F_1(x_1)}\cdots\int_{F_k(x_k-)}^{F_k(x_k)}\int_{(F_{k+1}(x_{k+1}-\delta),F_{k+1}(x_{k+1})]\setminus\mathcal{I}_{k+1}}\cdots\int_{(F_d(x_d-\delta),F_d(x_d)]\setminus\mathcal{I}_d}c(\mathbf{u})\,d\mathbf{u}$
$=\int_{F_1(x_1-)}^{F_1(x_1)}\cdots\int_{F_k(x_k-)}^{F_k(x_k)}c\left(u_1,\ldots,u_k,F_{k+1}(x_{k+1}),\ldots,F_d(x_d)\right)du_1\cdots du_k\times\left\{\prod_{j=k+1}^d\int_{x_j-\delta}^{x_j}f_j(y_j)\,dy_j\right\}+o\left(\delta^{d-k}\right).$

Therefore,

h(𝐱)\displaystyle h(\mathbf{x}) =\displaystyle= j=k+1dfj(xj)F1(x1)F1(x1)Fk(xk)Fk(xk)c(u1,,uk,Fk+1(xk+1),,Fd(xd))𝑑u1𝑑uk\displaystyle\prod_{j=k+1}^{d}f_{j}(x_{j})\int_{F_{1}(x_{1}-)}^{F_{1}(x_{1})}\cdots\int_{F_{k}(x_{k}-)}^{F_{k}(x_{k})}c\left(u_{1},\ldots,u_{k},F_{k+1}(x_{k+1}),\ldots,F_{d}(x_{d})\right)du_{1}\cdots du_{k}
=\displaystyle= j=k+1dfj(xj)B{1,,k}(1)|B|uk+1udC(𝐅~(B)(𝐱)),\displaystyle\prod_{j=k+1}^{d}f_{j}(x_{j})\sum_{B\subset\{1,\ldots,k\}}(-1)^{|B|}\partial_{u_{k+1}}\cdots\partial_{u_{d}}C\left(\tilde{\mathbf{F}}^{(B)}(\mathbf{x})\right),

where $\tilde{\mathbf{F}}^{(B)}$ is defined by

(12) (𝐅~(B)(𝐱))j={Fj(xj), if jB;Fj(xj), if jB={1,,d}B.\left(\tilde{\mathbf{F}}^{(B)}(\mathbf{x})\right)_{j}=\left\{\begin{array}[]{ll}F_{j}({x_{j}}-)&\text{, if }j\in B;\\ F_{j}(x_{j})&\text{, if }j\in B^{\complement}=\{1,\ldots,d\}\setminus B.\end{array}\right.

As a result, if A={j:xj𝒜j}A=\{j:x_{j}\in\mathcal{A}_{j}\}, then the density h(x)h(x) with respect to μ\mu is

(13) h(𝐱)={jAfj(xj)}BA(1)|B|AC{𝐅~(B)(𝐱)}.h(\mathbf{x})=\left\{\prod_{j\in A^{\complement}}f_{j}(x_{j})\right\}\sum_{B\subset A}(-1)^{|B|}\partial_{A^{\complement}}C\left\{\tilde{\mathbf{F}}^{(B)}(\mathbf{x})\right\}.

In particular, if AA is the set of indices j{1,,d}j\in\{1,\ldots,d\} for which ΔFj(xj)>0\Delta F_{j}(x_{j})>0, and CC is a copula associated with the joint law of (Y,𝐗)(Y,\mathbf{X}), where YY has margin GG, then (13) yields

(14) P(Yy|𝐗=𝐱)=BA(1)|B|AC{G(y),𝐅~(B)(𝐱)}BA(1)|B|AC{1,𝐅~(B)(𝐱)}.P(Y\leq y|\mathbf{X}=\mathbf{x})=\dfrac{\sum_{B\subset A}(-1)^{|B|}\partial_{A^{\complement}}C\left\{G(y),\tilde{\mathbf{F}}^{(B)}(\mathbf{x})\right\}}{\sum_{B\subset A}(-1)^{|B|}\partial_{A^{\complement}}C\left\{1,\tilde{\mathbf{F}}^{(B)}(\mathbf{x})\right\}}.
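For the drought application of Section 5, $D$ is discrete and $S$ continuous, so in (14) the set $A$ attached to the covariate is empty and the conditional cdf reduces to the partial derivative of $C$ in the coordinate of $S$. A minimal sketch for the Frank copula follows, where $\theta$, the duration cdf F_D, and the smoothed severity cdf G_S are illustrative stand-ins rather than fitted values.

```r
# Conditional cdf P(D <= y | S = s) = d/dv C(F_D(y), G_S(s)) for the Frank copula.
dCdv <- function(u, v, theta) {                 # partial derivative of C(u, v) in v
  eu <- expm1(-theta * u); ev <- expm1(-theta * v); e1 <- expm1(-theta)
  exp(-theta * v) * eu / (e1 + eu * ev)
}
theta <- 5.74                                   # Frank parameter with tau close to 0.5
F_D   <- function(y) ppois(y, lambda = 2)       # stand-in cdf for the duration
G_S   <- function(s) pnorm(s, mean = 1, sd = 0.5)  # stand-in smoothed severity cdf
P_le  <- function(y, s) dCdv(F_D(y), G_S(s), theta)
P_le(3, 1.2)                                    # e.g., P(D <= 3 | S = 1.2)
```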

C.3. Verification of Assumption 5

Let GG be a discrete distribution function concentrated on 𝒜={x:ΔG(x)>0}\mathcal{A}=\{x:\Delta G(x)>0\}. Recall that Vn(G)=nx𝒜ΔG(x){1ΔG(x)}n1\displaystyle V_{n}(G)=n\sum_{x\in\mathcal{A}}\Delta G(x)\{1-\Delta G(x)\}^{n-1}.

Proposition 2.

Suppose that there exists a constant CC so that for any aa small enough,

(15) x𝒜:ΔG(x)<aΔG(x)Ca.\sum_{x\in\mathcal{A}:\Delta G(x)<a}\Delta G(x)\leq Ca.

Then $\limsup_{n\to\infty}V_n(G)\leq\dfrac{C}{\left(1-e^{-1}\right)^2}$.

Proof.

First, recall that $1-x\leq e^{-x}$ for any $x\geq 0$. For simplicity, for any $x\in\mathcal{A}$, set $p_{x}=\Delta G(x)$. Next, splitting the atoms according to whether $p_{x}\geq 1/n$ or $p_{x}<1/n$ and using (15), if $n$ is large enough, one has

V_{n}(G)=n\sum_{k=1}^{\infty}\sum_{x\in\mathcal{A}:\,k\leq np_{x}<k+1}p_{x}(1-p_{x})^{n-1}+n\sum_{k=0}^{\infty}\sum_{x\in\mathcal{A}:\,n^{-(k+2)}\leq p_{x}<n^{-(k+1)}}p_{x}(1-p_{x})^{n-1}\\ \leq C\sum_{k=1}^{\infty}(k+1)e^{-(n-1)k/n}+C\sum_{k=0}^{\infty}\dfrac{1}{n^{k}}=C\left[\dfrac{1}{\left\{1-e^{-(n-1)/n}\right\}^{2}}-1+\dfrac{n}{n-1}\right],

where the inequality applies (15) with $a=(k+1)/n$ in the first sum and $a=n^{-(k+1)}$ in the second, together with $(1-p_{x})^{n-1}\leq e^{-(n-1)p_{x}}$. As a result, $\limsup_{n\to\infty}V_{n}(G)\leq\dfrac{C}{\left(1-e^{-1}\right)^{2}}$. ∎

Note that when $\mathcal{A}$ is finite, (15) holds with $C=0$, since for $a$ smaller than the smallest atom, the sum on the left-hand side of (15) is empty.
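Proposition 2 is also easy to probe numerically. The following minimal R sketch (the truncation point $K$ and the value $p=0.3$ are arbitrary illustrative choices) evaluates $V_{n}(G)$ for a geometric distribution and compares it with the limiting bound $C/(1-e^{-1})^{2}$, taking $C=1/p$ as in Remark 9 below.

    # V_n(G) for the geometric distribution p_k = p (1 - p)^k, truncated at K;
    # the neglected tail mass is (1 - p)^(K + 1), negligible for the values used here
    Vn_geometric <- function(n, p, K = 5000) {
      pk <- p * (1 - p)^(0:K)
      n * sum(pk * (1 - pk)^(n - 1))
    }

    p <- 0.3
    bound <- (1 / p) / (1 - exp(-1))^2            # C / (1 - e^{-1})^2 with C = 1/p
    sapply(c(10, 100, 1000, 10000), Vn_geometric, p = p)
    bound                                         # the computed values stay below this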

Remark 9.

Note that (15) holds for the geometric distribution with $p_{k}=p(1-p)^{k}$, $k=0,1,\ldots$, where $p\in(0,1)$. In this case, one can take $C=1/p$ since $\sum_{k=i}^{\infty}p(1-p)^{k}=(1-p)^{i}=p_{i}/p$, $i=0,1,\ldots$. In general, a sufficient condition for (15) to hold for a distribution supported on $\mathbb{N}\cup\{0\}$ is that $p_{k+i}\leq Cp_{k}p_{i}$ for all $k,i\in\mathbb{N}\cup\{0\}$: indeed, if $k_{0}$ is the smallest index with $p_{k_{0}}<a$, then $\sum_{i:p_{i}<a}p_{i}\leq\sum_{i\geq 0}p_{k_{0}+i}\leq Cp_{k_{0}}\sum_{i\geq 0}p_{i}=Cp_{k_{0}}\leq Ca$. In particular, (15) holds for the Poisson distribution with parameter $\lambda$, since $p_{k+i}\leq e^{\lambda}p_{k}p_{i}$, $k,i=0,1,\ldots$; in this case, $C=e^{\lambda}$. Another important distribution satisfying (15) is the Negative Binomial distribution with parameters $r\in\mathbb{N}$ and $p\in(0,1)$, where $p_{k}=\binom{k+r-1}{r-1}p^{r}(1-p)^{k}$, $k\in\mathbb{N}\cup\{0\}$. In this case, $\sum_{i\geq k}p_{i}=\sum_{j=0}^{r-1}\binom{k+r-1}{j}p^{j}(1-p)^{k+r-1-j}$. Since $p_{i}$ is eventually decreasing, for $a$ small enough one can choose $k_{0}$ such that $p_{k_{0}}<a\leq p_{k_{0}-1}$. It then follows that if $k_{0}$ is large enough,

\sum_{i:p_{i}<a}p_{i}=\sum_{i\geq k_{0}}p_{i}=\sum_{j=0}^{r-1}\binom{k_{0}+r-1}{j}p^{j}(1-p)^{k_{0}+r-1-j}\\ \leq\binom{k_{0}+r-1}{r-1}\sum_{j=0}^{r-1}p^{j}(1-p)^{k_{0}+r-1-j}\leq r\binom{k_{0}+r-1}{r-1}(1-p)^{k_{0}}=Cp_{k_{0}}\leq Ca,

where $C=rp^{-r}$.
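Condition (15) itself can be checked numerically in the same spirit; a minimal R sketch for the Poisson case (the truncation point $K$, the value $\lambda=2$ and the grid of values of $a$ are illustrative choices) computes the left-hand side of (15) divided by $a$, which should stay below $C=e^{\lambda}$ for $a$ small enough.

    # Check of (15) for the Poisson(lambda) distribution: the ratio
    # sum_{k : p_k < a} p_k / a should stay below C = exp(lambda) for small a
    lhs_over_a <- function(a, lambda, K = 1000) {
      pk <- dpois(0:K, lambda)
      sum(pk[pk < a]) / a
    }

    lambda <- 2
    sapply(10^(-(1:8)), lhs_over_a, lambda = lambda)
    exp(lambda)   # the bound C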

References

  • Chen et al., (2013) Chen, L., Singh, V. P., Guo, S., Mishra, A. K., and Guo, J. (2013). Drought analysis using copulas. J. Hydrol. Eng., 18(7):797–808.
  • Clayton, (1978) Clayton, D. G. (1978). A model for association in bivariate life tables and its application in epidemiological studies of familial tendency in chronic disease incidence. Biometrika, 65:141–151.
  • de Leon and Wu, (2011) de Leon, A. R. and Wu, B. (2011). Copula-based regression models for a bivariate mixed discrete and continuous outcome. Stat. Med., 30(2):175–185.
  • Dunnett and Sobel, (1954) Dunnett, C. W. and Sobel, M. (1954). A bivariate generalization of Student’s t-distribution, with tables for certain special cases. Biometrika, 41:153–169.
  • Embrechts et al., (2002) Embrechts, P., McNeil, A. J., and Straumann, D. (2002). Correlation and dependence in risk management: properties and pitfalls. In Risk Management: Value at Risk and Beyond (Cambridge, 1998), pages 176–223. Cambridge Univ. Press, Cambridge.
  • Ery, (2016) Ery, J. (2016). Semiparametric inference for copulas with discrete margins. Master’s thesis, École Polytechnique Fédérale de Lausanne, Faculty of Mathematics.
  • Faugeras, (2017) Faugeras, O. P. (2017). Inference for copula modeling of discrete data: a cautionary tale and some facts. Depend. Model., 5(1):121–132.
  • Fermanian et al., (2004) Fermanian, J.-D., Radulović, D., and Wegkamp, M. H. (2004). Weak convergence of empirical copula processes. Bernoulli, 10:847–860.
  • Geenens, (2020) Geenens, G. (2020). Copula modeling for discrete random vectors. Depend. Model., 8(1):417–440.
  • Geenens, (2021) Geenens, G. (2021). Dependence, Sklar’s copulas and discreteness. Online presentation available at https://sites.google.com/view/apsps/previous-speakers. Asia-Pacific Seminar on Probability and Statistics.
  • Genest and Favre, (2007) Genest, C. and Favre, A.-C. (2007). Everything you always wanted to know about copula modeling but were afraid to ask. J. Hydrol. Eng., 12(4):347–368.
  • Genest et al., (1995) Genest, C., Ghoudi, K., and Rivest, L.-P. (1995). A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika, 82:543–552.
  • Genest and Nešlehová, (2007) Genest, C. and Nešlehová, J. (2007). A primer on copulas for count data. The Astin Bulletin, 37:475–515.
  • Genest et al., (2017) Genest, C., Nešlehová, J. G., and Rémillard, B. (2017). Asymptotic behavior of the empirical multilinear copula process under broad conditions. J. Multivariate Anal., 159:82–110.
  • Genest and Rémillard, (2008) Genest, C. and Rémillard, B. (2008). Validity of the parametric bootstrap for goodness-of-fit testing in semiparametric models. Ann. Inst. Henri Poincaré Probab. Stat., 44(6):1096–1127.
  • Genz, (2004) Genz, A. (2004). Numerical computation of rectangular bivariate and trivariate normal and t probabilities. Stat. Comput., 14(3):251–260.
  • Joe, (1990) Joe, H. (1990). Multivariate concordance. J. Multivariate Anal., 35(1):12–30.
  • Joe, (2005) Joe, H. (2005). Asymptotic efficiency of the two-stage estimation method for copula-based models. J. Multivariate Anal., 94:401–419.
  • Joe and Xu, (1996) Joe, H. and Xu, J. J. (1996). The estimation method of inference functions for margins for multivariate models. Technical Report 166, UBC.
  • Li et al., (2020) Li, Y., Li, Y., Qin, Y., and Yan, J. (2020). Copula modeling for data with ties. Stat. Interface, 13(1):103–117.
  • McKee et al., (1993) McKee, T. B., Doesken, N. J., Kleist, J., et al. (1993). The relationship of drought frequency and duration to time scales. In Proceedings of the 8th Conference on Applied Climatology, volume 17, pages 179–183.
  • Nasri, (2020) Nasri, B. R. (2020). On non-central squared copulas. Statist. Probab. Lett., 161:108704.
  • Nasri and Rémillard, (2019) Nasri, B. R. and Rémillard, B. N. (2019). Copula-based dynamic models for multivariate time series. J. Multivariate Anal., 172:107–121.
  • Nasri and Rémillard, (2023) Nasri, B. R. and Rémillard, B. N. (2023). CopulaInference: Estimation and Goodness-of-Fit of Copula-Based Models with Arbitrary Distributions. R package version 0.5.0.
  • Nasri et al., (2020) Nasri, B. R., Rémillard, B. N., and Thioub, M. Y. (2020). Goodness-of-fit for regime-switching copula models with application to option pricing. Canad. J. Statist., 48(1):79–96.
  • Oh and Patton, (2016) Oh, D. H. and Patton, A. J. (2016). High-dimensional copula-based distributions with mixed frequency data. J. Econom., 193(2):349–366.
  • Rudas, (2018) Rudas, T. (2018). Lectures on Categorical Data Analysis. Springer Texts in Statistics. Springer, New York.
  • Rudin, (1976) Rudin, W. (1976). Principles of Mathematical Analysis. McGraw-Hill Book Co., New York-Auckland-Düsseldorf, third edition. International Series in Pure and Applied Mathematics.
  • Ruymgaart et al., (1972) Ruymgaart, F. H., Shorack, G. R., and van Zwet, W. R. (1972). Asymptotic normality of nonparametric tests for independence. Ann. Math. Statist., 43:1122–1135.
  • Shiau, (2006) Shiau, J. T. (2006). Fitting drought duration and severity with two-dimensional copulas. Water. Resour. Manag., 20(5):795–815.
  • Shih and Louis, (1995) Shih, J. H. and Louis, T. A. (1995). Inferences on the association parameter in copula models for bivariate survival data. Biometrics, 51:1384–1399.
  • Sklar, (1959) Sklar, M. (1959). Fonctions de répartition à n dimensions et leurs marges. Publ. Inst. Statist. Univ. Paris, 8:229–231.
  • Song et al., (2009) Song, P. X.-K., Li, M., and Yuan, Y. (2009). Joint regression analysis of correlated data using Gaussian copulas. Biometrics, 65(1):60–68.
  • Tsukahara, (2005) Tsukahara, H. (2005). Semiparametric estimation in copula models. Canad. J. Statist., 33(3):357–375.
  • Varin et al., (2011) Varin, C., Reid, N., and Firth, D. (2011). An overview of composite likelihood methods. Statist. Sinica, 21(1):5–42.
  • Zhang and Singh, (2019) Zhang, L. and Singh, V. P. (2019). Copulas and Their Applications in Water Resources Engineering. Cambridge University Press.
  • Zilko and Kurowicka, (2016) Zilko, A. A. and Kurowicka, D. (2016). Copula in a multivariate mixed discrete–continuous model. Comput. Statist. Data Anal., 103:28–55.