
The Impossibility of Testing for Dependence Using Kendall's $\tau$ Under Missing Data of Unknown Form

Oliver R. Cutbill and Rami V. Tabri (corresponding author: [email protected])
Abstract

This paper studies the statistical inference problem of testing for dependence between two continuous random variables using Kendall's $\tau$ in the presence of missing data. We prove that the worst-case identified set for this measure of association always includes zero. Consequently, robust inference for dependence using Kendall's $\tau$, where robustness is with respect to the form of the missingness-generating process, is impossible.

AMS 2020 subject classifications: 62H15; 62D10; 62G10
Keywords: Impossible Inference; Statistical Dependence; Kendall's $\tau$; Partial Identification; Missing Data.

1 Introduction

Testing for statistical dependence between two random variables is an important facet of theoretical and empirical statistical research, and arises as a problem of interest in various areas of the natural and social sciences. Applications in social science include the study of the relationship between health outcomes and insurance levels (e.g., Cameron and Trivedi, 1993), survey analysis (e.g., Yu et al., 2016), stress-testing risk-management models (e.g., Asimit et al., 2016), and stock market co-movements (e.g., Horváth and Rice, 2015; Cameron and Trivedi, 1993). In the natural sciences, applications arise in contexts as diverse as cancerous somatic alteration co-occurrences (e.g., Canisius et al., 2016) and the movement of animals across time (e.g., Swihart and Slade, 1985).

Tests for dependence based on Kendall's $\tau$ (Kendall, 1938) constitute a standard tool in empirical practice for detecting monotonic dependence between two random variables. The interested reader may refer, for instance, to the monographs of Nelsen (2006) and Bagdonavicius et al. (2011), and the references therein. The strength of such testing procedures is that $\tau$ is a distribution-free measure of association between paired continuous random variables. In particular, let $(X,Y)$ be a pair of continuous random variables having joint distribution $H$ and marginal distributions $F$ and $G$, respectively. In moment form, this measure of association for the random vector $(X,Y)$ is defined as

\tau = 4E_H\left[C\left(F(X), G(Y)\right)\right] - 1, \qquad (1.1)

where $C$ is the copula of $H$, and $E_H$ denotes the expectation operator with respect to the distribution $H$. The hypothesis testing problem for detecting monotonic dependence using $\tau$ in (1.1) has the form

H_0: \tau = 0 \quad \text{versus} \quad H_1: \tau \neq 0. \qquad (1.2)

The null hypothesis in (1.2) posits no monotonic dependence between the two random variables, and the alternative hypothesis is the negation of the null.
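To fix ideas about the population quantity in (1.1), the following sketch (an illustration added here, not part of the paper's analysis) uses the well-known closed form for the Gaussian copula, $\tau = (2/\pi)\arcsin\rho$, and compares it with the empirical pairwise-concordance estimate of $\tau$:

```python
import numpy as np

def kendall_tau(x, y):
    # Average concordance sign over all ordered pairs (i, j), i != j.
    dx = np.sign(x[:, None] - x[None, :])
    dy = np.sign(y[:, None] - y[None, :])
    n = len(x)
    return float((dx * dy).sum() / (n * (n - 1)))

rng = np.random.default_rng(0)
rho = 0.6
n = 2000
x, y = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n).T

tau_hat = kendall_tau(x, y)
tau_pop = 2.0 / np.pi * np.arcsin(rho)  # population tau under the Gaussian copula
print(round(tau_pop, 3))  # 0.41
```

The sample size and seed are arbitrary; the empirical estimate converges to the population value as $n$ grows.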

Statistical procedures for the hypothesis testing problem (1.2) are predicated on the assumption that the random vector $(X,Y)$ is observable. However, this assumption is often violated in empirical practice because datasets can have missing values. For example, missing data can arise from nonresponse, which is inevitable in self-reported cross-sectional and longitudinal surveys, or from dropout at follow-up in clinical studies. See, for example, Dutz et al. (2021) for a discussion of the prevalence of nonresponse in economics research. Missing data are also universal in ecological and evolutionary data, as in other branches of science; see, for example, the monograph Fox et al. (2015) and the references therein. Imputation methods are commonly used to address the missing data problem and enable testing with a complete dataset. However, the validity of such tests hinges on the correct specification of the imputation procedure, and misspecification can lead to biased inferences. Another approach in the literature imposes assumptions on the missingness-generating process (MGP) that point-identify $\tau$ in (1.1). In the context of Kendall's rank-correlation test, see, for example, Alvo and Cabilio (1995), who assume the MGP is either missing completely at random or weakly exogenous, and Ma (2012), who assumes that it is either missing at random or missing completely at random. While practical, such tests also ignore misspecification of the MGP, which weakens the credibility of any derived inferences (Manski, 2003).

Consequently, we ask whether it is possible to conduct non-parametric inference for dependence using $\tau$ under missing data of unknown form. The results of this paper imply that such robust inference is impossible. Reasoning from first principles, any sensible testing procedure of this sort must be based on $\tau$'s identified set because it characterizes the information about this parameter contained in the observables. The identified set for this parameter is an interval $[\underline{\tau}, \bar{\tau}] \subseteq [-1,1]$ whose bounds depend on observables. Therefore, the testing problem for inferring statistical dependence using this information must have the form

H_0: \underline{\tau} \leq 0 \;\text{and}\; \bar{\tau} \geq 0 \quad \text{versus} \quad H_1: \underline{\tau} > 0 \;\text{or}\; \bar{\tau} < 0. \qquad (1.3)

Under $H_1$ in (1.3), the identified set is a subset of either $[-1,0)$ or $(0,1]$. Since $\underline{\tau} \leq \tau \leq \bar{\tau}$ holds by definition, this hypothesis implies that either $\tau < 0$ or $\tau > 0$ holds, so that $X$ and $Y$ are statistically dependent. We show that the bounds satisfy the inequalities $\underline{\tau} \leq 0 \leq \bar{\tau}$ for all joint distributions of $(X,Y)$ and all MGPs. These inequalities show that the null hypothesis in (1.3) always holds, implying that one cannot partition the underlying probability model into two submodels compatible with the assertions of the null and alternative hypotheses. Therefore, the worst-case bounds are useless for detecting dependence between $X$ and $Y$ through the testing problem (1.3). We prove that this property of the bounds holds in the setup where the marginal distributions are known to the practitioner, which implies that it also holds when those distributions are unknown. The reason is that the bounds in the case of unknown marginal distributions must be less informative, and hence weakly wider than their counterparts under known marginal distributions, so they must also satisfy this property. A critical step in our theoretical derivations is an innovative use of results on extremal dependence described in Puccetti and Wang (2015).

This paper contributes to the literature on impossible inference, which has a rich history starting with the classic paper of Bahadur and Savage (1956). The recent paper by Bertanha and Moreira (2020) connects this literature and presents a taxonomy of the types of impossible inferences. Our result falls under Type A in their taxonomy, as the alternative is indistinguishable from the null. However, our result is not a consequence of the model of the null hypothesis being dense in the set of all likely models with respect to the total-variation distance, which is the essential characteristic of Type A impossible inferences. Rather, it flows from the fact that the bounds $\underline{\tau}$ and $\bar{\tau}$ are uninformative: they do not define a partition of the underlying probability model into two submodels compatible with the assertions of $H_0$ and $H_1$ in (1.3).

The idea of using bounds to account for missing data started with the seminal paper of Manski (1989) and gained popularity with the important paper of Horowitz and Manski (1995). Since then, a growing and influential literature on partial identification has shaped empirical practice; see, for example, Canay and Shaikh (2016) for a recent survey of this literature and the references therein. Inference on bounds that account for missing data in moment inequality models has been considered in a variety of settings, such as distributional analyses (e.g., Blundell et al., 2007), treatment effects (e.g., Lee, 2009), and stochastic dominance testing (e.g., Fakih et al., 2021). In contrast to those works, this paper shows that such an approach is futile for testing for dependence under missing data of unknown form using Kendall's $\tau$ and its worst-case bounds. We also discuss how to obtain informative partitions of the underlying probability model through restrictions on the dependence between $X$ and $Y$ and/or the MGPs.

There is also a strand of the partial identification literature focusing on parameters that depend on the joint distribution of two random variables with point-identified marginal distributions; see, for example, Fan and Patton (2014) for a survey of this strand and the references therein. However, to the best of our knowledge, this strand has not considered partial identification of those parameters arising from the missing data problem. While convenient, point-identification of the margins can be untenable in applications with missing data and can create challenges for inference on such parameters; the results of our paper exemplify this point.

The rest of this paper is organized as follows. Section 2 introduces the statistical setup and preliminary results on extremal dependence that we utilize in the proofs. Section 3 presents our results, and Section 4 discusses their scope and implications for empirical practice. Section 5 concludes. All proofs are relegated to the Appendix.

2 Setup and Preliminaries

Consider the random vector $(X,Y,Z)$ having joint distribution $P$, where $X$ and $Y$ are the continuous random variables of interest, and $Z$ is a categorical variable supported on $\{1,2,3,4\}$ indicating missingness in $X$ and $Y$. In this setup,

\text{the practitioner observes } \begin{cases} (X,Y), & \text{if } Z=1 \\ (X,\ast), & \text{if } Z=2 \\ (\ast,Y), & \text{if } Z=3 \\ (\ast,\ast), & \text{if } Z=4 \end{cases} \qquad (2.1)

where $\ast$ denotes a missing value. For simplicity, we assume that the marginal cumulative distribution functions (CDFs) of $X$ and $Y$, denoted by $F$ and $G$, respectively, are known to the practitioner. We derive the worst-case bounds on $\tau$ under the following probability model.
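A minimal simulation of the observation scheme (2.1) may help fix ideas; here NaN stands in for $\ast$, and the missingness probabilities are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
x = rng.uniform(size=n)
y = rng.uniform(size=n)
# Z = 1: both observed; 2: only X; 3: only Y; 4: neither.
z = rng.choice([1, 2, 3, 4], size=n, p=[0.5, 0.2, 0.2, 0.1])

x_obs = np.where(np.isin(z, [1, 2]), x, np.nan)
y_obs = np.where(np.isin(z, [1, 3]), y, np.nan)

# Only the Z = 1 subsample contains complete pairs.
complete = ~np.isnan(x_obs) & ~np.isnan(y_obs)
print(np.array_equal(complete, z == 1))  # True
```

In this toy design $Z$ is drawn independently of $(X, Y)$; the paper's point is precisely that such independence cannot be assumed in general.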

Definition 1.

Let $\mathcal{P}_{F,G}$ be the set of distributions of the random vector $(X,Y,Z)$ supported on $\mathcal{X} \times \mathcal{Y} \times \{1,2,3,4\} \subseteq \mathbb{R}^2 \times \{1,2,3,4\}$, with generic element $P$, such that

(i) $P$ has a density, $p$.

(ii) $(X,Y)$ is a continuous random vector having a strictly positive density.

(iii) $X$ has CDF $F$ and $Y$ has CDF $G$.

The worst-case bounds on $\tau$ without the practitioner's knowledge of the marginal CDFs can be computed by extremizing their counterparts with known $F$ and $G$ over feasible candidate values of these CDFs. We elaborate on this point in Section 3, and show that impossible inference for dependence under $\mathcal{P}_{F,G}$ implies impossibility in the more general scenario where $F$ and $G$ are unknown to the practitioner.

In this setup, an MGP is specified through restrictions on the joint distribution of $(X,Y,Z)$. The model $\mathcal{P}_{F,G}$ does not place any restrictions on the dependence between $(X,Y)$ and $Z$ beyond the existence of a density. To account for the missing data problem, we express $\tau$ as a functional of $P \in \mathcal{P}_{F,G}$. For each $P \in \mathcal{P}_{F,G}$, an application of the Law of Total Probability shows that the corresponding value of $\tau$ has the representation

\tau(P) = 4\sum_{z=1}^{4} E_P\left[C_P\left(F(X), G(Y)\right) \mid Z=z\right] P[Z=z] - 1, \qquad (2.2)

where $C_P$ is the copula of the joint CDF $P(X \leq \cdot, Y \leq \cdot)$.[1] This representation of $\tau$ is useful since it clarifies the situation faced by the practitioner in our setup. In particular, it shows that $\tau$ can be calculated for each $P \in \mathcal{P}_{F,G}$ using the following parameters: the copula $C_P$; the conditional CDFs $P(X \leq \cdot, Y \leq \cdot \mid Z=z)$ for $z=1,2,3,4$; and the marginal probabilities of $Z$, $\{P(Z=z)\}_{z=1}^{4}$. Asymptotically, the practitioner can recover from sampling $\{P(Z=z)\}_{z=1}^{4}$ and $P(X \leq \cdot, Y \leq \cdot \mid Z=1)$, but not $C_P$ and $\{P(X \leq \cdot, Y \leq \cdot \mid Z=z)\}_{z=2}^{4}$, as the data alone contain no information on the latter. Consequently, an MGP can be characterized in terms of a specification of the conditional CDFs $\{P(X \leq \cdot, Y \leq \cdot \mid Z=z)\}_{z=1}^{4}$.

[1] The existence and uniqueness of $C_P$ in our setup is a result due to Sklar (1959).

The above analysis shows that $\tau$ is partially identified in the missing data setting when we are agnostic about the MGP. The identified set of $\tau$ in this case is a closed subinterval of $[-1,1]$ whose boundary corresponds to the worst-case bounds on $\tau$. These bounds permit the entire spectrum of MGPs, which is especially useful when the data have a large number of missing values, as there can then be a diversity of explanations for the missingness.

The next section describes the worst-case bounds on $\tau$ in (2.2) and raises the statistical issues concerning testing for dependence between $X$ and $Y$ in a manner that is robust to the MGP. In developing our results we make use of the Fréchet-Hoeffding copula bounds and two results on extreme values of means of supermodular functions from Puccetti and Wang (2015). To describe these results, denote by $\mathcal{C}$ the set of all bivariate copulas on the unit square $[0,1]^2$. The Fréchet-Hoeffding bounds are $\max\{u+v-1, 0\} \leq C(u,v) \leq \min\{u,v\}$ for all $(u,v) \in [0,1]^2$, which hold for every $C \in \mathcal{C}$.
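As a concrete check, the snippet below verifies the Fréchet-Hoeffding bounds on a grid for one member of $\mathcal{C}$, the independence copula $C(u,v) = uv$ (any copula would serve equally well):

```python
import numpy as np

u, v = np.meshgrid(np.linspace(0.0, 1.0, 101), np.linspace(0.0, 1.0, 101))
C = u * v                             # independence copula
W = np.maximum(u + v - 1.0, 0.0)      # Fréchet-Hoeffding lower bound
M = np.minimum(u, v)                  # Fréchet-Hoeffding upper bound

# A small tolerance absorbs floating-point rounding at the boundary.
ok = bool((W <= C + 1e-12).all() and (C <= M + 1e-12).all())
print(ok)  # True
```

For the independence copula the inequalities can also be seen directly: $uv - (u+v-1) = (1-u)(1-v) \geq 0$ and $uv \leq \min\{u,v\}$ since $u, v \leq 1$.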

A function $s: \mathbb{R}^2 \to \mathbb{R}$ is called supermodular if for all $x_1 \leq x_2$ and $y_1 \leq y_2$,

s(x_1, y_1) + s(x_2, y_2) \geq s(x_1, y_2) + s(x_2, y_1),

important examples of which are copulas. This point matters because the bounds on $\tau$ are characterized in terms of copulas of $(X,Y)$'s joint distribution. The results of Puccetti and Wang (2015) that we utilize are Theorems 2.1 and 3.1 of their paper; we restate them in the following lemma, in a form more suitable for the derivation of our results.

Lemma 1 (Puccetti and Wang (2015)).

Let $s: \mathbb{R}^2 \to \mathbb{R}$ be a supermodular function, and let $X$ and $Y$ be random variables with marginal CDFs $F$ and $G$, respectively. Furthermore, let $\mathcal{C}$ be as described above.

1. The moment $E_H(s(X,Y))$, viewed as a functional of the copula $C$ through the representation $H = C(F,G)$, is maximized when $X$ and $Y$ are co-monotonic. That is,

\sup\{E_H(s(X,Y)) : H = C(F,G),\; C \in \mathcal{C}\} = E_{H^*}(s(X,Y)), \qquad (2.3)

where $H^* = \min\{F, G\}$.

2. The moment $E_H(s(X,Y))$, viewed as a functional of the copula $C$ through the representation $H = C(F,G)$, is minimized when $X$ and $Y$ are counter-monotonic. That is,

\inf\{E_H(s(X,Y)) : H = C(F,G),\; C \in \mathcal{C}\} = E_{H^\dagger}(s(X,Y)), \qquad (2.4)

where $H^\dagger = \max\{F + G - 1, 0\}$.

3 Results

The first result characterizes the worst-case bounds on Kendall's $\tau$ in the case where the marginal CDFs of $(X,Y)$ are known.

Theorem 1.

Let $\mathcal{P}_{F,G}$ be given as in Definition 1, and suppose that $P \in \mathcal{P}_{F,G}$. The worst-case bounds of $\tau$ under the distribution $P$, which satisfy $\underline{\tau}(P) \leq \tau \leq \bar{\tau}(P)$, are given by

\bar{\tau}(P) = 4E_P\left[\min\{F(X), G(Y)\} \mid Z=1\right] P(Z=1) + 4E_P\left[F(X) \mid Z=2\right] P(Z=2) + 4E_P\left[G(Y) \mid Z=3\right] P(Z=3) + 4P(Z=4) - 1, \quad \text{and}

\underline{\tau}(P) = 4E_P\left[\max\{F(X) + G(Y) - 1, 0\} \mid Z=1\right] P(Z=1) - 1.
Proof.

See Appendix A.1. ∎

The result of this theorem is that for each $P \in \mathcal{P}_{F,G}$ we can determine bounds on $\tau$ that permit the entire spectrum of MGPs. For each $P \in \mathcal{P}_{F,G}$, these bounds are sharp; that is, no value in the interval $[\underline{\tau}(P), \bar{\tau}(P)]$, including the endpoints, can be rejected as the true value of $\tau(P)$. This property follows from the sharpness of the Fréchet-Hoeffding bounds on a bivariate copula, which we use in the derivation of $\underline{\tau}(P)$ and $\bar{\tau}(P)$.
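For intuition, the following sketch evaluates sample analogues of these bounds in a simulated design of our own: Uniform$(0,1)$ margins (so $F(x) = x$ and $G(y) = y$), a co-monotonic pair, and a $Z$ drawn independently of $(X,Y)$ with illustrative probabilities. Note that the upper-bound formula can exceed $1$; since $\tau \leq 1$ always, the effective upper bound is then $1$.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.uniform(size=n)
y = x.copy()  # co-monotonic pair: true tau equals 1
z = rng.choice([1, 2, 3, 4], size=n, p=[0.7, 0.1, 0.1, 0.1])
p = np.array([(z == k).mean() for k in (1, 2, 3, 4)])

# Sample analogues of the Theorem 1 bounds with F(x) = x, G(y) = y.
upper = 4 * (np.minimum(x, y)[z == 1].mean() * p[0]
             + x[z == 2].mean() * p[1]
             + y[z == 3].mean() * p[2]
             + p[3]) - 1
lower = 4 * np.maximum(x + y - 1, 0)[z == 1].mean() * p[0] - 1

print(lower <= 0 <= upper)  # True, as Theorem 2 requires
```

In this design $E[\max\{2X-1, 0\}] = 1/4$, so the lower bound is approximately $4(1/4)(0.7) - 1 = -0.3$: even under perfect positive dependence in the observed data, the worst-case lower bound stays below zero.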

To test for statistical dependence using $\tau$ in a manner that is robust to the form of the MGP, one can only consider tests that depend on the observables through $\tau$'s identified set. This means positing the hypothesis testing problem

H_0: P_0 \in \mathcal{P}^0_{F,G} \quad \text{versus} \quad H_1: P_0 \in \mathcal{P}^1_{F,G}, \qquad (3.1)

where $P_0$ is the true distribution of $(X,Y,Z)$, $\mathcal{P}^0_{F,G} = \{P \in \mathcal{P}_{F,G} : \bar{\tau}(P) \geq 0 \;\text{and}\; \underline{\tau}(P) \leq 0\}$, and $\mathcal{P}^1_{F,G} = \{P \in \mathcal{P}_{F,G} : \bar{\tau}(P) < 0 \;\text{or}\; \underline{\tau}(P) > 0\}$. Notice that $\mathcal{P}^1_{F,G}$ is the relative complement of $\mathcal{P}^0_{F,G}$ in $\mathcal{P}_{F,G}$; that is, $\mathcal{P}^1_{F,G} = \mathcal{P}_{F,G} - \mathcal{P}^0_{F,G}$, so that under $H_1$ either $\tau(P_0) < 0$ or $\tau(P_0) > 0$ holds. The next result implies that $\mathcal{P}^1_{F,G} = \emptyset$, meaning the null hypothesis in (3.1) is always true.

Theorem 2.

Let $\mathcal{P}_{F,G}$ be given as in Definition 1. Furthermore, let $\underline{\tau}(P)$ and $\bar{\tau}(P)$ be given as in Theorem 1. Then $\underline{\tau}(P) \leq 0 \leq \bar{\tau}(P)$ for every $P \in \mathcal{P}_{F,G}$.

Proof.

See Appendix A.2. ∎

Theorem 2 shows that the worst-case bounds of $\tau$ are not informative, in the sense that they never simultaneously take negative values or simultaneously take positive values, even when the joint distribution of $(X,Y)$ exhibits negative or positive dependence, respectively. This property makes it impossible to test for dependence on the basis of $\tau$ in a manner robust to missingness of any form, as in (3.1), since it implies that $\mathcal{P}^1_{F,G} = \emptyset$.

Theorems 1 and 2 assume that the marginal CDFs of $X$ and $Y$ are known to the practitioner. When this is not the case, neither distribution is point-identified if we are agnostic about the nature of the MGP. By an application of the Law of Total Probability to $F$ and $G$, we can obtain pointwise bounds on these marginal CDFs, $\underline{F}(x) \leq F(x) \leq \overline{F}(x)$ and $\underline{G}(y) \leq G(y) \leq \overline{G}(y)$ for all $(x,y) \in \mathcal{X} \times \mathcal{Y}$, with the boundaries themselves CDFs, given by

\overline{F}(x) = P(X \leq x \mid Z=1)P[Z=1] + P(X \leq x \mid Z=2)P[Z=2] + P[Z=3] + P[Z=4] \quad \forall x \in \mathcal{X},

\underline{F}(x) = \begin{cases} P(X \leq x \mid Z=1)P[Z=1] + P(X \leq x \mid Z=2)P[Z=2] & \text{if } x < \sup\mathcal{X} \\ 1 & \text{if } x = \sup\mathcal{X}, \end{cases}

and

\overline{G}(y) = P(Y \leq y \mid Z=1)P[Z=1] + P(Y \leq y \mid Z=3)P[Z=3] + P[Z=2] + P[Z=4] \quad \forall y \in \mathcal{Y},

\underline{G}(y) = \begin{cases} P(Y \leq y \mid Z=1)P[Z=1] + P(Y \leq y \mid Z=3)P[Z=3] & \text{if } y < \sup\mathcal{Y} \\ 1 & \text{if } y = \sup\mathcal{Y}. \end{cases}

Denoting by $\mathcal{F}$ and $\mathcal{G}$ the sets of all CDFs of $X$ and $Y$ that satisfy the respective bounds described above, the probability model is $\mathcal{P} = \bigcup_{F \in \mathcal{F}, G \in \mathcal{G}} \mathcal{P}_{F,G}$. Thus, one has bounds on $\tau(P)$ that depend on hypothetical values of the margins $F$ and $G$, and extremizing these bounds over $\mathcal{F}$ and $\mathcal{G}$ yields the worst-case upper and lower bounds, $\sup_{(F,G) \in \mathcal{F} \times \mathcal{G}} \bar{\tau}(P)$ and $\inf_{(F,G) \in \mathcal{F} \times \mathcal{G}} \underline{\tau}(P)$, respectively. Therefore, the conclusion of Theorem 2 also holds for these worst-case bounds, since they are wider than their counterparts in the scenario where the marginal CDFs of $X$ and $Y$ are known to the practitioner.[2]

[2] Scrutinizing the expressions for $\bar{\tau}(P)$ and $\underline{\tau}(P)$, observe that the worst-case bounds in this larger model can be obtained in closed form. For the upper bound, replace $F$ and $G$ in $\bar{\tau}(P)$ with $\overline{F}$ and $\overline{G}$, respectively; for the lower bound, replace $F$ and $G$ in $\underline{\tau}(P)$ with $\underline{F}$ and $\underline{G}$, respectively.
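In sample-analogue form, the bounds on $F$ amount to counting the cases with $X$ missing ($Z \in \{3,4\}$) as lying above $x$ for the lower bound and below $x$ for the upper bound. A small sketch under an illustrative design of our own, with $X \sim$ Uniform$(0,1)$ and $Z$ independent of $X$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000
x = rng.uniform(size=n)
z = rng.choice([1, 2, 3, 4], size=n, p=[0.6, 0.2, 0.1, 0.1])
x_seen = np.isin(z, [1, 2])  # X is observed iff Z is 1 or 2

def f_bounds(t):
    # Lower bound: missing X counted as above t.
    lower = np.mean(x_seen & (x <= t))
    # Upper bound: missing X counted as at or below t.
    upper = lower + np.mean(~x_seen)
    return lower, upper

lo, up = f_bounds(0.5)
print(lo <= 0.5 <= up)  # the true F(0.5) = 0.5 lies inside the bounds
```

The width of the interval equals the total missing probability for $X$, here $P(Z \in \{3,4\}) = 0.2$, regardless of the point $t$.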

4 Discussion

This section discusses the implications of our results. The model $\mathcal{P}_{F,G}$ is large, which is why the bounds $\bar{\tau}$ and $\underline{\tau}$ do not yield a partition of $\mathcal{P}_{F,G}$ compatible with the hypotheses in (3.1). Note that the model is non-parametric and permits (i) the entire spectrum of MGPs, and (ii) all bivariate absolutely continuous copulas for modelling the statistical dependence between $X$ and $Y$. This raises the following question: does restricting $\mathcal{P}_{F,G}$ give rise to an identified set for $\tau$ whose bounds are informative in detecting dependence between $X$ and $Y$? The answer is in the affirmative. Restrictions on $\mathcal{P}_{F,G}$ can be motivated by many considerations based on the application at hand. For example, they can arise from the possession of side-information or from restrictions implied by economic theory, as in the partial identification approach in econometrics (e.g., Tamer, 2010). We elaborate on the former with an example utilizing results in Nelsen et al. (2001).

Let $P_0 \in \mathcal{P}_{F,G}$ denote the true distribution of $(X,Y,Z)$. Suppose we possess side-information that $P_0(X \leq \tilde{x}, Y \leq \tilde{y}) = \theta$, where $\tilde{x}$ and $\tilde{y}$ are the medians of $X$ and $Y$, and $\theta \in [0, 1/2]$. Accordingly, we must have $C_{P_0}(1/2, 1/2) = \theta$, which expresses the side-information in terms of the copula. Theorem 1 of Nelsen et al. (2001) provides the bounds on the copula under this restriction, given by

\underline{C}_\theta(u,v) = \max\left\{\max\{u+v-1, 0\},\; \theta - (1/2 - u)^+ - (1/2 - v)^+\right\} \quad \text{and}

\bar{C}_\theta(u,v) = \min\left\{\min\{u,v\},\; \theta + (u - 1/2)^+ + (v - 1/2)^+\right\},

for all $(u,v) \in [0,1]^2$, where $a^+ = \max\{a, 0\}$. Thus, the bounds on the joint distribution of $(X,Y)$ are

\underline{C}_\theta(F(x), G(y)) \leq P(X \leq x, Y \leq y) \leq \bar{C}_\theta(F(x), G(y)), \quad \forall (x,y) \in \mathcal{X} \times \mathcal{Y}, \qquad (4.1)

which hold for all $P \in \mathcal{P}_{F,G,\theta}$, where the probability model $\mathcal{P}_{F,G,\theta} = \{P \in \mathcal{P}_{F,G} : P(X \leq \tilde{x}, Y \leq \tilde{y}) = \theta\}$ accounts for the side-information. We can apply steps identical to those in the proof of Theorem 1 to obtain the corresponding bounds on $\tau$ under the model $\mathcal{P}_{F,G,\theta}$, replacing the Fréchet-Hoeffding lower and upper bounds with $\underline{C}_\theta$ and $\bar{C}_\theta$, respectively. For brevity, we omit the details. The bounds on $\tau(P)$ are given by

\bar{\tau}_\theta(P) = 4E_P\left[\bar{C}_\theta(F(X), G(Y)) \mid Z=1\right] P(Z=1) + 4E_P\left[F(X) \mid Z=2\right] P(Z=2) + 4E_P\left[G(Y) \mid Z=3\right] P(Z=3) + 4P(Z=4) - 1 \quad \text{and}

\underline{\tau}_\theta(P) = 4E_P\left[\underline{C}_\theta(F(X), G(Y)) \mid Z=1\right] P(Z=1) - 1,

which satisfy $[\underline{\tau}_\theta(P), \bar{\tau}_\theta(P)] \subset [\underline{\tau}(P), \bar{\tau}(P)]$ for $P \in \mathcal{P}_{F,G,\theta}$.
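The restricted copula bounds are straightforward to code directly from the display above; as a check, both pass through $\theta$ at $(1/2, 1/2)$ and sit inside the Fréchet-Hoeffding band:

```python
import numpy as np

def pos(a):
    return np.maximum(a, 0.0)

def c_lower(u, v, theta):
    return np.maximum(pos(u + v - 1.0), theta - pos(0.5 - u) - pos(0.5 - v))

def c_upper(u, v, theta):
    return np.minimum(np.minimum(u, v), theta + pos(u - 0.5) + pos(v - 0.5))

theta = 0.4
u, v = np.meshgrid(np.linspace(0.0, 1.0, 201), np.linspace(0.0, 1.0, 201))
lo, up = c_lower(u, v, theta), c_upper(u, v, theta)

print(float(c_lower(0.5, 0.5, theta)), float(c_upper(0.5, 0.5, theta)))  # 0.4 0.4
# Nested inside the Fréchet-Hoeffding band (tolerance for rounding).
print(bool((pos(u + v - 1.0) <= lo + 1e-12).all()
           and (lo <= up + 1e-12).all()
           and (up <= np.minimum(u, v) + 1e-12).all()))  # True
```

The value $\theta = 0.4$ matches the numerical example used later in this section; any $\theta \in [0, 1/2]$ consistent with the medians restriction works the same way.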

In contrast to the worst-case bounds on $\tau$, the bounds $\bar{\tau}_\theta$ and $\underline{\tau}_\theta$ are informative, in the sense that there exist $P_1, P_2, P_3 \in \mathcal{P}_{F,G,\theta}$ such that

\bar{\tau}_\theta(P_1), \tau(P_1) < 0; \quad \underline{\tau}_\theta(P_2), \tau(P_2) > 0; \quad \text{and} \quad \underline{\tau}_\theta(P_3) \leq \tau(P_3) \leq \bar{\tau}_\theta(P_3) \;\text{with}\; \tau(P_3) = 0.

We demonstrate this point with a numerical example in which, for simplicity, $X$ and $Y$ are uniformly distributed on $[0,1]$. Furthermore, we set $\theta = 0.4$ and derive the MGP from a multinomial logit specification for the propensity probabilities; i.e.,

P[Z=z \mid X=x, Y=y] = \frac{e^{\gamma_{1,z}x + \gamma_{2,z}y}}{\sum_{j=1}^{4} e^{\gamma_{1,j}x + \gamma_{2,j}y}} \quad \forall z = 1,2,3,4. \qquad (4.2)

Finally, to complete the specification of $P$ we must designate the copula of $(X,Y)$, $C_P$, as the marginal probabilities of $Z$ can be obtained by integrating the propensity scores with respect to this copula. Then, by Bayes' Theorem, the MGP is given by the conditional probability density functions

P[X=x, Y=y \mid Z=z] = P[Z=z \mid X=x, Y=y] \left[\frac{c_P(x,y)}{P[Z=z]}\right] \quad \forall z = 1,2,3,4,

where $c_P$ denotes the density of $C_P$, which coincides with the joint density of $(X,Y)$ under the uniform margins.

We set $C_P$ as the bivariate Gaussian copula, derived from a bivariate normal distribution with standard normal margins, and construct $P_1$, $P_2$, and $P_3$ by setting its correlation coefficient $\rho$. As $\bar{\tau}_\theta(P_1)$, $\underline{\tau}_\theta(P_2)$, $\bar{\tau}_\theta(P_3)$, and $\underline{\tau}_\theta(P_3)$ are linear combinations of moments, we calculated them using Monte Carlo simulations with $10^8$ random draws from the corresponding bivariate Gaussian copula.

The parameter specification for $P_1$ is as follows: $\gamma_{1,1} = \gamma_{2,1} = 2$; $\gamma_{1,2} = -5$, $\gamma_{2,2} = 1/4$; $\gamma_{1,3} = 5$, $\gamma_{2,3} = -1/4$; $\gamma_{1,4} = \gamma_{2,4} = -5$; and $\rho = -0.999$. This yields $\tau(P_1) < 0$ and $\bar{\tau}_\theta(P_1) \approx -0.0108$. The parameter specification for $P_2$ is as follows: $\gamma_{1,1} = \gamma_{2,1} = 1/2$; $\gamma_{1,2} = 3$, $\gamma_{2,2} = 1/2$; $\gamma_{1,3} = 1/2$, $\gamma_{2,3} = -2$; $\gamma_{1,4} = \gamma_{2,4} = 2$; and $\rho = 0.99$. This specification has $\tau(P_2) > 0$ and gives rise to $\underline{\tau}_\theta(P_2) \approx 0.034$. Finally, the parameter specification for $P_3$ is identical to that of $P_2$ except that now $\rho = 0$. This specification has $\tau(P_3) = 0$ and gives rise to $\underline{\tau}_\theta(P_3) \approx -0.32$ and $\bar{\tau}_\theta(P_3) \approx 0.63$.
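To convey the mechanics at a smaller scale, the sketch below recomputes $\theta$-restricted bounds by Monte Carlo in a stripped-down variant of the $P_3$ case: uniform margins, $\rho = 0$ (so $\tau = 0$), no missingness ($P(Z=1) = 1$), and $10^6$ rather than $10^8$ draws. The figures therefore differ from those reported above; the point is only that the resulting interval still covers $\tau = 0$.

```python
import numpy as np

def pos(a):
    return np.maximum(a, 0.0)

def c_lower(u, v, theta):
    return np.maximum(pos(u + v - 1.0), theta - pos(0.5 - u) - pos(0.5 - v))

def c_upper(u, v, theta):
    return np.minimum(np.minimum(u, v), theta + pos(u - 0.5) + pos(v - 0.5))

rng = np.random.default_rng(5)
theta = 0.4
u = rng.uniform(size=1_000_000)
v = rng.uniform(size=1_000_000)  # independent of u: tau = 0

# With P(Z = 1) = 1 the bounds reduce to 4 E[C_theta(U, V)] - 1.
tau_up = 4 * c_upper(u, v, theta).mean() - 1
tau_lo = 4 * c_lower(u, v, theta).mean() - 1
print(tau_lo < 0 < tau_up)  # the restricted identified set still covers 0
```

Replacing the independence draws with draws from a strongly dependent copula, and reweighting by a propensity model like (4.2), reproduces the qualitative behaviour of the $P_1$ and $P_2$ cases.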

These numerical results demonstrate that the refined bounds can be informative for the detection of dependence. In such a situation, the practitioner can consider the following testing problem

H_0: P_0 \in \mathcal{P}^0_{F,G,\theta} \quad \text{versus} \quad H_1: P_0 \in \mathcal{P}^1_{F,G,\theta}, \qquad (4.3)

where $\mathcal{P}^0_{F,G,\theta} = \{P \in \mathcal{P}_{F,G,\theta} : \bar{\tau}_\theta(P) \geq 0 \;\text{and}\; \underline{\tau}_\theta(P) \leq 0\}$ and $\mathcal{P}^1_{F,G,\theta} = \{P \in \mathcal{P}_{F,G,\theta} : \bar{\tau}_\theta(P) < 0 \;\text{or}\; \underline{\tau}_\theta(P) > 0\}$ form a partition of $\mathcal{P}_{F,G,\theta}$. The bounds are linear combinations of moments, but with unknown coefficients, namely the marginal probabilities $P[Z=z]$. As these marginal probabilities are typically estimable at the $\sqrt{n}$-rate, one can adapt moment inequality testing procedures, which are abundant and well-established (e.g., Andrews and Soares, 2010; Canay, 2010; Romano et al., 2014), to this situation. Developing the details of such a testing procedure goes beyond the intended scope of this paper and is left for future research.

As this side-information only restricts the dependence between $X$ and $Y$, any valid testing procedure that rejects $H_0$ in favour of $H_1$ in (4.3) is robust to the nature of the MGP. This robustness, however, comes at the expense of an ambiguity under the null. Specifically, there exists $P \in \mathcal{P}_{F,G,\theta}$ such that $\tau(P) \neq 0$ and $\underline{\tau}_\theta(P) \leq 0 \leq \bar{\tau}_\theta(P)$. The uninformative nature of $H_0$ in (4.3) is a consequence of circumventing assumptions on the MGP, which are unverifiable in practice. If one fails to reject the null, then, unfortunately, one cannot conclude anything informative about the dependence between $X$ and $Y$. In such a situation, we recommend that empirical researchers perform a sensitivity analysis of this empirical conclusion (i.e., non-rejection of $H_0$) with respect to plausible assumptions on the MGP. The virtue of this type of analysis is that it establishes, in a transparent way, clear links between the assumptions made on the MGP and the resulting empirical inferences. See, for example, Blundell et al. (2007) and Lee (2009), who refine worst-case distributional bounds using economic theory and develop testable implications based on them in the contexts of distributional analyses and treatment effects, respectively. See also Fakih et al. (2021), who discuss the refinement of worst-case distributional bounds for ordinal variables in the context of stochastic dominance testing by positing assumptions on the form of nonresponse in self-reported surveys.

5 Conclusion

This paper establishes the impossibility of inference for dependence between two continuous random variables using Kendall's $\tau$ under missing data of unknown form. The crux of the issue is that the identified set for $\tau$ always includes zero, implying that the sign of $\tau$ is not identified. We show how refining this identified set using additional information can address this problem, creating a pathway for robust inference based on statistical procedures from the moment inequality testing literature.

6 Acknowledgement

We thank Brendan K. Beare and Christopher D. Walker for helpful feedback and comments.

References

  • Alvo and Cabilio (1995) Alvo, M. and P. Cabilio (1995). Rank correlation methods for missing data. Canadian Journal of Statistics 23(4), 345–358.
  • Andrews and Soares (2010) Andrews, D. W. K. and G. Soares (2010). Inference for Parameters Defined by Moment Inequalities using Generalized Moment Selection. Econometrica 78(1), 119–157.
  • Asimit et al. (2016) Asimit, A. V., R. Gerrard, Y. Hou, and L. Peng (2016). Tail dependence measure for examining financial extreme co-movements. Journal of Econometrics 194(2), 330–348.
  • Bagdonavicius et al. (2011) Bagdonavicius, V., J. Kruopis, and M. Nikulin (2011). Nonparametric Tests for Complete Data (First ed.). Wiley.
  • Bahadur and Savage (1956) Bahadur, R. R. and L. J. Savage (1956). The Nonexistence of Certain Statistical Procedures in Nonparametric Problems. The Annals of Mathematical Statistics 27(4), 1115 – 1122.
  • Bertanha and Moreira (2020) Bertanha, M. and M. J. Moreira (2020). Impossible inference in econometrics: Theory and applications. Journal of Econometrics 218(2), 247–270.
  • Blundell et al. (2007) Blundell, R., A. Gosling, H. Ichimura, and C. Meghir (2007). Changes in the distribution of male and female wages accounting for employment composition using bounds. Econometrica 75(2), 323–363.
  • Cameron and Trivedi (1993) Cameron, A. C. and P. K. Trivedi (1993). Tests of independence in parametric models with applications and illustrations. Journal of Business & Economic Statistics 11(1), 29–43.
  • Canay (2010) Canay, I. A. (2010). EL Inference for Partially Identified Models: Large Deviations Optimality and Bootstrap Validity. Journal of Econometrics 156(2), 408–425.
  • Canay and Shaikh (2016) Canay, I. A. and A. M. Shaikh (2016, January). Practical and theoretical advances in inference for partially identified models. CeMMAP working papers CWP05/16, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
  • Canisius et al. (2016) Canisius, S., L. Wessels, and J. W. M. Martens (2016). A novel independence test for somatic alterations in cancer shows that biology drives mutual exclusivity but chance explains most co-occurrence. Genome Biology 17(261).
  • Dutz et al. (2021) Dutz, D., I. Huitfeldt, S. Lacouture, M. Mogstad, A. Torgovitsky, and W. van Dijk (2021, December). Selection in surveys. National Bureau of Economic Research. Working Paper 29549.
  • Fakih et al. (2021) Fakih, A., P. Makdissi, W. Marrouch, R. V. Tabri, and M. Yazbeck (2021). A stochastic dominance test under survey nonresponse with an application to comparing trust levels in Lebanese public institutions. Journal of Econometrics.
  • Fan and Patton (2014) Fan, Y. and A. J. Patton (2014). Copulas in econometrics. Annual Review of Economics 6(1), 179–200.
  • Fox et al. (2015) Fox, G. A., S. Negrete-Yankelevich, and V. J. Sosa (2015). Ecological Statistics: Contemporary theory and application. Oxford University Press.
  • Horowitz and Manski (1995) Horowitz, J. L. and C. F. Manski (1995). Identification and robustness with contaminated and corrupted data. Econometrica 63(2), 281–302.
  • Horváth and Rice (2015) Horváth, L. and G. Rice (2015). Testing for independence between functional time series. Journal of Econometrics 189(2), 371–382.
  • Kendall (1938) Kendall, M. G. (1938). A new measure of rank correlation. Biometrika 30(1), 81–93.
  • Lee (2009) Lee, D. S. (2009, 07). Training, wages, and sample selection: Estimating sharp bounds on treatment effects. The Review of Economic Studies 76(3), 1071–1102.
  • Ma (2012) Ma, Y. (2012). On inference for Kendall's $\tau$ within a longitudinal data setting. Journal of Applied Statistics 39(11), 2441–2452.
  • Manski (1989) Manski, C. F. (1989). Anatomy of the selection problem. The Journal of Human Resources 24(3), 343–360.
  • Manski (2003) Manski, C. F. (2003). Partial Identification of Probability Distributions. Springer.
  • Nelsen (2006) Nelsen, R. B. (2006). An Introduction to Copulas (Second ed.). Springer.
  • Nelsen et al. (2001) Nelsen, R. B., J. J. Quesada-Molina, J. A. Rodríguez-Lallena, and M. Úbeda Flores (2001). Bounds on bivariate distribution functions with given margins and measures of association. Communications in Statistics - Theory and Methods 30(6), 1055–1062.
  • Puccetti and Wang (2015) Puccetti, G. and R. Wang (2015). Extremal dependence concepts. Statistical Science 30(4), 485–517.
  • Romano et al. (2014) Romano, J. P., A. M. Shaikh, and M. Wolf (2014). A practical two-step method for testing moment inequalities. Econometrica 82(5), 1979–2002.
  • Sklar (1959) Sklar, A. (1959). Fonctions de répartition à n dimensions et leurs marges. Publications de l’Institut Statistique de l’Université de Paris 8(2), 229–231.
  • Swihart and Slade (1985) Swihart, R. K. and N. A. Slade (1985). Testing for independence of observations in animal movements. Ecology 66(4), 1176–1184.
  • Tamer (2010) Tamer, E. (2010). Partial identification in econometrics. Annual Review of Economics 2(1), 167–195.
  • Yu et al. (2016) Yu, P. L., K. Lam, and M. Alvo (2016). Nonparametric rank tests for independence in opinion surveys. Österreichische Zeitschrift für Statistik 31(4).

Appendix A Proofs of Results

A.1 Proof of Theorem 1

Proof.

The proof proceeds by the direct method. We shall derive bounds on $\tau(P)$ for each $P\in\mathcal{P}_{F,G}$, recalling that

\displaystyle\tau(P)=4\sum_{z=1}^{4}E_{P}\left[C_{P}\left(F(X),G(Y)\right)\mid Z=z\right]\,P[Z=z]-1.

First, we focus on the upper bound and bound each term appearing in the sum separately. Starting with $E_{P}\left[C_{P}\left(F(X),G(Y)\right)\mid Z=1\right]$, note that it is less than or equal to $E_{P}\left[\min\left\{F(X),G(Y)\right\}\mid Z=1\right]$, since the Fréchet–Hoeffding upper bound in two dimensions gives $C_{P}\left(F(x),G(y)\right)\leq\min\left\{F(x),G(y)\right\}$ for all $(x,y)\in\mathcal{X}\times\mathcal{Y}$. Turning to the term $E_{P}\left[C_{P}\left(F(X),G(Y)\right)\mid Z=2\right]$, note that $Y$ is not observed on this event, so we replace $G(Y)$ with its largest theoretical value, $1$. Thus we bound $C_{P}(F(x),G(y))\leq C_{P}(F(x),1)=F(x)$, which holds for all $(x,y)\in\mathcal{X}\times\mathcal{Y}$; therefore, $E_{P}\left[C_{P}\left(F(X),G(Y)\right)\mid Z=2\right]\leq E_{P}\left[F(X)\mid Z=2\right]$. Similarly, on the event $\{Z=3\}$, where $X$ is not observed, we bound $C_{P}(F(x),G(y))\leq C_{P}(1,G(y))=G(y)$, which holds for all $(x,y)\in\mathcal{X}\times\mathcal{Y}$; hence, $E_{P}\left[C_{P}\left(F(X),G(Y)\right)\mid Z=3\right]\leq E_{P}\left[G(Y)\mid Z=3\right]$. Finally, on the event $\{Z=4\}$, where neither $X$ nor $Y$ is observed, we bound $C_{P}\left(F(x),G(y)\right)$ from above by $1$, which holds for all $(x,y)\in\mathcal{X}\times\mathcal{Y}$, yielding $E_{P}\left[C_{P}\left(F(X),G(Y)\right)\mid Z=4\right]\leq 1$. Combining these bounds yields

\displaystyle\tau(P)\leq 4E_{P}\left[\min\left\{F(X),G(Y)\right\}\mid Z=1\right]P(Z=1)+4E_{P}\left[F(X)\mid Z=2\right]P(Z=2)
\displaystyle\qquad+4E_{P}\left[G(Y)\mid Z=3\right]P(Z=3)+4P(Z=4)-1
\displaystyle=\bar{\tau}(P).

Next, we focus on the lower bound and again bound each term appearing in the representation of $\tau(P)$ separately. Starting with $E_{P}\left[C_{P}\left(F(X),G(Y)\right)\mid Z=1\right]$, note that it is greater than or equal to $E_{P}\left[\max\left\{F(X)+G(Y)-1,0\right\}\mid Z=1\right]$, since the Fréchet–Hoeffding lower bound in two dimensions gives $C_{P}\left(F(x),G(y)\right)\geq\max\left\{F(x)+G(y)-1,0\right\}$ for all $(x,y)\in\mathcal{X}\times\mathcal{Y}$. Turning to the term $E_{P}\left[C_{P}\left(F(X),G(Y)\right)\mid Z=2\right]$, note that $Y$ is not observed on this event, so we replace $G(Y)$ with its smallest theoretical value, $0$. Thus, we bound $C_{P}(F(x),G(y))\geq\max\left\{F(x)+G(y)-1,0\right\}\geq\max\left\{F(x)-1,0\right\}=0$, which holds for all $(x,y)\in\mathcal{X}\times\mathcal{Y}$, implying that $E_{P}\left[C_{P}\left(F(X),G(Y)\right)\mid Z=2\right]\geq 0$. Similarly, on the event $\{Z=3\}$, where $X$ is not observed, we bound $C_{P}(F(x),G(y))\geq\max\left\{F(x)+G(y)-1,0\right\}\geq 0$, which holds for all $(x,y)\in\mathcal{X}\times\mathcal{Y}$, so that $E_{P}\left[C_{P}\left(F(X),G(Y)\right)\mid Z=3\right]\geq 0$. Finally, on the event $\{Z=4\}$, where neither $X$ nor $Y$ is observed, we bound $E_{P}\left[C_{P}\left(F(X),G(Y)\right)\mid Z=4\right]$ from below by $0$. Combining these bounds yields

\displaystyle\tau(P)\geq 4E_{P}\left[\max\left\{F(X)+G(Y)-1,0\right\}\mid Z=1\right]P(Z=1)-1=\underaccent{\bar}{\tau}(P).

This concludes the proof. ∎
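As a numerical sanity check of the sandwich $\underaccent{\bar}{\tau}(P)\leq\tau(P)\leq\bar{\tau}(P)$, the following sketch simulates a concrete MGP. All specifics are illustrative assumptions: uniform margins (so $F$ and $G$ are identities), a mixture copula in which $V=U$ with probability one half, and MCAR missingness. The sample Kendall's $\tau$ of the complete latent pairs lands between the plug-in bounds.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000

# Latent dependent pair with uniform margins: V = U with prob 1/2,
# otherwise an independent uniform draw (an illustrative copula).
u = rng.uniform(size=n)
v = np.where(rng.uniform(size=n) < 0.5, u, rng.uniform(size=n))

# Sample Kendall's tau computed from the complete (latent) pairs.
s = sum(np.sum(np.sign(u[i + 1:] - u[i]) * np.sign(v[i + 1:] - v[i]))
        for i in range(n - 1))
tau_hat = 2 * s / (n * (n - 1))

# MCAR missingness: X observed w.p. 0.8 and Y w.p. 0.7, independently.
ox = rng.uniform(size=n) < 0.8
oy = rng.uniform(size=n) < 0.7

# Plug-in versions of the bounds derived above, with F = G = identity.
upper = (4 * np.mean(np.minimum(u, v) * (ox & oy))
         + 4 * np.mean(u * (ox & ~oy))
         + 4 * np.mean(v * (~ox & oy))
         + 4 * np.mean(~ox & ~oy)
         - 1)
lower = 4 * np.mean(np.maximum(u + v - 1, 0.0) * (ox & oy)) - 1

assert lower <= tau_hat <= upper  # the worst-case bounds sandwich tau
```

The $O(n^{2})$ pairwise loop is a deliberately simple estimator of $\tau$; it suffices for a verification at this sample size.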

A.2 Proof of Theorem 2

Proof.

The proof proceeds by the direct method. We split the proof into two parts: (i) showing $\underaccent{\bar}{\tau}(P)\leq 0$ for every $P\in\mathcal{P}_{F,G}$, and (ii) showing $\bar{\tau}(P)\geq 0$ for every $P\in\mathcal{P}_{F,G}$.

Part (i). Fix $P\in\mathcal{P}_{F,G}$. First, note that if $P(Z=1)=0$, then $\underaccent{\bar}{\tau}(P)=-1$, and the inequality trivially holds. Now, we consider the case $P(Z=1)>0$, and note that in this case

\displaystyle\underaccent{\bar}{\tau}(P)=4P(Z=1)\int_{\mathcal{X}\times\mathcal{Y}}\max\left\{F(x)+G(y)-1,0\right\}\,p(x,y\mid Z=1)\,dx\,dy-1, (A.1)

where we have used Condition (i) in the definition of $\mathcal{P}_{F,G}$ to express the conditional expectation as an integral against a density. By Condition (ii), we can rewrite $p(x,y\mid Z=1)$ as follows:

\displaystyle p(x,y\mid Z=1)=\frac{p(x,y,1)}{P(Z=1)}=\frac{p(x,y,1)}{p_{X,Y}(x,y)}\,\frac{p_{X,Y}(x,y)}{P(Z=1)}=P(Z=1\mid X=x,Y=y)\,\frac{p_{X,Y}(x,y)}{P(Z=1)}.

Substituting this expression for $p(x,y\mid Z=1)$ into (A.1) and simplifying yields

\displaystyle\underaccent{\bar}{\tau}(P)=4\int_{\mathcal{X}\times\mathcal{Y}}\max\left\{F(x)+G(y)-1,0\right\}\,P(Z=1\mid X=x,Y=y)\,p_{X,Y}(x,y)\,dx\,dy-1
\displaystyle\leq 4\int_{\mathcal{X}\times\mathcal{Y}}\max\left\{F(x)+G(y)-1,0\right\}\,p_{X,Y}(x,y)\,dx\,dy-1, (A.2)

where the inequality follows from $P(Z=1\mid X=x,Y=y)\leq 1$ holding for all $(x,y)\in\mathcal{X}\times\mathcal{Y}$. Maximizing the integral in (A.2) with respect to the joint distribution of $(X,Y)$, we find that

\displaystyle\underaccent{\bar}{\tau}(P)\leq 4\sup_{P\in\mathcal{P}_{F,G}}E_{P}\left[\max\left\{F(X)+G(Y)-1,0\right\}\right]-1. (A.3)

Since the function $(x,y)\mapsto\max\left\{F(x)+G(y)-1,0\right\}$ is supermodular, Part 1 of Lemma 1 implies

\displaystyle\sup_{P\in\mathcal{P}_{F,G}}E_{P}\left[\max\left\{F(X)+G(Y)-1,0\right\}\right]\leq E_{H^{*}}\left[\max\left\{F(X)+G(Y)-1,0\right\}\right],

where $H^{*}=\min\left\{F,G\right\}$. Now, we shall argue that $E_{H^{*}}\left[\max\left\{F(X)+G(Y)-1,0\right\}\right]=1/4$. As the CDFs $F$ and $G$ are known, we define the new random variables $U:=F(X)$ and $V:=G(Y)$. By the Probability Integral Transform, both $U$ and $V$ are uniformly distributed on the unit interval $[0,1]$. This yields the representation

\displaystyle E_{H^{*}}\left[\max\left\{F(X)+G(Y)-1,0\right\}\right]=E_{C^{*}}\left[\max\left\{U+V-1,0\right\}\right],

where $C^{*}(u,v)=\min(u,v)$. This copula is supported on the diagonal segment $u=v$ of the unit square $[0,1]^{2}$, so that $V=U$ almost surely under $C^{*}$. Consequently,

\displaystyle E_{C^{*}}\left[\max\left\{U+V-1,0\right\}\right]=\int_{1/2}^{1}(2u-1)\,du=1/4.

Therefore, $\underaccent{\bar}{\tau}(P)\leq 0$. Since $P\in\mathcal{P}_{F,G}$ was arbitrary, the conclusion $\underaccent{\bar}{\tau}(P)\leq 0$ holds for all $P\in\mathcal{P}_{F,G}$. This concludes the proof of the lower bound.
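The value $1/4$ computed above can be double-checked by Monte Carlo. The sketch below (an illustrative verification, not part of the proof) exploits that under the comonotone copula $C^{*}$ we have $V=U$ almost surely, so the integrand reduces to $\max\{2U-1,0\}$.

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.uniform(size=1_000_000)

# Under C*(u, v) = min(u, v), V = U almost surely, so the integrand
# max{U + V - 1, 0} reduces to max{2U - 1, 0}.
mc = np.mean(np.maximum(2.0 * u - 1.0, 0.0))

assert abs(mc - 0.25) < 5e-3  # matches the exact value 1/4
```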

Part (ii). First, recall that

\displaystyle\bar{\tau}(P)=4E_{P}\left[\min\left\{F(X),G(Y)\right\}\mid Z=1\right]P(Z=1)+4E_{P}\left[F(X)\mid Z=2\right]P(Z=2)
\displaystyle\qquad+4E_{P}\left[G(Y)\mid Z=3\right]P(Z=3)+4P(Z=4)-1.

In the case that $P(Z=4)=1$ holds, we have $\bar{\tau}(P)=3\geq 0$, which is the desired result. Now we consider the case $P(Z=4)\in[0,1)$. Substituting $P(Z=4)=1-\sum_{z=1}^{3}P(Z=z)$ into the expression for $\bar{\tau}(P)$ above and simplifying yields

\displaystyle\bar{\tau}(P)=4E_{P}\left[\min\left\{F(X),G(Y)\right\}-1\mid Z=1\right]P(Z=1)+4E_{P}\left[F(X)-1\mid Z=2\right]P(Z=2)
\displaystyle\qquad+4E_{P}\left[G(Y)-1\mid Z=3\right]P(Z=3)+3
\displaystyle\geq 4\sum_{z=1}^{3}E_{P}\left[\min\left\{F(X),G(Y)\right\}-1\mid Z=z\right]P(Z=z)+3, (A.4)

where inequality (A.4) arises from the fact that $F(X)-1$ and $G(Y)-1$ are each bounded from below by $\min\left\{F(X),G(Y)\right\}-1$ with probability one (under $P$). Next, fix an arbitrary $z\in\{1,2,3\}$ such that $P(Z=z)>0$; such a $z$ exists since $P(Z=4)\in[0,1)$. We will re-express

E_{P}\left[\min\left\{F(X),G(Y)\right\}-1\mid Z=z\right]P(Z=z)

in a more convenient form to apply Part 2 of Lemma 1. It is by definition equal to

\displaystyle P(Z=z)\int_{\mathcal{X}\times\mathcal{Y}}\left(\min\left\{F(x),G(y)\right\}-1\right)\frac{p(x,y,z)}{P(Z=z)}\,dx\,dy.

Now, since $P\in\mathcal{P}_{F,G}$, we can multiply and divide by $p_{X,Y}(x,y)$ in the integrand; simplifying yields

\displaystyle\int_{\mathcal{X}\times\mathcal{Y}}\left(\min\left\{F(x),G(y)\right\}-1\right)\frac{p(x,y,z)}{p_{X,Y}(x,y)}\,p_{X,Y}(x,y)\,dx\,dy,

which is equal to

\displaystyle\int_{\mathcal{X}\times\mathcal{Y}}\left(\min\left\{F(x),G(y)\right\}-1\right)P\left[Z=z\mid X=x,Y=y\right]\,p_{X,Y}(x,y)\,dx\,dy.

Note that this re-writing applies for each $z=1,2,3$ for which $P(Z=z)>0$. Now, because

\displaystyle 0\leq P\left[Z=z\mid X=x,Y=y\right]\leq 1\quad\forall(x,y,z)\in\mathcal{X}\times\mathcal{Y}\times\{1,2,3,4\}\quad\text{and}
\displaystyle\min\left\{F(x),G(y)\right\}-1\leq 0\quad\forall(x,y)\in\mathcal{X}\times\mathcal{Y},

the expression in (A.4) is bounded from below by

\displaystyle 4\int_{\mathcal{X}\times\mathcal{Y}}\left(\min\left\{F(x),G(y)\right\}-1\right)p_{X,Y}(x,y)\,dx\,dy+3=4E_{P}\left[\min\left\{F(X),G(Y)\right\}-1\right]+3, (A.5)

which in turn is bounded from below by

\displaystyle 4\inf_{P\in\mathcal{P}_{F,G}}E_{P}\left[\min\left\{F(X),G(Y)\right\}-1\right]+3. (A.6)

As the function $(x,y)\mapsto\min\left\{F(x),G(y)\right\}-1$ is supermodular, and the expectation in (A.6) depends only on the joint distribution of $(X,Y)$, we can apply Part 2 of Lemma 1 to bound the infimum in (A.6) from below:

\displaystyle\inf_{P\in\mathcal{P}_{F,G}}E_{P}\left[\min\left\{F(X),G(Y)\right\}-1\right]\geq E_{H^{\dagger}}\left[\min\left\{F(X),G(Y)\right\}-1\right], (A.7)

where $H^{\dagger}=\max\left\{F+G-1,0\right\}$. Now we argue that the right side of the inequality in (A.7) equals $-3/4$, from which the result of the theorem follows. The copula of $H^{\dagger}$ is the countermonotonic copula $C^{\dagger}(u,v)=\max\{u+v-1,0\}$, which is supported on the segment $v=1-u$ of the unit square, so that $V=1-U$ almost surely, where $U:=F(X)$ and $V:=G(Y)$ are uniform on $[0,1]$. Since $\min\{u,1-u\}-1$ equals $u-1$ for $u<1/2$ and $-u$ for $u\geq 1/2$, this expected value equals

\displaystyle E_{H^{\dagger}}\left[\min\left\{F(X),G(Y)\right\}-1\right]=\int_{0}^{1/2}(u-1)\,du-\int_{1/2}^{1}u\,du=-3/4.

Now, using this result, we find that

\displaystyle\bar{\tau}(P)\geq 4\inf_{P\in\mathcal{P}_{F,G}}E_{P}\left[\min\left\{F(X),G(Y)\right\}-1\right]+3\geq 4\left(-\tfrac{3}{4}\right)+3=0, (A.8)

concluding the proof. ∎
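The value $-3/4$ used in the final step can likewise be checked by Monte Carlo. The sketch below (an illustrative verification, not part of the proof) uses the countermonotonic representation $V=1-U$ almost surely under $H^{\dagger}$.

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.uniform(size=1_000_000)

# Under H-dagger, (U, V) is countermonotonic: V = 1 - U almost surely,
# so min{U, V} - 1 reduces to min{U, 1 - U} - 1.
mc = np.mean(np.minimum(u, 1.0 - u) - 1.0)

assert abs(mc + 0.75) < 5e-3  # matches the exact value -3/4
```

With this, $4\cdot(-3/4)+3=0$, confirming numerically the last inequality in (A.8).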