
Universal inference with composite likelihoods

Hien D Nguyen¹ (corresponding author), Jessica Bagnall-Guerreiro¹, Andrew T Jones¹
¹Department of Mathematics and Statistics, La Trobe University, Melbourne, Australia.
Abstract

Maximum composite likelihood estimation is a useful alternative to maximum likelihood estimation when data arise from data generating processes (DGPs) that do not admit tractable joint specification. We demonstrate that generic composite likelihoods consisting of marginal and conditional specifications permit the simple construction of composite likelihood ratio-like statistics that yield finite-sample valid confidence sets and hypothesis tests. These statistics are universal in the sense that they can be constructed from any estimator for the parameter of the underlying DGP. We illustrate our methodology via a simulation study using a pair of conditionally specified bivariate models.

Key words: Composite likelihoods; Pseudolikelihoods; Confidence sets; Hypothesis tests; Conditional models

1 Introduction

Likelihood-based methods are among the most important tools for conducting statistical inference. However, the data generating processes (DGPs) of complex models often do not admit tractable likelihood functions. In such cases, a potential remedy is to build the inference instead on more amenable marginal and conditional probability density/mass functions (PDFs/PMFs) of the DGP. The resulting product of marginal and conditional PDFs/PMFs is referred to as a composite likelihood (CL) or pseudolikelihood.

The literature regarding CL-based inference has its roots in the works of Besag (1975) and Lindsay (1988). Further developments regarding the theory and application of CL methods can be found in Arnold and Strauss (1991), Molenberghs and Verbeke (2005), Varin et al. (2011), Yi (2014), and Nguyen (2018), among other works.

We build upon the recent work of Wasserman et al. (2020), who demonstrated the construction of sample splitting and sample swapping likelihood ratio statistics that yield finite-sample valid confidence sets and hypothesis tests, and that are universal in the sense that they are agnostic to the choice of parameter estimator. These inferential constructions are similar to the recently popularized e-values of Vovk and Wang (2021), as well as the s-values of Grunwald et al. (2020) and the betting scores of Shafer (2021). We illustrate our CL-based methods by constructing confidence sets and tests for a pair of conditionally specified bivariate models. Specifically, we conduct a simulation study based on the exponential conditional model of Arnold et al. (1999) and the log-normal conditional model of Sarabia et al. (2007).

The paper proceeds as follows. In Section 2, we present the CL framework and the universal confidence set and hypothesis test constructions. A simulation study of our methodology is presented in Section 3.

2 Universal inference via composite likelihoods

Let $\bm{X}\in\mathbb{X}\subseteq\mathbb{R}^{d}$ be a random variable arising from a parametric distribution characterized by the PDF/PMF (generically, PDF) $p(\bm{x};\bm{\theta})$, where $\bm{\theta}\in\Theta\subseteq\mathbb{R}^{q}$ is a parameter vector ($d,q\in\mathbb{N}$). We write $\bm{X}^{\top}=(X_{1},\dots,X_{d})$ to indicate a random variable and $\bm{x}^{\top}=(x_{1},\dots,x_{d})$ to indicate its realization.

Let $2^{[d]}$ be the power set of $[d]=\{1,\dots,d\}$, and let $\mathbb{S}_{d}=2^{[d]}\backslash\{\varnothing\}$. For each $\mathcal{S}\in\mathbb{S}_{d}$, write $\mathcal{S}=\{s_{1},\dots,s_{|\mathcal{S}|}\}\subseteq[d]$, where $|\mathcal{S}|$ is the cardinality of $\mathcal{S}$. Further, let $\mathbb{T}_{d}$ be the set of all ordered pairs of disjoint nonempty subsets of $[d]$. We represent each element of $\mathbb{T}_{d}$ as a pair $\mathcal{T}=(\overleftarrow{\mathcal{T}},\overrightarrow{\mathcal{T}})$, where $\overleftarrow{\mathcal{T}}=\{\overleftarrow{t}_{1},\dots,\overleftarrow{t}_{|\overleftarrow{\mathcal{T}}|}\}\subset[d]$ and $\overrightarrow{\mathcal{T}}=\{\overrightarrow{t}_{1},\dots,\overrightarrow{t}_{|\overrightarrow{\mathcal{T}}|}\}\subseteq[d]\backslash\overleftarrow{\mathcal{T}}$ are the 'left-hand' and 'right-hand' subsets of $\mathcal{T}$, respectively; the two subsets need not cover $[d]$. We note that $|\mathbb{S}_{d}|=2^{d}-1$ and $|\mathbb{T}_{d}|=3^{d}-2^{d+1}+1$: each of the $d$ indices is assigned to the left-hand subset, the right-hand subset, or neither ($3^{d}$ ways), and, by inclusion-exclusion, the $2\cdot 2^{d}-1$ assignments that leave either subset empty are removed.
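The index sets can be enumerated directly. The following Python sketch (the function names `nonempty_subsets`, `S` and `T` are our own, hypothetical choices) builds $\mathbb{S}_{d}$ and $\mathbb{T}_{d}$, storing each $\mathcal{T}$ as a `(left, right)` pair of tuples, and checks the two cardinality formulas for small $d$. For $d=2$, $\mathbb{T}_{2}$ contains exactly the two pairs $(\{1\},\{2\})$ and $(\{2\},\{1\})$, which index the two conditional PDFs used in Section 3.

```python
from itertools import combinations

def nonempty_subsets(items):
    """All nonempty subsets of `items`, each as a sorted tuple."""
    items = sorted(items)
    return [c for r in range(1, len(items) + 1) for c in combinations(items, r)]

def S(d):
    """S_d: the nonempty subsets of [d] = {1, ..., d}."""
    return nonempty_subsets(range(1, d + 1))

def T(d):
    """T_d: ordered pairs (left, right) of disjoint nonempty subsets of [d]."""
    pairs = []
    for left in S(d):
        rest = [j for j in range(1, d + 1) if j not in left]
        for right in nonempty_subsets(rest):
            pairs.append((left, right))
    return pairs

# Check the cardinalities |S_d| = 2^d - 1 and |T_d| = 3^d - 2^(d+1) + 1.
for d in range(1, 7):
    assert len(S(d)) == 2 ** d - 1
    assert len(T(d)) == 3 ** d - 2 ** (d + 1) + 1

print(T(2))  # [((1,), (2,)), ((2,), (1,))]
```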

For each $\mathcal{S}$ and $\mathcal{T}$, we assign a non-negative coefficient $\sigma_{\mathcal{S}}$ and $\tau_{\mathcal{T}}$, respectively. We call these coefficients the weights, and we collect them into the vectors $\bm{\sigma}=(\sigma_{\mathcal{S}})_{\mathcal{S}\in\mathbb{S}_{d}}$ and $\bm{\tau}=(\tau_{\mathcal{T}})_{\mathcal{T}\in\mathbb{T}_{d}}$, respectively. We assume that $\upsilon=\sum_{\mathcal{S}\in\mathbb{S}_{d}}\sigma_{\mathcal{S}}+\sum_{\mathcal{T}\in\mathbb{T}_{d}}\tau_{\mathcal{T}}>0$.

Given weights $\bm{\sigma}$ and $\bm{\tau}$, we define the individual CL (ICL) function for $\bm{X}$ as

$$p_{\bm{\sigma},\bm{\tau}}(\bm{x};\bm{\theta})=\prod_{\mathcal{S}\in\mathbb{S}_{d}}\left[p(\bm{x}_{\mathcal{S}};\bm{\theta})\right]^{\sigma_{\mathcal{S}}/\upsilon}\prod_{\mathcal{T}\in\mathbb{T}_{d}}\left[p(\bm{x}_{\overleftarrow{\mathcal{T}}}|\bm{x}_{\overrightarrow{\mathcal{T}}};\bm{\theta})\right]^{\tau_{\mathcal{T}}/\upsilon}\text{,}$$

where $\bm{x}_{\mathcal{S}}^{\top}=(x_{s_{1}},\dots,x_{s_{|\mathcal{S}|}})$, $\bm{x}_{\overleftarrow{\mathcal{T}}}^{\top}=(x_{\overleftarrow{t}_{1}},\dots,x_{\overleftarrow{t}_{|\overleftarrow{\mathcal{T}}|}})$, and $\bm{x}_{\overrightarrow{\mathcal{T}}}^{\top}=(x_{\overrightarrow{t}_{1}},\dots,x_{\overrightarrow{t}_{|\overrightarrow{\mathcal{T}}|}})$. Here, $p(\bm{x}_{\mathcal{S}};\bm{\theta})$ is the marginal PDF of $\bm{X}_{\mathcal{S}}$, and $p(\bm{x}_{\overleftarrow{\mathcal{T}}}|\bm{x}_{\overrightarrow{\mathcal{T}}};\bm{\theta})$ is the conditional PDF of $\bm{X}_{\overleftarrow{\mathcal{T}}}$ given $\bm{X}_{\overrightarrow{\mathcal{T}}}=\bm{x}_{\overrightarrow{\mathcal{T}}}$.
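Because the exponents $\sigma_{\mathcal{S}}/\upsilon$ and $\tau_{\mathcal{T}}/\upsilon$ sum to one, $\log p_{\bm{\sigma},\bm{\tau}}(\bm{x};\bm{\theta})$ is a weighted average of marginal and conditional log-PDFs. The Python sketch below assumes that each component log-PDF is supplied by the user as a function of `(x, theta)`, keyed by $\mathcal{S}$ or by $\mathcal{T}=(\overleftarrow{\mathcal{T}},\overrightarrow{\mathcal{T}})$; weights absent from the dictionaries are treated as zero. The standard bivariate normal check (with correlation `theta`) is a hypothetical illustration, not one of the paper's models.

```python
import math

def log_icl(x, theta, marginals, conditionals, sigma, tau):
    """Log of the ICL p_{sigma,tau}(x; theta).

    marginals[S](x, theta)    -> log p(x_S; theta)
    conditionals[T](x, theta) -> log p(x_left | x_right; theta), where T = (left, right)
    sigma[S], tau[T]          -> non-negative weights; missing keys mean weight 0.
    """
    upsilon = sum(sigma.values()) + sum(tau.values())
    if upsilon <= 0:
        raise ValueError("the weights must have a positive sum")
    total = sum(w * marginals[s](x, theta) for s, w in sigma.items() if w > 0)
    total += sum(w * conditionals[t](x, theta) for t, w in tau.items() if w > 0)
    return total / upsilon

def norm_logpdf(z, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (z - mean) ** 2 / var)

# Hypothetical example: standard bivariate normal with correlation r (x[0] = x1, x[1] = x2).
marginals = {
    (1,): lambda x, r: norm_logpdf(x[0], 0.0, 1.0),
    (2,): lambda x, r: norm_logpdf(x[1], 0.0, 1.0),
    (1, 2): lambda x, r: (-math.log(2.0 * math.pi) - 0.5 * math.log(1.0 - r * r)
                          - (x[0] ** 2 - 2.0 * r * x[0] * x[1] + x[1] ** 2) / (2.0 * (1.0 - r * r))),
}
conditionals = {
    ((1,), (2,)): lambda x, r: norm_logpdf(x[0], r * x[1], 1.0 - r * r),
    ((2,), (1,)): lambda x, r: norm_logpdf(x[1], r * x[0], 1.0 - r * r),
}

x, rho = (0.3, -1.2), 0.5
joint = marginals[(1, 2)](x, rho)
# All weight on the full marginal recovers the joint log-PDF.
print(log_icl(x, rho, marginals, conditionals, {(1, 2): 1.0}, {}), joint)
# sigma_{2} = tau_({1},{2}) = 1 gives upsilon = 2, i.e. half of log p(x2) + log p(x1 | x2).
print(log_icl(x, rho, marginals, conditionals, {(2,): 1.0}, {((1,), (2,)): 1.0}), joint / 2)
```

The second check uses the identity $p(x_{2})\,p(x_{1}|x_{2})=p(x_{1},x_{2})$, so that ICL is the square root of the joint PDF.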

2.1 Sample splitting and sample swapping

Let $\mathbf{X}_{n}=(\bm{X}_{i})_{i=1}^{n}$ be a sequence of $n$ IID random variables with the same DGP as $\bm{X}$, and split $\mathbf{X}_{n}$ into two subsamples $\mathbf{X}_{n}^{1}=(\bm{X}_{i}^{1})_{i=1}^{n_{1}}$ and $\mathbf{X}_{n}^{2}=(\bm{X}_{i}^{2})_{i=1}^{n_{2}}$ of sizes $n_{1}$ and $n_{2}$, respectively, where $n=n_{1}+n_{2}$. We assume that $\bm{X}$ has a DGP characterized by the PDF $p(\bm{x};\bm{\theta}_{0})$, for some $\bm{\theta}_{0}\in\Theta$, and we let $\Pr_{\bm{\theta}_{0}}$ be its corresponding probability measure. Let $\hat{\bm{\theta}}_{n}^{1}$ and $\hat{\bm{\theta}}_{n}^{2}$ be a pair of generic estimators of $\bm{\theta}_{0}$, computed using only $\mathbf{X}_{n}^{1}$ or only $\mathbf{X}_{n}^{2}$, respectively.

For $k\in\{1,2\}$, let

$$L_{\bm{\sigma},\bm{\tau}}(\bm{\theta};\mathbf{X}_{n}^{k})=\prod_{i=1}^{n_{k}}p_{\bm{\sigma},\bm{\tau}}(\bm{X}_{i}^{k};\bm{\theta})$$

be the CL function of $\mathbf{X}_{n}^{k}$, as a function of $\bm{\theta}$. We write the split sample CL ratio statistics (spCLRSs) and the swapped sample CL ratio statistic (swCLRS) as

$$U_{\bm{\sigma},\bm{\tau}}^{k}(\bm{\theta};\mathbf{X}_{n})=L_{\bm{\sigma},\bm{\tau}}(\hat{\bm{\theta}}_{n}^{3-k};\mathbf{X}_{n}^{k})/L_{\bm{\sigma},\bm{\tau}}(\bm{\theta};\mathbf{X}_{n}^{k})\text{,}$$

for each $k\in\{1,2\}$, and

$$\bar{U}_{\bm{\sigma},\bm{\tau}}(\bm{\theta};\mathbf{X}_{n})=\left\{U_{\bm{\sigma},\bm{\tau}}^{1}(\bm{\theta};\mathbf{X}_{n})+U_{\bm{\sigma},\bm{\tau}}^{2}(\bm{\theta};\mathbf{X}_{n})\right\}/2\text{,}$$

respectively.
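Working on the log scale avoids overflow, since CL ratios can be astronomically large even for moderate $n_{k}$. The Python sketch below (hypothetical helper names; `log_icl_fn(x, theta)` stands for any user-supplied log-ICL) evaluates $\log U_{\bm{\sigma},\bm{\tau}}^{1}$, $\log U_{\bm{\sigma},\bm{\tau}}^{2}$ and $\log\bar{U}_{\bm{\sigma},\bm{\tau}}$ at a candidate $\bm{\theta}$. Note that the numerator of each statistic uses the estimator computed from the other subsample.

```python
import math

def log_cl(log_icl_fn, theta, sample):
    """log L_{sigma,tau}(theta; sample): the log-ICL summed over one subsample."""
    return sum(log_icl_fn(x, theta) for x in sample)

def log_mean_exp2(a, b):
    """log((exp(a) + exp(b)) / 2), computed without overflow."""
    m = max(a, b)
    return m + math.log(0.5 * (math.exp(a - m) + math.exp(b - m)))

def log_split_stats(log_icl_fn, theta, sample1, sample2, theta_hat1, theta_hat2):
    """Return (log U^1, log U^2, log Ubar) evaluated at theta.

    U^k(theta) = L(theta_hat^{3-k}; X^k) / L(theta; X^k), so the numerator of
    each statistic uses the estimator computed from the *other* subsample.
    """
    log_u1 = log_cl(log_icl_fn, theta_hat2, sample1) - log_cl(log_icl_fn, theta, sample1)
    log_u2 = log_cl(log_icl_fn, theta_hat1, sample2) - log_cl(log_icl_fn, theta, sample2)
    return log_u1, log_u2, log_mean_exp2(log_u1, log_u2)

# Hypothetical usage: unit-variance normal log-likelihood with its constant dropped
# (constants cancel in the ratios), with sample means as the estimators.
toy = lambda x, t: -0.5 * (x - t) ** 2
s1, s2 = [0.1, -0.4, 0.7], [1.2, 0.3]
h1, h2 = sum(s1) / len(s1), sum(s2) / len(s2)
print(log_split_stats(toy, 0.0, s1, s2, h1, h2))
```

These log-scale values are what the confidence sets defined next compare with $\log(1/\alpha)$.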

For $\alpha\in(0,1)$, let

$$\mathcal{C}^{\alpha}(\mathbf{X}_{n})=\left\{\bm{\theta}\in\Theta:U_{\bm{\sigma},\bm{\tau}}^{1}(\bm{\theta};\mathbf{X}_{n})\leq 1/\alpha\right\}\text{ and }\bar{\mathcal{C}}^{\alpha}(\mathbf{X}_{n})=\left\{\bm{\theta}\in\Theta:\bar{U}_{\bm{\sigma},\bm{\tau}}(\bm{\theta};\mathbf{X}_{n})\leq 1/\alpha\right\}$$

be confidence sets based on the spCLRS and the swCLRS, respectively. We have the following result regarding the validity of $\mathcal{C}^{\alpha}(\mathbf{X}_{n})$ and $\bar{\mathcal{C}}^{\alpha}(\mathbf{X}_{n})$ (all theoretical results in this work are proved in Nguyen, 2020).

Proposition 1.

The set estimators $\mathcal{C}^{\alpha}(\mathbf{X}_{n})$ and $\bar{\mathcal{C}}^{\alpha}(\mathbf{X}_{n})$ are finite-sample valid $100(1-\alpha)\%$ confidence sets for $\bm{\theta}_{0}$ in the sense that

$$\mathrm{Pr}_{\bm{\theta}_{0}}\left(\bm{\theta}_{0}\in\mathcal{C}^{\alpha}(\mathbf{X}_{n})\right)\geq 1-\alpha\text{, and }\mathrm{Pr}_{\bm{\theta}_{0}}\left(\bm{\theta}_{0}\in\bar{\mathcal{C}}^{\alpha}(\mathbf{X}_{n})\right)\geq 1-\alpha\text{,}$$

for any $n\in\mathbb{N}$.
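The key step behind Proposition 1 can be sketched as follows; this is our own outline for intuition, not the proof given in Nguyen (2020). For any fixed $\bm{\vartheta}\in\Theta$, the ICL ratio $p_{\bm{\sigma},\bm{\tau}}(\bm{x};\bm{\vartheta})/p_{\bm{\sigma},\bm{\tau}}(\bm{x};\bm{\theta}_{0})$ is a weighted geometric mean of marginal and conditional PDF ratios, with weights $\sigma_{\mathcal{S}}/\upsilon$ and $\tau_{\mathcal{T}}/\upsilon$ that sum to one. The weighted AM-GM inequality $\prod_{j}a_{j}^{w_{j}}\leq\sum_{j}w_{j}a_{j}$ then gives

$$
\begin{aligned}
\mathrm{E}_{\bm{\theta}_{0}}\left[\frac{p_{\bm{\sigma},\bm{\tau}}(\bm{X};\bm{\vartheta})}{p_{\bm{\sigma},\bm{\tau}}(\bm{X};\bm{\theta}_{0})}\right]
&\leq\sum_{\mathcal{S}\in\mathbb{S}_{d}}\frac{\sigma_{\mathcal{S}}}{\upsilon}\,\mathrm{E}_{\bm{\theta}_{0}}\left[\frac{p(\bm{X}_{\mathcal{S}};\bm{\vartheta})}{p(\bm{X}_{\mathcal{S}};\bm{\theta}_{0})}\right]
+\sum_{\mathcal{T}\in\mathbb{T}_{d}}\frac{\tau_{\mathcal{T}}}{\upsilon}\,\mathrm{E}_{\bm{\theta}_{0}}\left[\frac{p(\bm{X}_{\overleftarrow{\mathcal{T}}}|\bm{X}_{\overrightarrow{\mathcal{T}}};\bm{\vartheta})}{p(\bm{X}_{\overleftarrow{\mathcal{T}}}|\bm{X}_{\overrightarrow{\mathcal{T}}};\bm{\theta}_{0})}\right]\\
&\leq\sum_{\mathcal{S}\in\mathbb{S}_{d}}\frac{\sigma_{\mathcal{S}}}{\upsilon}+\sum_{\mathcal{T}\in\mathbb{T}_{d}}\frac{\tau_{\mathcal{T}}}{\upsilon}=1\text{.}
\end{aligned}
$$

Each expectation in the first line integrates a PDF indexed by $\bm{\vartheta}$ over at most the support of the corresponding PDF indexed by $\bm{\theta}_{0}$, so it cannot exceed one. Since $\hat{\bm{\theta}}_{n}^{2}$ depends only on $\mathbf{X}_{n}^{2}$, conditioning on $\mathbf{X}_{n}^{2}$, taking $\bm{\vartheta}=\hat{\bm{\theta}}_{n}^{2}$, and using the independence of the $n_{1}$ observations in $\mathbf{X}_{n}^{1}$ gives $\mathrm{E}_{\bm{\theta}_{0}}[U_{\bm{\sigma},\bm{\tau}}^{1}(\bm{\theta}_{0};\mathbf{X}_{n})]\leq 1$. Markov's inequality then yields $\mathrm{Pr}_{\bm{\theta}_{0}}(\bm{\theta}_{0}\notin\mathcal{C}^{\alpha}(\mathbf{X}_{n}))=\mathrm{Pr}_{\bm{\theta}_{0}}(U_{\bm{\sigma},\bm{\tau}}^{1}(\bm{\theta}_{0};\mathbf{X}_{n})>1/\alpha)\leq\alpha$, and the same bound holds for $\bar{U}_{\bm{\sigma},\bm{\tau}}$, which averages two such terms.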

We now consider the testing of null and alternative hypotheses

$$\text{H}_{0}:\bm{\theta}\in\Theta_{0}\text{ and }\text{H}_{1}:\bm{\theta}\in\Theta_{1}\text{,}$$

where $\Theta_{0},\Theta_{1}\subseteq\Theta$. Let

$$\mathbb{M}(\mathbf{X}_{n}^{k})=\left\{\bm{\theta}\in\Theta_{0}:L_{\bm{\sigma},\bm{\tau}}(\bm{\theta};\mathbf{X}_{n}^{k})=\max_{\bm{\vartheta}\in\Theta_{0}}L_{\bm{\sigma},\bm{\tau}}(\bm{\vartheta};\mathbf{X}_{n}^{k})\right\}$$

be the set of maximizers of the CL function $L_{\bm{\sigma},\bm{\tau}}(\bm{\theta};\mathbf{X}_{n}^{k})$ over $\Theta_{0}$, for each $k\in\{1,2\}$, and let $\tilde{\bm{\theta}}_{n}^{k}\in\mathbb{M}(\mathbf{X}_{n}^{k})$ be any such maximizer. We then write the sample splitting and sample swapping test statistics as

$$V_{\bm{\sigma},\bm{\tau}}^{k}(\mathbf{X}_{n})=U_{\bm{\sigma},\bm{\tau}}^{k}(\tilde{\bm{\theta}}_{n}^{k};\mathbf{X}_{n})\text{, and }\bar{V}_{\bm{\sigma},\bm{\tau}}(\mathbf{X}_{n})=\left\{U_{\bm{\sigma},\bm{\tau}}^{1}(\tilde{\bm{\theta}}_{n}^{1};\mathbf{X}_{n})+U_{\bm{\sigma},\bm{\tau}}^{2}(\tilde{\bm{\theta}}_{n}^{2};\mathbf{X}_{n})\right\}/2\text{,}$$

respectively. Further, define the split sample CL ratio test (spCLRT) and the swapped sample CL ratio test (swCLRT) by the rejection rules: reject $\text{H}_{0}$ if $V_{\bm{\sigma},\bm{\tau}}^{1}(\mathbf{X}_{n})\geq 1/\alpha$, or if $\bar{V}_{\bm{\sigma},\bm{\tau}}(\mathbf{X}_{n})\geq 1/\alpha$, respectively. We have the following result regarding the finite-sample validity of the tests.
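The two rejection rules can be sketched in the same way. In the Python sketch below (hypothetical names), `theta_hat1` and `theta_hat2` may be any estimators computed from their own subsamples, while `theta_tilde1` and `theta_tilde2` are assumed to be maximizers of the CL over $\Theta_{0}$ that the caller has already computed; the comparison with $1/\alpha$ is done on the log scale.

```python
import math

def log_cl(log_icl_fn, theta, sample):
    """log L_{sigma,tau}(theta; sample): the log-ICL summed over one subsample."""
    return sum(log_icl_fn(x, theta) for x in sample)

def clr_tests(log_icl_fn, sample1, sample2, theta_hat1, theta_hat2,
              theta_tilde1, theta_tilde2, alpha=0.05):
    """Return (reject by spCLRT, reject by swCLRT).

    theta_hat_k:   any estimator computed from subsample k alone.
    theta_tilde_k: a maximizer of the CL of subsample k over Theta_0 (supplied by the caller).
    """
    # log V^k = log L(theta_hat^{3-k}; X^k) - log L(theta_tilde^k; X^k)
    log_v1 = log_cl(log_icl_fn, theta_hat2, sample1) - log_cl(log_icl_fn, theta_tilde1, sample1)
    log_v2 = log_cl(log_icl_fn, theta_hat1, sample2) - log_cl(log_icl_fn, theta_tilde2, sample2)
    # log Vbar = log((V^1 + V^2) / 2), computed stably.
    m = max(log_v1, log_v2)
    log_vbar = m + math.log(0.5 * (math.exp(log_v1 - m) + math.exp(log_v2 - m)))
    threshold = math.log(1.0 / alpha)
    return log_v1 >= threshold, log_vbar >= threshold

# Hypothetical usage: unit-variance normal log-likelihood (constant dropped), Theta_0 = {0}.
toy = lambda x, t: -0.5 * (x - t) ** 2
sample = [3.0] * 20
print(clr_tests(toy, sample, sample, 3.0, 3.0, 0.0, 0.0))  # (True, True)
```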

Proposition 2.

The spCLRT and swCLRT control the Type I error for all $\alpha\in(0,1)$ and $n\in\mathbb{N}$ in the sense that

$$\sup_{\bm{\theta}_{0}\in\Theta_{0}}\mathrm{Pr}_{\bm{\theta}_{0}}\left(V_{\bm{\sigma},\bm{\tau}}^{1}(\mathbf{X}_{n})\geq 1/\alpha\right)\leq\alpha\text{, and }\sup_{\bm{\theta}_{0}\in\Theta_{0}}\mathrm{Pr}_{\bm{\theta}_{0}}\left(\bar{V}_{\bm{\sigma},\bm{\tau}}(\mathbf{X}_{n})\geq 1/\alpha\right)\leq\alpha\text{.}$$
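Proposition 2 rests on a similar argument: if $\bm{\theta}_{0}\in\Theta_{0}$, then $L_{\bm{\sigma},\bm{\tau}}(\tilde{\bm{\theta}}_{n}^{k};\mathbf{X}_{n}^{k})\geq L_{\bm{\sigma},\bm{\tau}}(\bm{\theta}_{0};\mathbf{X}_{n}^{k})$, so $V_{\bm{\sigma},\bm{\tau}}^{k}(\mathbf{X}_{n})\leq U_{\bm{\sigma},\bm{\tau}}^{k}(\bm{\theta}_{0};\mathbf{X}_{n})$, whose expectation is at most one, and Markov's inequality applies. The bound can also be checked empirically. The Python sketch below is a hypothetical toy example, not taken from the paper: it takes $d=1$, so that the CL reduces to the ordinary likelihood of $N(\mu,1)$ data, tests $\text{H}_{0}:\mu=0$ (so $\Theta_{0}=\{0\}$ and $\tilde{\bm{\theta}}_{n}^{k}=0$) with the sample mean of the other half as the estimator, and estimates both rejection rates when $\text{H}_{0}$ is true.

```python
import math
import random

def type_one_error_rate(n_half=50, reps=2000, alpha=0.05, seed=1):
    """Monte Carlo rejection rates of the split and swapped tests of H0: mu = 0
    for IID N(mu, 1) data generated with mu = 0 (so H0 is true)."""
    rng = random.Random(seed)
    threshold = math.log(1.0 / alpha)
    rejections_split = rejections_swap = 0
    for _ in range(reps):
        x1 = [rng.gauss(0.0, 1.0) for _ in range(n_half)]
        x2 = [rng.gauss(0.0, 1.0) for _ in range(n_half)]
        m1, m2 = sum(x1) / n_half, sum(x2) / n_half  # estimator from each half
        # log V^k = log L(m_{3-k}; X^k) - log L(0; X^k), since Theta_0 = {0};
        # the normalizing constant of N(x; mu, 1) cancels in the difference.
        log_v1 = sum(0.5 * x ** 2 - 0.5 * (x - m2) ** 2 for x in x1)
        log_v2 = sum(0.5 * x ** 2 - 0.5 * (x - m1) ** 2 for x in x2)
        top = max(log_v1, log_v2)
        log_vbar = top + math.log(0.5 * (math.exp(log_v1 - top) + math.exp(log_v2 - top)))
        rejections_split += log_v1 >= threshold
        rejections_swap += log_vbar >= threshold
    return rejections_split / reps, rejections_swap / reps

print(type_one_error_rate())
```

Both rates should fall at or below $\alpha$; in this toy case they are typically well below it, reflecting the conservativeness of tests calibrated by Markov's inequality.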

3 Simulation study

All numerical computations were conducted in the $\mathsf{R}$ programming environment (R Core Team, 2020). The code for the analyses is available at hiendn/CompositeLikelihoodISI.

3.1 Bivariate distribution with exponential conditional distributions

We first consider a simulation study regarding data generated from the bivariate exponential distribution of Arnold et al. (1999, Sec. 4.4). Here the random variable $\bm{X}^{\top}=(X_{1},X_{2})$ has joint PDF

$$p(\bm{x};\theta)=\kappa(\theta)\exp\left\{-x_{1}-x_{2}-\theta x_{1}x_{2}\right\}\text{,}$$

where $\theta\geq 0$ is the parameter of interest, and $\kappa(\theta)=\theta\exp\{-1/\theta\}/\int_{1/\theta}^{\infty}w^{-1}\exp(-w)\,\text{d}w$ is an intractable normalizing constant. However, the conditional PDFs of $X_{k}|X_{3-k}=x_{3-k}$, for $k\in\{1,2\}$, can be specified by

$$p(x_{k}|x_{3-k};\theta)=f_{\text{Exp}}(x_{k};1+\theta x_{3-k})\text{,}$$

where $f_{\text{Exp}}(x;\lambda)=\lambda\exp(-\lambda x)$ is the PDF of the exponential distribution with rate $\lambda>0$. Thus, we can conduct inference regarding this DGP by considering ICLs of the form

$$p_{\bm{\sigma},\bm{\tau}}(\bm{x};\theta)=\left[p(x_{1}|x_{2};\theta)\right]^{1/2}\left[p(x_{2}|x_{1};\theta)\right]^{1/2}\text{,}$$

where $\bm{\sigma}=\mathbf{0}$ and $\bm{\tau}=(1/2)\mathbf{1}$.
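To see why the conditional specification sidesteps $\kappa(\theta)$, note that, as a function of $x_{1}$, the joint PDF is proportional to $\exp\{-(1+\theta x_{2})x_{1}\}$, so $\kappa(\theta)$ cancels from $p(x_{1}|x_{2};\theta)$, leaving an exponential PDF with rate $1+\theta x_{2}$. The constant itself can be checked by integrating out $x_{2}$ and substituting $w=x_{1}+1/\theta$ (for $\theta>0$); this short derivation is ours:

$$\kappa(\theta)^{-1}=\int_{0}^{\infty}\!\!\int_{0}^{\infty}e^{-x_{1}-x_{2}-\theta x_{1}x_{2}}\,\text{d}x_{2}\,\text{d}x_{1}=\int_{0}^{\infty}\frac{e^{-x_{1}}}{1+\theta x_{1}}\,\text{d}x_{1}=\frac{e^{1/\theta}}{\theta}\int_{1/\theta}^{\infty}w^{-1}e^{-w}\,\text{d}w\text{,}$$

which agrees with the expression for $\kappa(\theta)$ given above. The remaining integral is the exponential integral $E_{1}(1/\theta)$, which has no closed form in elementary functions.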

For data $\mathbf{X}_{n}$ with the same DGP as $\bm{X}$, characterized by $\theta_{0}\in\{1,5,10\}$, and with $n_{1}=n_{2}\in\{100,1000,10000\}$, we construct the spCLRS and swCLRS confidence sets at the $\alpha=0.05$ level. Here, each confidence set uses the maximum composite likelihood estimator (MCLE) as the estimator $\hat{\theta}_{n}^{k}$.
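The following Python sketch shows one way to compute the spCLRS confidence set for this model. It is not the authors' R code: the Gibbs sampler used to simulate data (alternating draws from the two exponential conditionals), its burn-in and thinning, the golden-section search for the MCLE, and the search bound `hi` are all our own assumptions. Because $\log(1+\theta x_{3-k})$ is concave in $\theta$ and the remaining terms are linear, the log-CL is concave, so $\mathcal{C}^{\alpha}(\mathbf{X}_{n})$ is an interval containing $\hat{\theta}_{n}^{2}$ whose endpoints can be found by bisection.

```python
import math
import random

def log_icl(x, theta):
    """Log-ICL of Section 3.1: half the sum of the two exponential conditional log-PDFs."""
    x1, x2 = x
    rate1 = 1.0 + theta * x2  # rate of X1 given X2 = x2
    rate2 = 1.0 + theta * x1  # rate of X2 given X1 = x1
    return 0.5 * (math.log(rate1) - rate1 * x1 + math.log(rate2) - rate2 * x2)

def log_cl(theta, sample):
    """log L_{sigma,tau}(theta; sample)."""
    return sum(log_icl(x, theta) for x in sample)

def mcle(sample, lo=0.0, hi=50.0, iters=100):
    """MCLE of theta on [lo, hi] by golden-section search (valid since log_cl is concave)."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = log_cl(c, sample), log_cl(d, sample)
    for _ in range(iters):
        if fc >= fd:  # the maximizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = log_cl(c, sample)
        else:         # the maximizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = log_cl(d, sample)
    return 0.5 * (a + b)

def gibbs_sample(theta, n, burn_in=500, thin=5, rng=random):
    """Approximate joint draws by alternating the two exponential conditionals (our assumption)."""
    x1, x2 = 1.0, 1.0
    draws = []
    for step in range(burn_in + n * thin):
        x1 = rng.expovariate(1.0 + theta * x2)
        x2 = rng.expovariate(1.0 + theta * x1)
        if step >= burn_in and (step - burn_in) % thin == 0:
            draws.append((x1, x2))
    return draws

def split_confidence_interval(sample1, sample2, alpha=0.05):
    """Endpoints of C^alpha = {theta >= 0 : U^1(theta) <= 1/alpha}, plus theta_hat^2."""
    theta_hat2 = mcle(sample2)  # estimator from the other subsample
    cutoff = log_cl(theta_hat2, sample1) - math.log(1.0 / alpha)

    def inside(t):
        return log_cl(t, sample1) >= cutoff

    def boundary(t_in, t_out, iters=60):
        # Bisect between a point inside the set and a point outside it.
        for _ in range(iters):
            mid = 0.5 * (t_in + t_out)
            if inside(mid):
                t_in = mid
            else:
                t_out = mid
        return t_in

    lower = 0.0 if inside(0.0) else boundary(theta_hat2, 0.0)
    upper = theta_hat2 + 1.0
    while inside(upper):  # terminates because log_cl -> -infinity as theta grows
        upper *= 2.0
    return lower, boundary(theta_hat2, upper), theta_hat2

rng = random.Random(1)
data = gibbs_sample(theta=1.0, n=400, rng=rng)  # n1 = n2 = 200
lower, upper, theta_hat2 = split_confidence_interval(data[:200], data[200:])
print(f"theta_hat^2 = {theta_hat2:.3f}; 95% spCLRS set = [{lower:.3f}, {upper:.3f}]")
```

The same concavity argument shows that each $U_{\bm{\sigma},\bm{\tau}}^{k}(\theta;\mathbf{X}_{n})$ is a convex function of $\theta$, so $\bar{\mathcal{C}}^{\alpha}(\mathbf{X}_{n})$ is an interval as well. Splitting one Gibbs chain into halves only approximates the IID setting of Section 2.1.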

For each pair $(n_{1},\theta_{0})$, we replicate the simulation $r=100$ times and compute the coverage proportion (CP) and average size (AS) of the confidence sets for the two constructions. Here, CP and AS are computed as $r^{-1}\sum_{j=1}^{r}\llbracket\theta_{0}\in\mathcal{C}_{j}\rrbracket$ and $r^{-1}\sum_{j=1}^{r}\text{diam}(\mathcal{C}_{j})$, respectively, where $\mathcal{C}_{j}$ is a stand-in for a confidence set constructed from the $j$th replicate, $\llbracket\cdot\rrbracket$ denotes the Iverson bracket, and $\text{diam}(\cdot)$ is the metric set diameter operator.
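For interval-valued sets, such as those arising for this model, CP and AS reduce to simple averages. A minimal Python sketch, assuming each $\mathcal{C}_{j}$ is stored as a `(lower, upper)` pair (the function name and the example intervals are hypothetical):

```python
def coverage_and_average_size(intervals, theta0):
    """Return (CP, AS) for r confidence sets given as closed intervals (lower, upper).

    CP = r^{-1} sum_j [theta0 in C_j]  (each Iverson bracket is a bool, summed as 1 or 0)
    AS = r^{-1} sum_j diam(C_j)        (the diameter of [lower, upper] is upper - lower)
    """
    r = len(intervals)
    cp = sum(lower <= theta0 <= upper for lower, upper in intervals) / r
    avg_size = sum(upper - lower for lower, upper in intervals) / r
    return cp, avg_size

# Hypothetical intervals from r = 3 replicates, with theta0 = 1.
print(coverage_and_average_size([(0.5, 1.5), (0.8, 1.6), (1.1, 2.0)], theta0=1.0))
```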

The results are presented in Table 1(a). CP was near perfect: across all scenarios, only one replicate yielded a confidence set that did not contain $\theta_{0}$. This supports Proposition 1, although it indicates that the confidence sets are fairly conservative. AS decreases in $n_{1}$, as expected, and increases in $\theta_{0}$. We also find that the swCLRS sets are smaller than the spCLRS sets, which suggests a more efficient use of the data.

Table 1: Simulation results.

                       CP                             AS
Set      θ₀    n₁=100   n₁=1000   n₁=10000    n₁=100   n₁=1000   n₁=10000
spCLRS    1    1        1         1           1.43     0.46      0.14
spCLRS    5    1        1         1           4.60     1.49      0.47
spCLRS   10    1        1         1           8.32     2.57      0.82
swCLRS    1    1        1         1           1.28     0.40      0.12
swCLRS    5    1        0.99      1           4.13     1.29      0.40
swCLRS   10    1        1         1           7.40     2.31      0.73
(a) CP and AS results for the spCLRS and swCLRS 95% confidence sets.

Test     c₀    n₁=100   n₁=1000   n₁=10000
spCLRT    0    0        0         0
spCLRT    1    0.26     1         1
spCLRT    5    0.98     1         1
swCLRT    0    0        0         0
swCLRT    1    0.32     1         1
swCLRT    5    1        1         1
(b) Proportion of rejections by the spCLRT and swCLRT.

3.2 Bivariate distribution with log-normal conditional distributions

We now consider the bivariate distribution of Sarabia et al. (2007), which is specified by the PDF

$$p(\bm{x};\bm{\theta})=\frac{\kappa(c)}{2\pi\sigma_{1}\sigma_{2}x_{1}x_{2}}\exp\left\{-\frac{1}{2}\left[\left(\frac{\log x_{1}-\mu_{1}}{\sigma_{1}}\right)^{2}+\left(\frac{\log x_{2}-\mu_{2}}{\sigma_{2}}\right)^{2}+c\left(\frac{\log x_{1}-\mu_{1}}{\sigma_{1}}\right)^{2}\left(\frac{\log x_{2}-\mu_{2}}{\sigma_{2}}\right)^{2}\right]\right\}\text{,}\tag{1}$$

where $\bm{\theta}^{\top}=(\mu_{1},\sigma_{1}^{2},\mu_{2},\sigma_{2}^{2},c)$, with $\mu_{1},\mu_{2}\in\mathbb{R}$, $\sigma_{1}^{2},\sigma_{2}^{2}>0$, and $c\geq 0$. Here, $\kappa(c)=\sqrt{2c}/U(1/2,1,(2c)^{-1})$, where $U(a,b,z)$ is the confluent hypergeometric function of the second kind, defined as per Abramowitz and Stegun (1972, Eqn. 13.2.5). As in the previous example, the normalizing constant makes the joint PDF intractable. However, we may again specify the conditional PDFs of $X_{k}|X_{3-k}=x_{3-k}$, for $k\in\{1,2\}$, by

$$p(x_{k}|x_{3-k};\bm{\theta})=f_{\text{LN}}\left(x_{k};\mu_{k},\sigma_{k}^{2}\Big/\left\{1+c\left(\frac{\log x_{3-k}-\mu_{3-k}}{\sigma_{3-k}}\right)^{2}\right\}\right)\text{,}$$

where

$$f_{\text{LN}}(x;\mu,\sigma^{2})=\frac{1}{x\sqrt{2\pi\sigma^{2}}}\exp\left\{-\frac{1}{2}\left(\frac{\log x-\mu}{\sigma}\right)^{2}\right\}$$

is the PDF of a log-normal distribution with parameters $\mu\in\mathbb{R}$ and $\sigma^{2}>0$ (the mean and variance of $\log X$, respectively). We can use the conditional PDFs to conduct CL inference via ICLs of the form

$$p_{\bm{\sigma},\bm{\tau}}(\bm{x};\bm{\theta})=\left[p(x_{1}|x_{2};\bm{\theta})\right]^{1/2}\left[p(x_{2}|x_{1};\bm{\theta})\right]^{1/2}\text{,}$$

where $\bm{\sigma}=\mathbf{0}$ and $\bm{\tau}=(1/2)\mathbf{1}$.

We simulate data $\mathbf{X}_{n}$, with $n_{1}=n_{2}\in\{100,1000,10000\}$, from DGPs characterized by the PDF (1) with parameter vector $\bm{\theta}_{0}^{\top}=(2,1,2,1,c_{0})$, where $c_{0}\in\{0,1,5\}$. For each pair $(n_{1},c_{0})$, we use the spCLRT and swCLRT to test the hypotheses $\text{H}_{0}:c=0$ versus $\text{H}_{1}:c>0$ at the $\alpha=0.05$ level. We repeat each scenario $r=100$ times and compute the proportion of replicates in which the null hypothesis is rejected. Here, we again use the MCLE as the estimator $\hat{\bm{\theta}}_{n}^{k}$.
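A partial Python sketch of the spCLRT for this model follows. With $c=0$ the log-CL separates into two log-normal log-likelihoods (each halved), so the maximizer over $\Theta_{0}=\{\bm{\theta}:c=0\}$ has a closed form: the mean and divide-by-$n_{1}$ variance of each $\log X_{k}$. This closed form is our own derivation, not a statement from the paper. The unrestricted estimate `theta_hat2` is left as an input, because computing the MCLE requires a numerical optimizer (for example, `scipy.optimize.minimize` applied to the negative log-CL), which the sketch omits; any estimator computed from $\mathbf{X}_{n}^{2}$ alone keeps the test valid. All function names are hypothetical.

```python
import math

def log_icl_ln(x, theta):
    """Log-ICL of Section 3.2: half the sum of the two log-normal conditional log-PDFs.

    x = (x1, x2) with x1, x2 > 0; theta = (mu1, s1sq, mu2, s2sq, c), s1sq, s2sq > 0, c >= 0.
    """
    mu, ssq, c = (theta[0], theta[2]), (theta[1], theta[3]), theta[4]
    logs = (math.log(x[0]), math.log(x[1]))
    z = [(logs[k] - mu[k]) / math.sqrt(ssq[k]) for k in range(2)]
    total = 0.0
    for k in range(2):
        var_k = ssq[k] / (1.0 + c * z[1 - k] ** 2)  # variance of log X_k given X_{3-k}
        total += (-logs[k] - 0.5 * math.log(2.0 * math.pi * var_k)
                  - (logs[k] - mu[k]) ** 2 / (2.0 * var_k))
    return 0.5 * total

def restricted_mcle(sample):
    """Maximizer of the CL over Theta_0 = {c = 0} (closed form, our derivation):
    with c = 0 the log-CL is half the sum of two log-normal log-likelihoods, so each
    (mu_k, sigma_k^2) is the mean and divide-by-n variance of log X_k."""
    n = len(sample)
    est = []
    for k in range(2):
        logs = [math.log(x[k]) for x in sample]
        m = sum(logs) / n
        est += [m, sum((v - m) ** 2 for v in logs) / n]
    return (est[0], est[1], est[2], est[3], 0.0)

def split_clrt_rejects(sample1, theta_hat2, alpha=0.05):
    """spCLRT of H0: c = 0 versus H1: c > 0.

    theta_hat2 must be computed from the *other* subsample X_n^2 alone
    (the paper uses the MCLE; any such estimator keeps the test valid).
    """
    theta_tilde1 = restricted_mcle(sample1)
    log_v1 = (sum(log_icl_ln(x, theta_hat2) for x in sample1)
              - sum(log_icl_ln(x, theta_tilde1) for x in sample1))
    return log_v1 >= math.log(1.0 / alpha)
```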

The results are reported in Table 1(b). No false rejections were made when $c_{0}=0$, so the Type I error is conservatively controlled, as predicted by Proposition 2. The tests also become more powerful as $c_{0}$ and $n_{1}$ increase, as would be expected. There is some evidence that the swCLRT is more powerful than the spCLRT (for example, rejection proportions of 0.32 versus 0.26 when $c_{0}=1$ and $n_{1}=100$), consistent with the observations from the previous study.

References

  • Abramowitz and Stegun [1972] M Abramowitz and I A Stegun, editors. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards, Washington, 1972.
  • Arnold and Strauss [1991] B C Arnold and D Strauss. Pseudolikelihood estimation: some examples. Sankhya B, 53:233–243, 1991.
  • Arnold et al. [1999] B C Arnold, E Castillo, and J M Sarabia. Conditional Specification of Statistical Models. Springer, New York, 1999.
  • Besag [1975] J Besag. Statistical analysis of non-lattice data. Journal of the Royal Statistical Society D, 24:179–195, 1975.
  • Grunwald et al. [2020] P Grunwald, R de Heide, and W M Koolen. Safe testing. In IEEE Information Theory and Applications Workshop (ITA), 2020.
  • Lindsay [1988] B Lindsay. Composite likelihood methods. Contemporary Mathematics, 80:221–239, 1988.
  • Molenberghs and Verbeke [2005] G Molenberghs and G Verbeke. Models For Discrete Longitudinal Data. Springer, New York, 2005.
  • Nguyen [2018] H D Nguyen. Nearly universal consistency of maximum likelihood in discrete models. Journal of the Korean Statistical Society, 47:90–98, 2018.
  • Nguyen [2020] H D Nguyen. Universal inference with composite likelihoods. ArXiv:2009.00848, 2020.
  • R Core Team [2020] R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing, 2020.
  • Sarabia et al. [2007] J M Sarabia, E Castillo, M Pascual, and M Sarabia. Bivariate income distributions with lognormal conditionals. Journal of Economic Inequality, 5:371–383, 2007.
  • Shafer [2021] G Shafer. Testing by betting: a strategy for statistical and scientific communication. Journal of the Royal Statistical Society A, To appear, 2021.
  • Varin et al. [2011] C Varin, N Reid, and D Firth. An overview of composite likelihood methods. Statistica Sinica, 21:5–42, 2011.
  • Vovk and Wang [2021] V Vovk and R Wang. E-values: calibration, combination, and application. Annals of Statistics, To appear, 2021.
  • Wasserman et al. [2020] L Wasserman, A Ramdas, and S Balakrishnan. Universal inference. Proceedings of the National Academy of Sciences, 117:16880–16890, 2020.
  • Yi [2014] G Yi. Composite likelihood/pseudolikelihood. In Wiley StatsRef: Statistics Reference Online, pages 1–14. Wiley, 2014.