Universal inference with composite likelihoods
Abstract
Maximum composite likelihood estimation is a useful alternative to maximum likelihood estimation when data arise from data generating processes (DGPs) that do not admit tractable joint specification. We demonstrate that generic composite likelihoods consisting of marginal and conditional specifications permit the simple construction of composite likelihood ratio-like statistics from which finite-sample valid confidence sets and hypothesis tests can be constructed. These statistics are universal in the sense that they can be constructed from any estimator for the parameter of the underlying DGP. We demonstrate our methodology via a simulation study using a pair of conditionally specified bivariate models.
Key words: Composite likelihoods; Pseudolikelihoods; Confidence sets; Hypothesis tests; Conditional models
1 Introduction
Likelihood-based methods are among the most important tools for conducting statistical inference. However, the data generating processes (DGPs) of complex models often do not admit tractable likelihood functions. In such cases, a potential remedy is to specify the model via more amenable marginal and conditional probability density/mass functions (PDFs/PMFs) of the DGP instead. This joint specification is often referred to as the composite likelihood (CL) or pseudolikelihood.
The literature regarding CL-based inference has its roots in the works of Besag (1975) and Lindsay (1988). Further developments regarding the theory and application of CL methods can be found in Arnold and Strauss (1991), Molenberghs and Verbeke (2005), Varin et al. (2011), Yi (2014), and Nguyen (2018), among other works.
We build upon the recent work of Wasserman et al. (2020), who demonstrated the construction of sample splitting and sample swapping likelihood ratio statistics that yield finite-sample valid confidence sets and hypothesis tests, and that are universal in the sense that they are agnostic to the choice of parameter estimator. The inferential constructions are similar to the recently popularized e-values of Vovk and Wang (2021), as well as the s-values of Grunwald et al. (2020) and the betting scores of Shafer (2021). We demonstrate how our CL-based methods can be used by constructing confidence sets and tests for a pair of conditionally specified bivariate models. Specifically, we consider a simulation study regarding the exponential conditional model of Arnold et al. (1999) and the log-normal conditional model of Sarabia et al. (2007).
The paper proceeds as follows. In Section 2, we present the CL framework and the universal confidence set and hypothesis test constructions. A simulation study of our methodology is presented in Section 3.
2 Universal inference via composite likelihoods
Let be a random variable arising from a parametric distribution characterized by the PDF/PMF (generically, PDF) , where is a parameter vector (). We shall write to indicate a random variable and to indicate its realization.
Let be the power set of , and let . For each , let , where is the cardinality of . Further, let be the set of all divisions of into two nonempty subsets. We represent each element of as a pair , where and are the 'left-hand' and 'right-hand' subsets of the division , respectively. We note that and .
For each and , we assign a non-negative coefficient and , respectively. We call these coefficients the weights, and we put these weights into the vectors and , respectively. We assume that .
Given weights and , we define the individual CL (ICL) function for as
where , , and . Here, is the marginal PDF of , and is the conditional PDF of conditioned on .
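A minimal sketch of how an ICL of this kind could be evaluated numerically, assuming the usual weighted-product form of a composite likelihood (so that the log-ICL is a weighted sum of marginal and conditional log-densities). The sketch is written in Python; the function name `log_icl`, the representation of each component as a (weight, log-density) pair, and the toy exponential marginals are illustrative assumptions and are not notation from the text.

```python
import math


def log_icl(x, theta, marginals=(), conditionals=()):
    """Log of a weighted-product composite likelihood for one observation x.

    marginals / conditionals: iterables of (weight, log_density) pairs, where
    log_density(x, theta) returns the log of a marginal or conditional PDF.
    (This interface is an assumption made for illustration.)
    """
    total = 0.0
    for weight, log_density in list(marginals) + list(conditionals):
        if weight < 0:
            raise ValueError("weights must be non-negative")
        if weight > 0:  # components with zero weight do not contribute
            total += weight * log_density(x, theta)
    return total


# Toy usage (hypothetical model): two exponential marginals with common rate theta.
def log_exp_marginal(k):
    return lambda x, theta: math.log(theta) - theta * x[k]


value = log_icl(
    (0.5, 2.0),
    1.5,
    marginals=[(0.5, log_exp_marginal(0)), (0.5, log_exp_marginal(1))],
)
```

Working on the log scale keeps the weighted product numerically stable when many observations are later combined into a CL function.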
2.1 Sample splitting and sample swapping
Let be a sequence of IID random variables with the same DGP as , and split into two subsamples and of sizes and , respectively, where . We assume that has a DGP that is characterized by the PDF , for some , and we let be its corresponding probability measure. Let and be a pair of generic estimators of , using only or , respectively.
For , we let
be the CL function of , as a function of . We write the split sample CL ratio statistics (spCLRSs) and the swapped sample CL ratio statistic (swCLRS) as
for each , and
respectively.
For , let
be confidence sets based on the spCLRS and the swCLRS, respectively. We have the following result regarding the validity of and (all theoretical results in this work are proved in Nguyen, 2020).
Proposition 1.
The set estimators and are finite sample-valid confidence sets for in the sense that
for any .
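The construction above can be sketched in code under some stated assumptions: the split statistic is taken to be the ratio of the CL at the estimate from the other subsample to the CL at the candidate parameter, both evaluated on the same subsample; the swapped statistic is taken to be the average of the two split statistics, as in the cross-fit construction of Wasserman et al. (2020); and a candidate parameter is kept in the confidence set when the statistic does not exceed the reciprocal of the significance level. All function names and the toy model (two coordinates with normal marginals, equally weighted) are hypothetical.

```python
import math
import random


def log_cl(theta, sample, log_icl):
    """Log CL of a subsample: the sum of the log-ICL contributions."""
    return sum(log_icl(x, theta) for x in sample)


def split_log_ratio(theta, eval_half, fit_half, estimator, log_icl):
    """Log of CL(estimate from fit_half) / CL(theta), both evaluated on eval_half."""
    theta_hat = estimator(fit_half)
    return log_cl(theta_hat, eval_half, log_icl) - log_cl(theta, eval_half, log_icl)


def in_split_set(theta, eval_half, fit_half, estimator, log_icl, alpha):
    """True if theta lies in the split-sample set at confidence level 1 - alpha."""
    return split_log_ratio(theta, eval_half, fit_half, estimator, log_icl) <= -math.log(alpha)


def in_swap_set(theta, half1, half2, estimator, log_icl, alpha):
    """Same check for the average of the two split statistics (computed in logs)."""
    a = split_log_ratio(theta, half1, half2, estimator, log_icl)
    b = split_log_ratio(theta, half2, half1, estimator, log_icl)
    m = max(a, b)
    log_mean = m + math.log(math.exp(a - m) + math.exp(b - m)) - math.log(2.0)
    return log_mean <= -math.log(alpha)


# Toy model (hypothetical): each coordinate has an N(theta, 1) marginal,
# and the two marginal log-densities are combined with weights 1/2.
def toy_log_icl(x, theta):
    return sum(0.5 * (-0.5 * math.log(2.0 * math.pi) - 0.5 * (xk - theta) ** 2) for xk in x)


def toy_estimator(sample):
    # Maximizer of the toy log CL: the mean of all coordinates.
    return sum(x[0] + x[1] for x in sample) / (2.0 * len(sample))


rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
half1, half2 = data[:100], data[100:]
grid = [i / 100.0 for i in range(-100, 101)]
split_set = [t for t in grid if in_split_set(t, half1, half2, toy_estimator, toy_log_icl, 0.05)]
swap_set = [t for t in grid if in_swap_set(t, half1, half2, toy_estimator, toy_log_icl, 0.05)]
```

Scanning a grid is only one way to approximate the set; the estimator passed in need not be a maximizer, which is the sense in which the construction is agnostic to the estimator.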
We now consider the testing of null and alternative hypotheses
where . Let
be the set of maximizers of the CL function , for each , and write We then write the sample splitting and sample swapping test statistics as
respectively. Further, define the split sample CL ratio test (spCLRT) and the swapped sample CL ratio test (swCLRT) by the rejection rules: reject if or if , respectively. We have the following result regarding the finite sample-validity of the tests.
Proposition 2.
The spCLRT and swCLRT control the Type I error for all and in the sense that
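A sketch of the two tests under stated assumptions: the test statistic is taken to be the ratio of the CL at the unrestricted estimate from the other subsample to the maximum of the CL over the null parameter set, the swapped version averages the two split statistics, and the null hypothesis is rejected when the statistic exceeds the reciprocal of the significance level. Approximating the null maximizer by a search over a finite grid `null_grid` is a simplification introduced here, and the toy normal-marginal model and all names are hypothetical.

```python
import math
import random


def log_cl(theta, sample, log_icl):
    """Log CL of a subsample: the sum of the log-ICL contributions."""
    return sum(log_icl(x, theta) for x in sample)


def split_log_stat(eval_half, fit_half, estimator, log_icl, null_grid):
    """Log of CL(unrestricted estimate) / max over the null of CL, on eval_half."""
    unrestricted = log_cl(estimator(fit_half), eval_half, log_icl)
    restricted = max(log_cl(t, eval_half, log_icl) for t in null_grid)
    return unrestricted - restricted


def split_test(eval_half, fit_half, estimator, log_icl, null_grid, alpha):
    """True if the split-sample test rejects the null at level alpha."""
    return split_log_stat(eval_half, fit_half, estimator, log_icl, null_grid) > -math.log(alpha)


def swap_test(half1, half2, estimator, log_icl, null_grid, alpha):
    """True if the average of the two split statistics exceeds 1 / alpha."""
    a = split_log_stat(half1, half2, estimator, log_icl, null_grid)
    b = split_log_stat(half2, half1, estimator, log_icl, null_grid)
    m = max(a, b)
    log_mean = m + math.log(math.exp(a - m) + math.exp(b - m)) - math.log(2.0)
    return log_mean > -math.log(alpha)


# Toy model (hypothetical): N(theta, 1) marginals with weights 1/2 each.
def toy_log_icl(x, theta):
    return sum(0.5 * (-0.5 * math.log(2.0 * math.pi) - 0.5 * (xk - theta) ** 2) for xk in x)


def toy_estimator(sample):
    return sum(x[0] + x[1] for x in sample) / (2.0 * len(sample))


rng = random.Random(0)
data = [(rng.gauss(3.0, 1.0), rng.gauss(3.0, 1.0)) for _ in range(200)]  # true value 3
half1, half2 = data[:100], data[100:]
null_grid = [0.0]  # simple null hypothesis: theta = 0
reject_split = split_test(half1, half2, toy_estimator, toy_log_icl, null_grid, 0.05)
reject_swap = swap_test(half1, half2, toy_estimator, toy_log_icl, null_grid, 0.05)
```

For a simple null hypothesis the grid contains a single point, so the restricted maximization is exact; for a composite null the grid search is only an approximation.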
3 Simulation study
All numerical computations were conducted in the R programming environment (R Core Team, 2020). The code for the analyses is made available at hiendn/CompositeLikelihoodISI.
3.1 Bivariate distribution with exponential conditional distributions
We first consider a simulation study regarding data generated from the bivariate exponential distribution of Arnold et al. (1999, Sec. 4.4). Here the random variable has joint PDF
where is the parameter of interest, and is an intractable normalization constant. However, the conditional PDFs of , for , can be specified by
where is the PDF of the exponential distribution with rate . Thus, we can conduct inference regarding this DGP by considering ICLs of the form
where and .
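To make the conditional specification concrete, the following Python sketch (not the R code used for the reported results) simulates from the two exponential conditionals and maximizes the resulting CL. It rests on assumptions that should be checked against the model definition: each conditional is taken to be exponential with rate equal to one plus the parameter times the conditioning value, the two conditional log-densities are weighted by 1/2 each, and approximate draws are obtained from short, independent Gibbs chains. The names `sample_pair`, `log_icl` and `mcle`, and the search interval, are hypothetical.

```python
import math
import random


def sample_pair(theta, rng, sweeps=50):
    """One approximate draw via a short Gibbs chain over the two conditionals."""
    x1, x2 = 1.0, 1.0
    for _ in range(sweeps):
        x1 = rng.expovariate(1.0 + theta * x2)  # assumed law of X1 given X2 = x2
        x2 = rng.expovariate(1.0 + theta * x1)  # assumed law of X2 given X1 = x1
    return (x1, x2)


def log_icl(x, theta, w1=0.5, w2=0.5):
    """Weighted sum of the two conditional exponential log-densities."""
    x1, x2 = x
    r1 = 1.0 + theta * x2  # assumed rate of X1 given X2 = x2
    r2 = 1.0 + theta * x1  # assumed rate of X2 given X1 = x1
    return w1 * (math.log(r1) - r1 * x1) + w2 * (math.log(r2) - r2 * x2)


def mcle(sample, lower=0.0, upper=100.0, iters=100):
    """Ternary search for the maximizer of the log CL, which is concave in theta."""
    def objective(t):
        return sum(log_icl(x, t) for x in sample)

    for _ in range(iters):
        left = lower + (upper - lower) / 3.0
        right = upper - (upper - lower) / 3.0
        if objective(left) < objective(right):
            lower = left
        else:
            upper = right
    return 0.5 * (lower + upper)


rng = random.Random(1)
data = [sample_pair(5.0, rng) for _ in range(500)]
theta_hat = mcle(data)
```

Under the assumed rates, the log CL is a sum of concave functions of the parameter, which is what justifies the ternary search; no normalization constant of the joint PDF is needed at any point.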
For data with identical DGP to , characterized by , where , we consider the use of the spCLRS and swCLRS confidence sets at the level. Here, each confidence set is constructed using the maximum composite likelihood estimator (MCLE).
For each pair , we replicate the simulation times and compute the coverage proportion (CP) and average size (AS) of the confidence intervals for the two set constructions. Here, CP and AS are computed as and , where is a stand-in for a confidence set constructed from the replicate, are Iverson brackets, and is the metric set diameter operator.
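The two summaries can be computed as in the following sketch, which assumes that each replicate's confidence set is an interval stored as a (lower, upper) pair, so that its diameter is simply the upper endpoint minus the lower endpoint. The function name and the example intervals are hypothetical.

```python
def coverage_and_size(intervals, theta0):
    """Return (CP, AS) for a list of (lower, upper) confidence intervals.

    CP is the proportion of intervals containing theta0 (Iverson bracket averaged
    over replicates); AS is the average interval length.
    """
    replicates = len(intervals)
    covered = sum(1 for lower, upper in intervals if lower <= theta0 <= upper)
    total_length = sum(upper - lower for lower, upper in intervals)
    return covered / replicates, total_length / replicates


# Hypothetical intervals from four replicates with true parameter value 1.
cp, avg_size = coverage_and_size([(0.5, 1.5), (1.2, 2.0), (0.0, 3.0), (0.9, 1.1)], 1.0)
```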
The results are presented in Table 1(a). We observe that CP was near perfect: only one scenario (the swCLRS sets with true parameter value 5 and sample size 1000) had a CP below 1, at 0.99. This supports Proposition 1, although it indicates that the confidence sets are fairly conservative. We observe that AS decreases as the sample size increases, as expected, and increases with the true parameter value. We also find that the swCLRS sets are smaller than the spCLRS sets, which suggests a more efficient use of the data.
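Table 1(a): Coverage proportion (CP) and average size (AS) of the spCLRS and swCLRS confidence sets. The number in each column header is the sample size.

| Method | True parameter value | CP, 100 | CP, 1000 | CP, 10000 | AS, 100 | AS, 1000 | AS, 10000 |
|---|---|---|---|---|---|---|---|
| spCLRS | 1 | 1 | 1 | 1 | 1.43 | 0.46 | 0.14 |
| | 5 | 1 | 1 | 1 | 4.60 | 1.49 | 0.47 |
| | 10 | 1 | 1 | 1 | 8.32 | 2.57 | 0.82 |
| swCLRS | 1 | 1 | 1 | 1 | 1.28 | 0.40 | 0.12 |
| | 5 | 1 | 0.99 | 1 | 4.13 | 1.29 | 0.40 |
| | 10 | 1 | 1 | 1 | 7.40 | 2.31 | 0.73 |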
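Table 1(b): Proportion of rejections (Rej.) of the null hypothesis by the spCLRT and swCLRT. The number in each column header is the sample size.

| Method | Parameter value | Rej., 100 | Rej., 1000 | Rej., 10000 |
|---|---|---|---|---|
| spCLRT | 0 | 0 | 0 | 0 |
| | 1 | 0.26 | 1 | 1 |
| | 5 | 0.98 | 1 | 1 |
| swCLRT | 0 | 0 | 0 | 0 |
| | 1 | 0.32 | 1 | 1 |
| | 5 | 1 | 1 | 1 |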
3.2 Bivariate distribution with log-normal conditional distributions
We now consider the bivariate distribution of Sarabia et al. (2007), which is specified by the PDF
(1)
where , with , , and . Here, , where is the confluent hypergeometric function, defined as per Abramowitz and Stegun (1972, Eqn. 13.2.5). As in the previous example, the normalizing constant of the joint PDF makes it intractable. However, we may again specify the conditional PDFs of , for , by
where
is the PDF of a log-normal distribution with location and scale parameters and , respectively. We can use the conditional PDFs to conduct CL inference via the ICLs of the form
where and .
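As an illustration of how such an ICL might be evaluated, the sketch below codes the standard log-normal log-density and combines the two conditional log-densities with weights 1/2 each. The weights, the convention that the scale parameter is the standard deviation of the log-variable, and the hook `cond_params`, which maps a conditioning value and a parameter vector to a (location, scale) pair, are assumptions; the specific conditional location and scale functions of the model are not reproduced here and must be supplied by the user.

```python
import math


def log_lognormal_pdf(x, mu, sigma):
    """Log-density of a log-normal law; sigma is the standard deviation of log X."""
    if x <= 0.0 or sigma <= 0.0:
        raise ValueError("x and sigma must be positive")
    z = (math.log(x) - mu) / sigma
    return -math.log(x) - math.log(sigma) - 0.5 * math.log(2.0 * math.pi) - 0.5 * z * z


def log_icl(x, theta, cond_params, w1=0.5, w2=0.5):
    """Weighted sum of the two conditional log-normal log-densities.

    cond_params(k, x_other, theta) -> (mu, sigma) for coordinate k given the
    other coordinate (a hypothetical hook, to be filled in from the model).
    """
    x1, x2 = x
    mu1, sigma1 = cond_params(1, x2, theta)
    mu2, sigma2 = cond_params(2, x1, theta)
    return w1 * log_lognormal_pdf(x1, mu1, sigma1) + w2 * log_lognormal_pdf(x2, mu2, sigma2)


# Placeholder conditional parameters (NOT the model's): location 0 and scale 1.
def placeholder_params(k, x_other, theta):
    return 0.0, 1.0


value = log_icl((1.0, 1.0), None, placeholder_params)
```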
We simulate data , from DGPs that are characterized by the PDF (1), with parameter vector , where . For each pair , we use the spCLRT and swCLRT to test the hypotheses versus , at the level. We repeat each simulation pair times and compute the proportion of times the null hypothesis was rejected. Here, we again make use of the MCLE.
The results are reported in Table 1(b). Notice that no false rejections were made when the null hypothesis was true (the rows with parameter value 0); thus, the size of the test is conservatively controlled, as predicted by Proposition 2. We also see that the tests become increasingly powerful as the sample size increases and as the parameter moves further from its null value, as would be expected. There is some evidence that the swCLRT is more powerful than the spCLRT, conforming to observations from the previous study.
References
- Abramowitz and Stegun [1972] M Abramowitz and I A Stegun, editors. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. National Bureau of Standards, Washington, 1972.
- Arnold and Strauss [1991] B C Arnold and D Strauss. Pseudolikelihood estimation: some examples. Sankhya B, 53:233–243, 1991.
- Arnold et al. [1999] B C Arnold, E Castillo, and J M Sarabia. Conditional Specification of Statistical Models. Springer, New York, 1999.
- Besag [1975] J Besag. Statistical analysis of non-lattice data. Journal of the Royal Statistical Society D, 24:179–195, 1975.
- Grunwald et al. [2020] P Grunwald, R de Heide, and W M Koolen. Safe testing. In IEEE Information Theory and Applications Workshop (ITA), 2020.
- Lindsay [1988] B Lindsay. Composite likelihood methods. Contemporary Mathematics, 8:221–239, 1988.
- Molenberghs and Verbeke [2005] G Molenberghs and G Verbeke. Models For Discrete Longitudinal Data. Springer, New York, 2005.
- Nguyen [2018] H D Nguyen. Nearly universal consistency of maximum likelihood in discrete models. Journal of the Korean Statistical Society, 47:90–98, 2018.
- Nguyen [2020] H D Nguyen. Universal inference with composite likelihoods. ArXiv:2009.00848, 2020.
- R Core Team [2020] R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical Computing, 2020.
- Sarabia et al. [2007] J M Sarabia, E Castillo, M Pascual, and M Sarabia. Bivariate income distributions with lognormal conditionals. Journal of Economic Inequality, 5:371–383, 2007.
- Shafer [2021] G Shafer. Testing by betting: a strategy for statistical and scientific communication. Journal of the Royal Statistical Society B, To appear, 2021.
- Varin et al. [2011] C Varin, N Reid, and D Firth. An overview of composite likelihood methods. Statistica Sinica, 21:5–42, 2011.
- Vovk and Wang [2021] V Vovk and R Wang. E-values: calibration, combination, and application. Annals of Statistics, To appear, 2021.
- Wasserman et al. [2020] L Wasserman, A Ramdas, and S Balakrishnan. Universal inference. Proceedings of the National Academy of Sciences, 117:16880–16890, 2020.
- Yi [2014] G Yi. Composite likelihood/pseudolikelihood. In Wiley StatsRef: Statistics Reference Online, pages 1–14. Wiley, 2014.