Asymptotics of numerical integration for two-level mixed models

Blair Bilodeau (University of Toronto), Alex Stringer (University of Waterloo), Yanbo Tang* (Imperial College London)

Abstract

We study mixed models with a single grouping factor, where inference about unknown parameters requires optimizing a marginal likelihood defined by an intractable integral. Low-dimensional numerical integration techniques are regularly used to approximate these integrals, with inferences about parameters based on the resulting approximate marginal likelihood. For a generic class of mixed models that satisfy explicit regularity conditions, we derive the stochastic relative error rate incurred for both the likelihood and the maximum likelihood estimator when adaptive numerical integration is used to approximate the marginal likelihood. We then specialize the analysis to well-specified generalized linear mixed models having exponential family response and multivariate Gaussian random effects, verifying that the regularity conditions hold, and hence that the convergence rates apply. We also prove that, for models with likelihoods satisfying very weak concentration conditions, the maximum likelihood estimators from non-adaptive numerical integration approximations of the marginal likelihood are not consistent, further motivating adaptive numerical integration as the preferred tool for inference in mixed models. Code to reproduce the simulations in this paper is provided at https://github.com/awstringer1/aq-theory-paper-code.

Keywords: Adaptive quadrature, approximate inference, generalized linear models

\[
\left|\frac{\widetilde{\pi}(\bm{y})}{\pi(\bm{y})}-1\right|>\kappa\Bigr\}>\gamma.
\]

Proof.

Observe that
\[
\frac{\widetilde{\pi}(\bm{y})}{\pi(\bm{y})}=\sum_{\bm{z}\in\mathcal{Q}}\omega(\bm{z})\,\pi(\bm{z}|\bm{y}),
\]
where $\pi(\bm{z}|\bm{y})$ is the posterior density of $\bm{u}$ evaluated at $\bm{z}$; this follows from $\widetilde{\pi}(\bm{y})=\sum_{\bm{z}\in\mathcal{Q}}\omega(\bm{z})\pi(\bm{y},\bm{z})$ and $\pi(\bm{y},\bm{z})=\pi(\bm{z}|\bm{y})\,\pi(\bm{y})$. We can therefore write
\begin{equation}\label{eqn:fracbound}
\underline{\omega}\times\max_{\bm{z}\in\mathcal{Q}}\pi(\bm{z}|\bm{y})\leq\frac{\widetilde{\pi}(\bm{y})}{\pi(\bm{y})}\leq|\mathcal{Q}|\,\overline{\omega}\times\max_{\bm{z}\in\mathcal{Q}}\pi(\bm{z}|\bm{y}),
\end{equation}
where $\underline{\omega}=\min_{\bm{z}\in\mathcal{Q}}\omega(\bm{z})$ and $\overline{\omega}=\max_{\bm{z}\in\mathcal{Q}}\omega(\bm{z})$.

There are two cases to consider. Suppose first that $\bm{u}_{*}\in\mathcal{Q}$. Then by \cref{eqn:fracbound} and the assumption of the theorem, $\widetilde{\pi}(\bm{y})/\pi(\bm{y})\overset{p}{\longrightarrow}\infty$. We may therefore choose $\varepsilon>0$ and $\gamma\in(0,1)$ such that there exists $n\in\mathbb{N}$ with, for every $N>n$,
\[
P_{N}^{*}\left\{\frac{\widetilde{\pi}(\bm{y})}{\pi(\bm{y})}>1+\varepsilon\right\}>\gamma.
\]
Since $\{\widetilde{\pi}(\bm{y})/\pi(\bm{y})>1+\varepsilon\}\subseteq\{|\widetilde{\pi}(\bm{y})/\pi(\bm{y})-1|>\varepsilon\}$, setting $\kappa=\varepsilon>0$ yields the result.

Suppose next that $\bm{u}_{*}\notin\mathcal{Q}$. Then by \cref{eqn:fracbound} and the assumption of the theorem, $\widetilde{\pi}(\bm{y})/\pi(\bm{y})\overset{p}{\longrightarrow}0$. Choose $\varepsilon,\gamma\in(0,1)$ such that there exists $n\in\mathbb{N}$ with, for every $N>n$,
\[
\gamma<P_{N}^{*}\left\{\frac{\widetilde{\pi}(\bm{y})}{\pi(\bm{y})}<\varepsilon\right\}=P_{N}^{*}\left\{1-\frac{\widetilde{\pi}(\bm{y})}{\pi(\bm{y})}>1-\varepsilon\right\}\leq P_{N}^{*}\left\{\left|\frac{\widetilde{\pi}(\bm{y})}{\pi(\bm{y})}-1\right|>1-\varepsilon\right\},
\]
where the last step holds because $1-\widetilde{\pi}(\bm{y})/\pi(\bm{y})\leq|\widetilde{\pi}(\bm{y})/\pi(\bm{y})-1|$. Setting $\kappa=1-\varepsilon>0$ yields the result. ∎

The most obvious way to guarantee the conditions of \cref{fact:nonconvergence} is for the model to satisfy a Bernstein-von Mises theorem \citep[Section 10.2]{vandervaart}. For a very broad class of misspecified models, \citet{misspec} show that a Bernstein-von Mises-type result holds, suggesting that \cref{fact:nonconvergence} is widely applicable and that its conclusions apply to many models used in practice.
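A small numerical check can make the two cases of the proof concrete. The Python sketch below is our own illustration, not the authors' code, and every modelling choice in it is an assumption: one group with $y_j\mid u\sim N(u,1)$ for $j=1,\dots,n$ and $u\sim N(0,1)$, approximated by a fixed 3-point Gauss-Hermite rule whose nodes $\mathcal{Q}=\{0,\pm\sqrt{3}\}$ are derived from the prior only. Because the model is conjugate, $\pi(\bm{y})$ and $\pi(z|\bm{y})$ are available exactly, so the sketch can compare $\widetilde{\pi}(\bm{y})/\pi(\bm{y})$ with $\sum_{z\in\mathcal{Q}}\omega(z)\pi(z|\bm{y})$ and with the lower and upper bounds used in the proof. The data are a deterministic pattern of $\pm1$ residuals around $u_*$ (hypothetical, chosen for reproducibility), and the helper names are likewise hypothetical. It requires NumPy and SciPy.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.special import logsumexp
from scipy.stats import norm

SIGMA = 1.0  # prior sd of the random effect u (illustrative assumption)


def log_joint(y, u, sigma=SIGMA):
    """log pi(y, u) for y_j | u ~ N(u, 1), u ~ N(0, sigma^2); vectorised over u."""
    u = np.atleast_1d(u)
    loglik = norm.logpdf(y[None, :], loc=u[:, None], scale=1.0).sum(axis=1)
    return loglik + norm.logpdf(u, loc=0.0, scale=sigma)


def posterior(y, sigma=SIGMA):
    """Conjugate posterior u | y ~ N(mean, var)."""
    var = 1.0 / (len(y) + 1.0 / sigma**2)
    return var * y.sum(), var


def log_marginal_exact(y, sigma=SIGMA):
    """Exact log pi(y) = log pi(y, u) - log pi(u | y), evaluated at the posterior mean."""
    mean, var = posterior(y, sigma)
    return log_joint(y, mean, sigma)[0] - norm.logpdf(mean, loc=mean, scale=np.sqrt(var))


def nonadaptive_rule(k, sigma=SIGMA):
    """Fixed Gauss-Hermite nodes z and log-weights log omega(z), built from the prior only."""
    x, w = hermgauss(k)
    z = np.sqrt(2.0) * sigma * x
    log_omega = np.log(w) + np.log(np.sqrt(2.0) * sigma) + x**2  # omega = w*sqrt(2)*sigma*e^{x^2}
    return z, log_omega


def ratio_and_bounds(y, k=3, sigma=SIGMA):
    """Return tilde-pi(y)/pi(y), the identity sum_z omega(z) pi(z|y), and both bounds."""
    z, log_omega = nonadaptive_rule(k, sigma)
    # Ratio computed on the log scale to avoid underflow for large n
    ratio = np.exp(logsumexp(log_omega + log_joint(y, z, sigma)) - log_marginal_exact(y, sigma))
    mean, var = posterior(y, sigma)
    post = norm.pdf(z, loc=mean, scale=np.sqrt(var))  # pi(z | y) at each node
    omega = np.exp(log_omega)
    identity = np.sum(omega * post)
    lower = omega.min() * post.max()
    upper = len(z) * omega.max() * post.max()
    return ratio, identity, lower, upper


def residuals(n):
    """Deterministic mean-zero residuals (+1, -1, +1, ...); n must be even."""
    return np.where(np.arange(n) % 2 == 0, 1.0, -1.0)


for u_true in (0.0, 0.5):  # 0.0 is a node of the 3-point rule, 0.5 is not
    for n in (10, 100, 1000, 10000):
        ratio, identity, lower, upper = ratio_and_bounds(u_true + residuals(n))
        print(f"u*={u_true}  n={n:6d}  ratio={ratio:.4g}")
```

With $u_*=0$, which is a node, only the central node contributes appreciably and the ratio should grow like $\tfrac{2}{3}\sqrt{n+1}$; with $u_*=0.5$, which is not a node, the ratio should collapse towards zero. Either way $|\widetilde{\pi}(\bm{y})/\pi(\bm{y})-1|$ stays bounded away from zero, as the theorem asserts.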
Returning focus to the mixed models that are the subject of the present paper, Corollary 1 specializes \cref{fact:nonconvergence} to mixed models of the form given in \cref{eqn:glmmdefinition}, with "true" parameter value $\bm{\theta}_{*}$ (see \cref{sec:uniform-assumptions} for the precise definition).

Corollary 1. Consider the model given by \cref{eqn:glmmdefinition}. Under \cref{assn:kderiv,assn:hessian,assn:limsup,assn:limsup-out,assn:consistency,assn:prior} in \cref{sec:uniform-assumptions}, and for $\bm{\theta}_{*}$ defined therein, there exist $\kappa>0$ and $\gamma\in(0,1)$ such that
\[
\lim_{N\to\infty}P_{N}^{*}\left\{\left|\frac{\widetilde{\pi}(\bm{y};\bm{\theta}_{*})}{\pi(\bm{y};\bm{\theta}_{*})}-1\right|>\kappa\right\}>\gamma.
\]

Proof.

Restricting attention to $\bm{\theta}=\bm{\theta}_{*}$ reduces the problem to exactly that considered by \citet{bilodeau2021stochastic}, and our \cref{assn:kderiv,assn:hessian,assn:limsup,assn:consistency,assn:prior} reduce to their Assumptions 1--5. In their Remark 5 they show that these assumptions imply that the Bernstein-von Mises theorem holds for the posterior of $\bm{u}$ under $\pi(\bm{y},\bm{u};\bm{\theta}_{*})$; this in turn implies the conditions of \cref{fact:nonconvergence}. ∎

While Corollary 1 applies only to the single parameter value $\bm{\theta}_{*}$ and states only that the error does not converge to zero (as opposed to, say, diverging to $\infty$), it is nonetheless sufficient to rule out inferences based on $\widetilde{\pi}(\bm{y};\bm{\theta})$ for most mixed models used in practice. In most cases the approximation error depends on $\bm{\theta}$, so there is no guarantee that the approximate integrated likelihood preserves its shape locally around the mode; consequently, confidence intervals constructed from the local curvature or from the likelihood drop may be unreliable.
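The effect on curvature-based intervals can be seen in a toy example of our own; it is not from the paper, and the model, data and helper names are assumptions. Consider a single group with $y_j\mid u\sim N(\beta+u,1)$ and $u\sim N(0,1)$, and compare the observed information for $\beta$ at $\hat\beta=\bar y$ under the exact marginal likelihood and under a fixed 3-point Gauss-Hermite approximation built from the prior, using a central finite difference. For this model the exact information is $n/(n+1)$. Once the posterior of $u$ is concentrated, the approximation is dominated by the node $z=0$, so $\log\widetilde{\pi}(\bm{y};\beta)$ behaves like the conditional log-likelihood at $u=0$, which ignores the random effect and has curvature $n$.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.special import logsumexp
from scipy.stats import norm


def log_joint_beta(y, u, beta):
    """log pi(y, u; beta) for y_j | u ~ N(beta + u, 1) and u ~ N(0, 1); vectorised over u."""
    u = np.atleast_1d(u)
    loglik = norm.logpdf(y[None, :], loc=beta + u[:, None], scale=1.0).sum(axis=1)
    return loglik + norm.logpdf(u)


def exact_loglik(y, beta):
    """Exact log pi(y; beta) = log pi(y, u; beta) - log pi(u | y; beta) at the posterior mean."""
    var = 1.0 / (len(y) + 1.0)       # posterior variance of u
    mean = var * np.sum(y - beta)    # posterior mean of u
    return log_joint_beta(y, mean, beta)[0] - norm.logpdf(mean, loc=mean, scale=np.sqrt(var))


def nonadaptive_loglik(y, beta, k=3):
    """log tilde-pi(y; beta) using fixed Gauss-Hermite nodes built from the N(0, 1) prior."""
    x, w = hermgauss(k)
    z = np.sqrt(2.0) * x
    log_omega = np.log(w) + 0.5 * np.log(2.0) + x**2  # omega(z) = w * sqrt(2) * exp(x^2)
    return logsumexp(log_omega + log_joint_beta(y, z, beta))


def observed_information(f, b, h=1e-3):
    """-f''(b), approximated by a central second difference."""
    return -(f(b + h) - 2.0 * f(b) + f(b - h)) / h**2


n = 1000
y = 2.0 + np.where(np.arange(n) % 2 == 0, 1.0, -1.0)  # deterministic data with sample mean 2
beta_hat = y.mean()                                      # exact MLE of beta in this model
info_exact = observed_information(lambda b: exact_loglik(y, b), beta_hat)
info_approx = observed_information(lambda b: nonadaptive_loglik(y, b), beta_hat)
print(f"observed information  exact: {info_exact:.4f}  non-adaptive: {info_approx:.1f}")
print(f"Wald 95% half-width   exact: {1.96 / np.sqrt(info_exact):.3f}  "
      f"non-adaptive: {1.96 / np.sqrt(info_approx):.3f}")
```

In this sketch the non-adaptive information should be close to $n$ rather than close to $1$, so Wald intervals built from it would be roughly $\sqrt{n+1}$ times too narrow.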

3.2 Approximation Error for Adaptive Quadrature

Likelihood approximations based on adaptive quadrature do converge. \cref{fact:likelihood} quantifies the rate of convergence for adaptive quadrature approximations to the marginal likelihood in mixed models. This intermediate technical result is required to prove convergence of the approximate maximum likelihood estimator (\cref{fact:consistency}). A similar result is assumed by \citet{aghqmle}, although they do not specify the region of $\Theta$ in which the uniform convergence occurs, and a stronger result, namely uniform convergence of derivatives of the approximate log-likelihood, is required by \citet{approximatelikelihood}. Our proof is self-contained and makes use of suitably upgraded versions of technical lemmas recently provided by \citet{bilodeau2021stochastic}. For the approximation error to hold uniformly, we require a slight loosening of the usual error rates compared with those obtained by \citet{aghqmle} and \citet{approximatelikelihood}. Given that uniformity is assumed rather than shown in these previous works, it is possible that their rates are too optimistic for the mixed models considered here.
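The contrast with non-adaptive rules can be illustrated with a toy Poisson example. The sketch below is our own and not the code in the repository linked in the abstract; the model, parameter values and helper names are assumptions. It takes one group with $y_j\mid u\sim\mathrm{Poisson}(e^{\beta+u})$, $u\sim N(0,1)$, $\beta=\log 2$ and every count equal to 3, so the posterior of $u$ concentrates near $\log 1.5$, which is not a node of the prior-based rule. The adaptive rule is standard adaptive Gauss-Hermite quadrature: the $k$ nodes are centred at the mode $\hat u$ of $\log\pi(\bm{y},u)$ and scaled by $H^{-1/2}$, where $H$ is the negative second derivative at $\hat u$. A reference value of $\pi(\bm{y})$ is obtained with scipy.integrate.quad.

```python
import numpy as np
from numpy.polynomial.hermite import hermgauss
from scipy.integrate import quad
from scipy.special import gammaln, logsumexp

BETA, SIGMA = np.log(2.0), 1.0  # illustrative fixed effect and random-effect sd


def log_joint_pois(u, y):
    """log pi(y, u) for y_j | u ~ Poisson(exp(BETA + u)), u ~ N(0, SIGMA^2); vectorised in u."""
    u = np.asarray(u, dtype=float)
    eta = BETA + u
    return (y.sum() * eta - len(y) * np.exp(eta) - gammaln(y + 1.0).sum()
            - 0.5 * u**2 / SIGMA**2 - 0.5 * np.log(2.0 * np.pi * SIGMA**2))


def mode_and_curvature(y, iters=50):
    """Newton's method for the mode u_hat and H = -d^2/du^2 log pi(y, u) at u_hat."""
    u = 0.0
    for _ in range(iters):
        grad = y.sum() - len(y) * np.exp(BETA + u) - u / SIGMA**2
        hess = len(y) * np.exp(BETA + u) + 1.0 / SIGMA**2
        u += grad / hess
    return u, len(y) * np.exp(BETA + u) + 1.0 / SIGMA**2


def log_reference(y):
    """High-accuracy log pi(y): adaptive quad on the standardised scale t = (u - u_hat) * sqrt(H)."""
    u_hat, H = mode_and_curvature(y)
    s = 1.0 / np.sqrt(H)
    peak = log_joint_pois(u_hat, y)
    val, _ = quad(lambda t: np.exp(log_joint_pois(u_hat + s * t, y) - peak), -40.0, 40.0,
                  epsabs=0.0, epsrel=1e-12, limit=200)
    return peak + np.log(s * val)


def log_gh(y, center, scale, k):
    """k-point Gauss-Hermite estimate of log integral of pi(y, u) du, nodes center + sqrt(2)*scale*x."""
    x, w = hermgauss(k)
    z = center + np.sqrt(2.0) * scale * x
    log_omega = np.log(w) + np.log(np.sqrt(2.0) * scale) + x**2
    return logsumexp(log_omega + log_joint_pois(z, y))


def relative_errors(n, k=3, y_value=3):
    """|tilde-pi(y)/pi(y) - 1| for the adaptive and the non-adaptive rule."""
    y = np.full(n, float(y_value))  # deterministic data: every count equals y_value
    u_hat, H = mode_and_curvature(y)
    log_exact = log_reference(y)
    adaptive = log_gh(y, u_hat, 1.0 / np.sqrt(H), k)  # AGHQ: centred and scaled at the mode
    nonadaptive = log_gh(y, 0.0, SIGMA, k)            # fixed nodes from the prior
    return abs(np.expm1(adaptive - log_exact)), abs(np.expm1(nonadaptive - log_exact))


for n in (10, 100, 1000):
    err_a, err_n = relative_errors(n)
    print(f"n={n:5d}  adaptive rel. error={err_a:.2e}  non-adaptive rel. error={err_n:.2e}")
```

With $k=3$ the adaptive relative error should shrink steadily as $n$ grows, while the non-adaptive error should approach 1 because the fixed nodes miss the concentrating posterior.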

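We define $\zeta_i=n_i^{-\alpha}$ for $0<\alpha<1/4$, and $\zeta_N=(\min_{i=1,\dots,m}n_i)^{-\alpha}$. We let the number of groups be $m=n_{\min}^{q}$ for some $q>0$, where $n_{\min}=\min_{i=1,\dots,m}n_i$, so that as $N\to\infty$, $m\to\infty$ as well. The radius $\zeta_N$ defines the shrinking region of the parameter space in which all our statements about uniform convergence hold; the precise rate of shrinkage $\alpha$ is chosen to balance the concentration of the likelihood against the convergence to zero of the integration error. We also define a fixed neighbourhood of arbitrary radius $\delta>0$, and a point $\bm{\theta}_{*}\in\Theta$ around which the likelihood concentrates. This may be intuitively thought of as a "true" value of $\bm{\theta}$, and under weak conditions it will be the point that maximizes the expected log-likelihood; we emphasize that at no point do we assume the model is correctly specified in the sense that $P_N^{*}$ is recovered by $\pi(\bm{\theta};\bm{y})$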