
Verifiable identification condition for nonignorable nonresponse data with categorical instrumental variables

Kenji Beppu and Kosuke Morikawa
Graduate School of Engineering Science, Osaka University, Osaka, Japan
Email: [email protected]
Abstract

We consider a model identification problem in which an outcome variable contains nonignorable missing values. Statistical inference requires model identifiability to obtain estimators with desirable theoretical properties such as consistency and asymptotic normality. Recently, instrumental or shadow variables, combined with the completeness condition on the outcome model, have been highlighted as a way to make a model identifiable. In this paper, we elucidate the relationship between the completeness condition and model identifiability when the instrumental variable is categorical. We first show that when both the outcome and instrumental variables are categorical, the two conditions are equivalent. However, when either the outcome or the instrumental variable is continuous, the completeness condition may fail to hold, even for simple models. Consequently, we provide a sufficient condition that guarantees the identifiability of models exhibiting a monotone-likelihood property, which is particularly useful when establishing the completeness condition is challenging. We demonstrate that the proposed conditions are easy to check with observed data for many practical models and illustrate their usefulness in numerical experiments and real data analysis.

keywords:
missing not at random; nonignorable missingness; identification; instrumental variable; exponential family

1 Introduction

There has been a rapidly growing movement to utilize all available data that may explicitly or implicitly contain missing values, for example in causal inference (Imbens and Rubin, 2015) and data integration (Yang and Kim, 2020; Hu et al., 2022). For such datasets, appropriate analysis of missing data is indispensable to correct the selection bias caused by the missingness. In recent years, the analysis of missing data under the missing at random (MAR) assumption (Little and Rubin, 2019) has gradually matured (Robins et al., 1994; Kim and Shao, 2021). Model identifiability is one of the most fundamental conditions for constructing asymptotic theory, but removing the MAR assumption makes statistical inference drastically more difficult, especially with respect to model identification (Miao et al., 2016). Estimation with unidentifiable models may yield multiple solutions that fit the data equally well. Several researchers have therefore proposed sufficient conditions for model identification under missing not at random (MNAR).

The observed likelihood is constructed from two components: (R) the response mechanism and (O) the outcome distribution (Kim and Shao, 2021). Miao et al. (2016) considered identification conditions with Logistic, Probit, and Robit (cumulative distribution function of the $t$-distribution) models for (R) and normal and $t$ (mixture) distributions for (O). Cui et al. (2017) assumed Logistic, Probit, and complementary log-log models for (R) and generalized linear models for (O). These studies depend heavily on the model specification of both (R) and (O). Wang et al. (2014) introduced a covariate called an instrument or shadow variable and demonstrated that the use of the instrument considerably relaxes the conditions on (R) and (O). For example, (O) requires only the monotone-likelihood property, which covers a variety of models, including the generalized linear model. Tang et al. (2003) and Miao and Tchetgen (2018) derived conditions for model identifiability without postulating any assumptions on (R) with the help of the instrument. Miao et al. (2019) further relaxed the assumption on (O) under a condition referred to as the completeness condition (D'Haultfœuille, 2010, 2011). For example, the generalized linear model with continuous covariates satisfies the completeness condition. To the best of our knowledge, this combination of an instrument on (R) and completeness on (O) is the most general condition for model identification and has been adopted in numerous studies (Zhao and Ma, 2022; Yang et al., 2019).

Generally, assumptions on (O) concern the distribution of the complete data, which is untestable from the observed data. Recently, modeling (O'), the respondents' (observed) outcome distribution, instead of (O) has been used to avoid this untestable assumption (Miao et al., 2019; Riddles et al., 2016). However, the observed likelihood with (R) and (O') involves an integral that makes the identification problem intractable. Morikawa and Kim (2021) and Beppu et al. (2021) established that the integral can be computed explicitly with Logistic models for (R) and generalized linear models for (O') and derived identification conditions. For general response mechanisms and respondents' outcome distributions, model identification remains an open question. Furthermore, when the instrument is categorical, such as smoking history or sex, the completeness condition can fail. For example, Ibrahim et al. (2001) considered a study on the mental health of children in Connecticut and used the parents' report of the psychopathology of the child as a binary instrument.

In this paper, we consider an identification problem with an instrument for (R) and a model for (O') that satisfies the monotone-likelihood ratio property. Although our model setup is similar to that of Wang et al. (2014), we can check the validity of (O') with observed data, for example, by using information criteria such as AIC and BIC. Furthermore, we can use semiparametric/nonparametric methods for modeling both (O') and (R).

The rest of this paper is organized as follows. Section 2 introduces the notation and defines model identifiability. Section 3 derives the proposed identification condition. We demonstrate the effects of identifiability via a limited numerical study in Section 4. Moreover, application to real data is presented in Section 5. Finally, concluding remarks are summarized in Section 6. All the technical proofs are relegated to the Appendix.

2 Basic setup

2.1 Observed likelihood

Let $\{\bm{x}_{i},y_{i},\delta_{i}\}_{i=1}^{n}$ be independent and identically distributed samples from the distribution of $(\bm{x},y,\delta)$, where $\bm{x}$ is a fully observed covariate vector, $y$ is an outcome variable subject to missingness, and $\delta$ is the response indicator of $y$, equal to $1$ if $y$ is observed and $0$ if it is missing. We use the generic notation $p(\cdot)$ and $p(\cdot\mid\cdot)$ for marginal and conditional densities, respectively. For example, $p(\bm{x})$ is the marginal density of $\bm{x}$, and $p(y\mid\bm{x})$ is the conditional density of $y$ given $\bm{x}$. We model the MNAR response mechanism $P(\delta=1\mid\bm{x},y)$ and consider its identification. The observed likelihood is defined as

\displaystyle\prod_{i:\delta_{i}=1}P(\delta_{i}=1\mid y_{i},\bm{x}_{i})p(y_{i}\mid\bm{x}_{i})\prod_{i:\delta_{i}=0}\int\left\{1-P(\delta_{i}=1\mid y,\bm{x}_{i})\right\}p(y\mid\bm{x}_{i})dy. (1)

We say that this model is identifiable if the parameters in (1) are identified, which is equivalent to the parameters in $P(\delta=1\mid y,\bm{x})p(y\mid\bm{x})$ being identified. This identification condition is essential even for semiparametric models, such as estimators defined by moment conditions (Morikawa and Kim, 2021). However, even simple models can be unidentifiable. For example, Example 1 in Wang et al. (2014) presents an unidentifiable model in which the outcome model is normal and the response mechanism is Logistic.

There is an alternative way to express the relationship between $y$ and $\bm{x}$. A disadvantage of modeling $p(y\mid\bm{x})$ is that it is a subjective assumption on the distribution of the complete data, not of the observed data. In other words, if we made assumptions about $p(y\mid\bm{x})$ and ensured its identifiability, we could not verify those assumptions with the observed data. By contrast, this issue can be overcome by modeling $p(y\mid\bm{x},\delta=1)$, because $p(y\mid\bm{x},\delta=1)$ is the outcome model for the observed data, and we can check its validity using ordinary information criteria such as AIC and BIC. Therefore, we model $p(y\mid\bm{x},\delta=1)$ and consider its identification condition in Section 3. Hereafter, we assume two parametric models $p(y\mid\bm{x},\delta=1;\bm{\gamma})$ and $P(\delta=1\mid\bm{x},y;\bm{\phi})$, where $\bm{\gamma}$ and $\bm{\phi}$ are the parameters of the outcome and response models, respectively. Although our method requires two parametric models, the class of identifiable models is very large. For example, it can include semiparametric outcome models for $p(y\mid\bm{x},\delta=1;\bm{\gamma})$ and general response models $P(\delta=1\mid\bm{x},y;\bm{\phi})$ other than Logistic models, as discussed in Example 3.7.
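Because $p(y\mid\bm{x},\delta=1;\bm{\gamma})$ describes only the respondents, its specification can be compared across candidates with standard model-selection tools. The snippet below is a minimal sketch of such a check on simulated data; the data-generating values, the two candidate normal-linear specifications, and the function names are illustrative assumptions, not part of the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2000
u = rng.normal(size=n)
z = rng.binomial(1, 0.5, size=n)
y = 0.3 + 0.4 * u + 0.8 * z + rng.normal(scale=0.7, size=n)   # hypothetical complete data
delta = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.7 - 0.2 * u + 0.3 * y))))  # MNAR response

# Fit candidate models for p(y | x, delta = 1) using the respondents only.
yo, uo, zo = y[delta == 1], u[delta == 1], z[delta == 1]

def gaussian_aic(design, yobs):
    """AIC of a normal linear model fitted by least squares."""
    beta, *_ = np.linalg.lstsq(design, yobs, rcond=None)
    resid = yobs - design @ beta
    sigma2 = resid @ resid / len(yobs)
    loglik = stats.norm.logpdf(resid, scale=np.sqrt(sigma2)).sum()
    k = design.shape[1] + 1               # regression coefficients plus the variance
    return 2 * k - 2 * loglik

X1 = np.column_stack([np.ones_like(yo), uo])       # candidate mean without the instrument z
X2 = np.column_stack([np.ones_like(yo), uo, zo])   # candidate mean including z
print("AIC without z:", gaussian_aic(X1, yo))
print("AIC with z   :", gaussian_aic(X2, yo))
```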

2.2 Estimation

We present a procedure for parameter estimation based on the parametric models $p(y\mid\bm{x},\delta=1;\bm{\gamma})$ and $P(\delta=1\mid\bm{x},y;\bm{\phi})$. Let $\hat{\bm{\gamma}}$ be the maximum likelihood estimator of $\bm{\gamma}$. The observed likelihood (1) yields the mean score equation for $\bm{\phi}$ (Kim and Shao, 2021):

\displaystyle\sum_{i=1}^{n}\left\{\delta_{i}\frac{\partial\log\pi(\bm{x}_{i},y_{i};\bm{\phi})}{\partial\bm{\phi}}-(1-\delta_{i})\frac{\int\partial\pi(\bm{x}_{i},y;\bm{\phi})/\partial\bm{\phi}\cdot p(y\mid\bm{x}_{i})dy}{\int\{1-\pi(\bm{x}_{i},y;\bm{\phi})\}p(y\mid\bm{x}_{i})dy}\right\}=0,

where $\pi(\bm{x},y;\bm{\phi})=P(\delta=1\mid\bm{x},y;\bm{\phi})$. By using Bayes' formula $p(y\mid\bm{x})\propto p(y\mid\bm{x},\delta=1)/\pi(\bm{x},y;\bm{\phi})$, the mean score can be written as

\displaystyle\sum_{i=1}^{n}\left\{\delta_{i}s_{1}(\bm{x}_{i},y_{i};\bm{\phi})+(1-\delta_{i})s_{0}(\bm{x}_{i};\bm{\phi})\right\}=0,

where

\displaystyle s_{1}(\bm{x},y;\bm{\phi})=\frac{\partial\log\pi(\bm{x},y;\bm{\phi})}{\partial\bm{\phi}},\quad s_{0}(\bm{x};\bm{\phi})=-\frac{\int s_{1}(\bm{x},y;\bm{\phi})p(y\mid\bm{x},\delta=1)dy}{\int\left\{1/\pi(\bm{x},y;\bm{\phi})-1\right\}p(y\mid\bm{x},\delta=1)dy}.

To compute the two integrals in $s_{0}(\cdot)$, we can use fractional imputation (Kim, 2011). As described in Riddles et al. (2016), the EM algorithm is also applicable.
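As an illustration of this step, the following sketch approximates the two integrals in $s_{0}(\cdot)$ by Monte Carlo draws from a fitted respondents' model and evaluates the mean score. The Gaussian respondents' model and the Logistic response model used here are illustrative assumptions; the framework of the paper allows much more general choices.

```python
import numpy as np
from scipy import optimize

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def mean_score(phi, u, y, delta, gamma_hat, sigma_hat, M=200, seed=1):
    """Mean score for a Logistic response model pi = expit(phi0 + phi1*u + phi2*y).
    s0 is approximated by M Monte Carlo draws from the fitted respondents' model
    y | (u, delta = 1) ~ N(gamma0 + gamma1*u, sigma^2) (illustrative choices)."""
    rng = np.random.default_rng(seed)      # common random numbers across evaluations
    total = np.zeros(3)
    for ui, yi, di in zip(u, y, delta):
        if di == 1:
            pi = expit(phi[0] + phi[1] * ui + phi[2] * yi)
            total += (1.0 - pi) * np.array([1.0, ui, yi])            # s1(u_i, y_i; phi)
        else:
            ystar = rng.normal(gamma_hat[0] + gamma_hat[1] * ui, sigma_hat, size=M)
            pi = expit(phi[0] + phi[1] * ui + phi[2] * ystar)
            s1 = (1.0 - pi)[:, None] * np.column_stack(
                [np.ones(M), np.full(M, ui), ystar])
            total += -s1.mean(axis=0) / np.mean(1.0 / pi - 1.0)      # s0(u_i; phi)
    return total

# phi_hat = optimize.root(mean_score, x0=np.zeros(3),
#                         args=(u, y, delta, gamma_hat, sigma_hat)).x
```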

3 Identifiability

3.1 Definition of identification

Recall that the identification condition for (1) concerns the parameters in $P(\delta=1\mid y,\bm{x})p(y\mid\bm{x})$. As seen in Section 2.2, the conditional density $p(y\mid\bm{x})$ is represented by $p(y\mid\bm{x},\delta=1;\bm{\gamma})$ and $P(\delta=1\mid\bm{x},y;\bm{\phi})$ through Bayes' formula. Thus, identification of these models reduces to identification of the parameters in $\varphi(y,\bm{x};\bm{\phi},\bm{\gamma})$, where

\displaystyle\varphi(y,\bm{x};\bm{\phi},\bm{\gamma})=\frac{p(y\mid\bm{x},\delta=1;\bm{\gamma})}{\int p(y\mid\bm{x},\delta=1;\bm{\gamma})/\pi(\bm{x},y;\bm{\phi})dy}. (2)

Strictly speaking, the identification condition is that $\varphi(y,\bm{x};\bm{\phi},\bm{\gamma})=\varphi(y,\bm{x};\bm{\phi}^{\prime},\bm{\gamma}^{\prime})$ with probability $1$ implies $(\bm{\phi}^{\top},\bm{\gamma}^{\top})=({\bm{\phi}^{\prime}}^{\top},{\bm{\gamma}^{\prime}}^{\top})$. Generally, the integral in the denominator of (2) does not have a closed form, which makes deriving a sufficient condition for identifiability quite challenging. Morikawa and Kim (2021) showed that the combination of a Logistic response model and a normal outcome model admits a closed form of the integral and derived a sufficient condition for model identifiability. Beppu et al. (2021) extended the result to the case where the outcome model belongs to the exponential family while the response model is still Logistic. However, when the response mechanism is general, even simple outcome models such as the normal distribution can be unidentifiable.

Example 3.1.

Suppose that the respondents' outcome model is $y\mid(\delta=1,x)\sim N(\gamma_{0}+\gamma_{1}x,1)$ and the response model is $P(\delta=1\mid x,y)=\Psi(\alpha_{0}+\alpha_{1}x+\beta y)$, where $\Psi$ is a known distribution function such that the integral in (2) exists; then this model is unidentifiable. For example, the two parametrizations $(\alpha_{0},\alpha_{1},\beta,\gamma_{0},\gamma_{1})=(0,1,1,0,1)$ and $(\alpha_{0}^{\prime},\alpha_{1}^{\prime},\beta^{\prime},\gamma_{0}^{\prime},\gamma_{1}^{\prime})=(0,3,-1,0,1)$ yield the same value of the observed likelihood.
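The equality of the two observed likelihoods in Example 3.1 can be confirmed numerically: both parametrizations share the same respondents' density, so it suffices to compare the normalizing integrals in (2). The sketch below does so with $\Psi$ taken to be the Logistic function, an illustrative choice; any $\Psi$ for which the integral exists leads to the same conclusion.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def denominator(x, a0, a1, b, g0, g1):
    """Integral in the denominator of (2) with a Logistic Psi (illustrative choice)."""
    integrand = lambda y: stats.norm.pdf(y, loc=g0 + g1 * x) / expit(a0 + a1 * x + b * y)
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

# The respondents' density is identical under both parametrizations, so comparing
# the denominators is enough to compare the induced observed-data distributions.
for x in (-1.0, 0.0, 0.7, 2.0):
    d1 = denominator(x, 0.0, 1.0, 1.0, 0.0, 1.0)    # (alpha0, alpha1, beta, gamma0, gamma1) = (0, 1, 1, 0, 1)
    d2 = denominator(x, 0.0, 3.0, -1.0, 0.0, 1.0)   # (0, 3, -1, 0, 1)
    print(f"x = {x:5.2f}:  {d1:.6f}  vs  {d2:.6f}")
```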

Recently, widely applicable sufficient conditions have been proposed. Assume that a covariate 𝒙\bm{x} has two components, 𝒙=(𝒖,𝒛)\bm{x}=(\bm{u}^{\top},\bm{z}^{\top})^{\top}, such that

  • (C1)

    𝒛δ(𝒖,y)\bm{z}\perp\!\!\!\perp\delta\mid(\bm{u},y) and 𝒛⟂̸y(δ=1,𝒖).\bm{z}\not\perp\!\!\!\perp y\mid(\delta=1,\bm{u}).

The covariate 𝒛\bm{z} is called an instrument (D’Haultfœuille, 2010) or a shadow variable (Miao and Tchetgen Tchetgen, 2016). Miao et al. (2019) derived sufficient conditions for model identifiability by combining the instrument and the completeness condition:

  • (C2)

    For any square-integrable function $h(\bm{u},y)$, $E[h(\bm{u},y)\mid\delta=1,\bm{u},\bm{z}]=0$ almost surely implies $h(\bm{u},y)=0$ almost surely.

Lemma 3.2 (Identification condition by Miao et al. (2019)).

Under the conditions (C1) and (C2), the joint distribution p(y,𝐮,𝐳,δ)p(y,\bm{u},\bm{z},\delta) is identifiable.

Although the completeness condition is useful and applicable to general models, even a simple model with a categorical instrument may fail to satisfy it.

Example 3.3 (Violating completeness with categorical instrument).

Suppose $y\mid(\delta=1,u,z)$ follows the normal distribution $N(u+z,1)$ and the instrument $z$ is binary, taking the value $0$ or $1$. This distribution does not satisfy the completeness condition because $E[h(u,y)\mid\delta=1,u,z]=0$ for the nonzero function $h(u,y)=1+y-u-(y-u)^{2}$.
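The moment calculation behind Example 3.3 is short: $E[h(u,y)\mid\delta=1,u,z]=1+(u+z)-u-(1+z^{2})=z-z^{2}=0$ for $z\in\{0,1\}$. A minimal Monte Carlo confirmation, with $u$ fixed at an arbitrary value, is sketched below.

```python
import numpy as np

rng = np.random.default_rng(0)
u = 0.4                                   # any fixed value of u
for z in (0, 1):
    y = rng.normal(u + z, 1.0, size=1_000_000)
    h = 1 + y - u - (y - u) ** 2          # h(u, y) from Example 3.3
    print(f"z = {z}:  mean of h(u, y) over respondents ~ {h.mean():.4f}")
```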

A vital implication of Example 3.3 is that an instrument alone no longer guarantees model identification when it is categorical. Developing identification conditions for models with discrete instruments is therefore important in applications (Ibrahim et al., 2001). We separately discuss two cases: (i) both $y$ and $\bm{z}$ are categorical; (ii) the respondents' outcome model has the monotone-likelihood ratio property.

When all variables, yy and 𝒛\bm{z}, are categorical, the model can be fully nonparametric. Theorem 3.4 demonstrates that, under these conditions, the completeness and identifiability conditions are equivalent. See Appendix 2 in Riddles et al. (2016) for the estimation of such fully nonparametric models.

Theorem 3.4.

When both yy and 𝐳\bm{z} are categorical, under condition (C1), the joint distribution p(y,𝐮,𝐳,δ)p(y,\bm{u},\bm{z},\delta) is identifiable if and only if condition (C2) holds.

As evidenced in Lemma 3.2, condition (C2) is generally sufficient for model identifiability, but Theorem 3.4 also reveals that it is necessary when yy and 𝒛\bm{z} are categorical.

Next, we consider the identification condition for the other case (ii). Let 𝒮y\mathcal{S}_{y} be the support of the random variable yy. We assume the following four conditions:

  • (C3)

    The response mechanism is

    \displaystyle P(\delta=1\mid y,\bm{x};\bm{\phi})=P(\delta=1\mid y,\bm{u};\bm{\phi})=\Psi\{h(\bm{u};\bm{\alpha})+g(\bm{u};\bm{\beta})m(y)\}, (3)

    where $\bm{\phi}=(\bm{\alpha}^{\top},\bm{\beta}^{\top})^{\top}$, $m:\mathcal{S}_{y}\to\mathbb{R}$ and $\Psi:\mathbb{R}\to(0,1]$ are known continuous strictly monotone functions, and $h(\bm{u};\bm{\alpha})$ and $g(\bm{u};\bm{\beta})$ are known injective functions of $\bm{\alpha}$ and $\bm{\beta}$, respectively.

  • (C4)

    The density or mass function p(y𝒙,δ=1;𝜸)p(y\mid\bm{x},\delta=1;\bm{\gamma}) is identifiable, and its support does not depend on 𝒙\bm{x}.

  • (C5)

    For all 𝒖𝒮𝒖\bm{u}\in\mathcal{S}_{\bm{u}}, there exist 𝒛1\bm{z}_{1} and 𝒛2\bm{z}_{2}, such that p(y𝒖,𝒛1,δ=1)p(y𝒖,𝒛2,δ=1)p(y\mid\bm{u},\bm{z}_{1},\delta=1)\neq p(y\mid\bm{u},\bm{z}_{2},\delta=1), and p(y𝒖,𝒛1,δ=1)/p(y𝒖,𝒛2,δ=1)p(y\mid\bm{u},\bm{z}_{1},\delta=1)/p(y\mid\bm{u},\bm{z}_{2},\delta=1) is monotone.

  • (C6)
    \displaystyle\int\frac{p(y\mid\bm{x},\delta=1;\bm{\gamma})}{\Psi\{h(\bm{u};\bm{\alpha})+g(\bm{u};\bm{\beta})m(y)\}}dy<\infty\ \ \ \mathrm{a.s.}

Condition (C3) means that the random variable $\bm{z}$ plays the role of an instrument. Condition (C4) is the identifiability of $p(y\mid\bm{x},\delta=1;\bm{\gamma})$, which is testable from the observed data. Condition (C5) assumes a monotone-likelihood ratio property on the respondents' outcome model, analogous to the condition used in Wang et al. (2014) for the complete data. Condition (C6) is necessary for (1) to be well defined; it is essentially the same condition as (I1) of Theorem 3.1 in Morikawa and Kim (2021). This condition always holds when the support of $y$ is finite, but it must be verified carefully when $y$ is continuous. See Proposition 3.8 below for useful sufficient conditions when the respondents' outcome model is a normal distribution.

Under conditions (C3)–(C6), we obtain the desired identification condition.

Theorem 3.5.

The parameter (ϕ,𝛄)(\bm{\phi}^{\top},\bm{\gamma}^{\top})^{\top} is identifiable if the conditions (C1) and (C3)–(C6) hold.

We provide an example of outcome models satisfying the condition (C5).

Example 3.6 (Model satisfying (C5)).

Let the density function belong to the exponential family,

\displaystyle p(y\mid\bm{x},\delta=1;\bm{\gamma})=\exp\left(\frac{y\theta-b(\theta)}{\tau}+c(y;\tau)\right),

where $\theta=\theta(\eta)$, $\eta=\sum_{l=1}^{L}\eta_{l}(\bm{x})\kappa_{l}$, $\bm{\kappa}=(\kappa_{1},\ldots,\kappa_{L})^{\top}$, and $\bm{\gamma}=(\tau,\bm{\kappa}^{\top})^{\top}$. Then the density ratio becomes

\displaystyle\frac{p(y\mid\bm{u},\bm{z}_{1},\delta=1)}{p(y\mid\bm{u},\bm{z}_{2},\delta=1)}\propto\exp\left(\frac{\theta_{1}-\theta_{2}}{\tau}y\right),

where $\bm{x}_{i}=(\bm{u},\bm{z}_{i})$ and $\theta_{i}=\theta\{\sum_{l=1}^{L}\eta_{l}(\bm{x}_{i})\kappa_{l}\}$ for $i=1,2$. Therefore, the density ratio is monotone in $y$.
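As a concrete instance, consider the normal respondents' model used in Scenarios S1 and S2 of Section 4, with mean $\kappa_{0}+\kappa_{1}u+\kappa_{2}z$ and variance $\sigma^{2}$; the following display is a direct specialization of the ratio above, not an additional assumption:

\displaystyle\frac{p(y\mid u,z_{1},\delta=1)}{p(y\mid u,z_{2},\delta=1)}=\exp\left\{-\frac{(y-\kappa_{0}-\kappa_{1}u-\kappa_{2}z_{1})^{2}-(y-\kappa_{0}-\kappa_{1}u-\kappa_{2}z_{2})^{2}}{2\sigma^{2}}\right\}\propto\exp\left\{\frac{\kappa_{2}(z_{1}-z_{2})}{\sigma^{2}}y\right\},

which is monotone in $y$ whenever $\kappa_{2}\neq 0$; this is why $\kappa_{2}$ controls the strength of identification in the simulations of Section 4.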

Example 3.7 (Model satisfying (C6)).

In applications, it is often reasonable to assume a normal distribution for the respondents' outcome model. By examining the tail behavior of the integrand in (C6), we provide a sufficient condition that is easy to check for general response mechanisms.

Proposition 3.8.

Suppose that the respondents' outcome distribution $p(y\mid\bm{x},\delta=1)$ is the normal distribution $N(\mu(\bm{x};\bm{\kappa}),\sigma^{2})$, the response mechanism is (3) with $m(y)=y$ and $g(\bm{u};\bm{\beta})=\beta$, and the strictly monotone increasing function $\Psi$ meets the following condition:

{}^{\exists}s\in(0,2)\ \mathrm{s.t.}\ \liminf_{z\to-\infty}\Psi(z)\exp(|z|^{s})>0. (4)

Then, this model satisfies (C6).

Condition (4) is easy to check. For example, it holds for the Logistic and Robit functions but not for the Probit function. Under Proposition 3.8, it is possible to estimate $\mu(\bm{x};\bm{\kappa})$ from observed data using splines or other nonparametric methods, which allows very flexible models. Furthermore, we can also estimate the response mechanism nonparametrically because Proposition 3.8 imposes no restriction on the functional form of $h(\bm{u};\bm{\alpha})$.
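A rough numerical illustration of condition (4) is sketched below: it evaluates $\Psi(z)\exp(|z|^{s})$ far in the left tail for the Logistic, Robit, and Probit links. The exponent $s=1$ and the use of the $t$ distribution with $3$ degrees of freedom for the Robit link are illustrative choices; under (4) the product should stay bounded away from zero, which fails for the Probit link because its tail decays like $\exp(-z^{2}/2)$.

```python
import numpy as np
from scipy import stats

s = 1.0                                     # any exponent in (0, 2)
z = -np.array([5.0, 10.0, 20.0, 30.0])
links = {
    "Logistic": stats.logistic.cdf,
    "Robit(3)": stats.t(df=3).cdf,          # t distribution with 3 degrees of freedom
    "Probit": stats.norm.cdf,
}
for name, cdf in links.items():
    vals = cdf(z) * np.exp(np.abs(z) ** s)  # bounded away from 0 under condition (4)
    print(f"{name:>9}:", np.array2string(vals, precision=3))
```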

4 Numerical experiment

We illustrate the effect of identifiability in numerical experiments by comparing weakly and strongly identifiable models. We prepared four scenarios, S1–S4:

  • S1:

    (Outcome: Normal, Response: Logistic)
    [yu,z,δ=1]N(κ0+κ1u+κ2z,σ2)[y\mid u,z,\delta=1]\sim N(\kappa_{0}+\kappa_{1}u+\kappa_{2}z,\sigma^{2}), logit{P(δ=1u,y;𝜶,β)}=α0+α1u+βy\text{logit}\{P(\delta=1\mid u,y;\bm{\alpha},\beta)\}=\alpha_{0}+\alpha_{1}u+\beta y, uN(0,12)u\sim N(0,1^{2}), and zB(1,0.5)z\sim B(1,0.5), where (κ0,κ1,σ2)=(0.3,0.4,1/22)(\kappa_{0},\kappa_{1},\sigma^{2})^{\top}=(0.3,0.4,1/{\sqrt{2}^{2}})^{\top} and (α0,α1,β)=(0.7,0.2,0.29)(\alpha_{0},\alpha_{1},\beta)^{\top}=(0.7,-0.2,0.29)^{\top}.

  • S2:

    (Outcome: Normal, Response: Cauchy)
    [yu,z,δ=1]N(κ0+κ1u+κ2z,σ2)[y\mid u,z,\delta=1]\sim N(\kappa_{0}+\kappa_{1}u+\kappa_{2}z,\sigma^{2}), P(δ=1u,y;𝜶,β)=Ψ(α0+α1u+βy)P(\delta=1\mid u,y;\bm{\alpha},\beta)=\Psi(\alpha_{0}+\alpha_{1}u+\beta y), uUnif(1,1)u\sim\mathrm{Unif}(-1,1), and zB(1,0.7)z\sim B(1,0.7), where (κ0,κ1,σ2)=(0.36,0.59,1/22)(\kappa_{0},\kappa_{1},\sigma^{2})^{\top}=(-0.36,0.59,1/{\sqrt{2}^{2}})^{\top}, (α0,α1,β)=(0.24,0.1,0.42)(\alpha_{0},\alpha_{1},\beta)^{\top}=(0.24,-0.1,0.42)^{\top}, and Ψ\Psi is the cumulative distribution function of the Cauchy distribution.

  • S3:

    (Outcome: Bernoulli, Response: Probit)
    $[y\mid u,z,\delta=1]\sim B(1,p(u,z;\bm{\kappa}))$, $P(\delta=1\mid u,y;\bm{\alpha},\beta)=\Psi(\alpha_{0}+\alpha_{1}u+\beta y)$, $u\sim N(0,1^{2})$, and $z\sim N(0,1^{2})$, where $p(u,z;\bm{\kappa})=1/\{1+\exp(-\kappa_{0}-\kappa_{1}u-\kappa_{2}z)\}$, $(\kappa_{0},\kappa_{1},\kappa_{2})^{\top}=(-0.21,3.8,1.0)^{\top}$, $(\alpha_{0},\alpha_{1},\beta)^{\top}=(0.4,0.39,0.3)^{\top}$, and $\Psi$ is the cumulative distribution function of the standard normal.

  • S4:

    (Outcome: Normal+nonlinear mean structure, Response: Cauchy or Logistic)
    [yu,z,δ=1]N(μ(𝒙),0.52)[y\mid u,z,\delta=1]\sim N(\mu(\bm{x}),0.5^{2}), P(δ=1u,y;𝜶,β)=Ψ(α0+α1u+βy)P(\delta=1\mid u,y;\bm{\alpha},\beta)=\Psi(\alpha_{0}+\alpha_{1}u+\beta y), uUnif(1,1)u\sim\mathrm{Unif}(-1,1), and zB(1,0.5)z\sim B(1,0.5), where μ(𝒙)=z+cos(2πu)+exp(z+u)\mu(\bm{x})=z+\cos(2\pi u)+\exp(z+u), (α0,α1,β)=(0.1,0.2,0.3)(\alpha_{0},\alpha_{1},\beta)^{\top}=(0.1,-0.2,0.3)^{\top}, and Ψ\Psi is the cumulative distribution function of the Cauchy or Logistic distribution.

In S1 and S2, the strength of identification can be adjusted by changing the parameter $\kappa_{2}$, because $\kappa_{2}=0$ makes the model unidentifiable by Example 3.1. On the other hand, the models in S3 and S4 are identifiable by Theorem 3.5. For example, in S4, checking (C3) and (C4) is straightforward from the model setting, while (C5) and (C6) hold by Example 3.6 and Proposition 3.8, respectively. S3 and S4 confirm that inference succeeds even with a discrete outcome and a complex mean structure, respectively.

We generated 1,000 independent Monte Carlo samples and computed estimators of $E[y]$ and $\beta$ with two methods: the fractional imputation (FI) estimator and the complete case (CC) estimator, which uses only completely observed data. The estimator of $E[y]$ is computed by the standard inverse probability weighting method with the estimated response model (Riddles et al., 2016). We used correctly specified models for Scenarios S1–S3 but nonparametric models for Scenario S4 because it is unrealistic to assume that the complicated mean structure is known. The R package 'crs', which implements nonparametric spline regression with a mixture of categorical and continuous covariates (Nie and Racine, 2012), is used to estimate the respondents' outcome model. Response models are estimated by the method discussed in Section 2.2.
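For reference, a minimal sketch of the weighting step for $E[y]$ is given below. The normalized (Hajek-type) form and the Logistic expression for the estimated response probability are illustrative choices, not necessarily the exact implementation used in the simulations.

```python
import numpy as np

def ipw_mean(y, u, delta, phi_hat):
    """Normalized inverse probability weighting estimator of E[y],
    assuming a Logistic response model for illustration."""
    resp = delta == 1
    eta = phi_hat[0] + phi_hat[1] * u[resp] + phi_hat[2] * y[resp]
    weights = 1.0 + np.exp(-eta)            # 1 / pi_hat for the respondents
    return np.sum(weights * y[resp]) / np.sum(weights)
```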

Bias, root mean squared error (RMSE), and coverage rates of 95% confidence intervals for S1–S4 are reported in Table 1. In all scenarios, the CC estimators have significant bias and coverage rates far from 95%, while the FI estimators work well when the model is clearly identifiable. When $\kappa_{2}$ is small in S1 and S2, the variance estimation of FI performs poorly, as expected, although the point estimates remain acceptable. The results for S4 indicate that the model is identifiable even with a nonparametric mean structure, and the estimates are almost identical between the two response models.

Table 1: Results of S1–S4: Bias, root mean square error (RMSE), and coverage rate (CR, %) of 95% confidence intervals are reported. CC: complete case; FI: fractional imputation.
Scenario  Parameter  κ2   Method        Bias    RMSE   CR
S1        E[y]       1.0  CC            0.053   0.066  73.5
S1        E[y]       1.0  FI            0.000   0.043  95.4
S1        E[y]       0.5  CC            0.039   0.053  80.9
S1        E[y]       0.5  FI           -0.001   0.059  97.1
S1        E[y]       0.1  CC            0.034   0.049  83.0
S1        E[y]       0.1  FI            0.021   0.136  99.8
S1        β          1.0  FI            0.001   0.163  95.2
S1        β          0.5  FI            0.003   0.330  98.6
S1        β          0.1  FI           -0.146   0.865  100
S2        E[y]       1.0  CC            0.146   0.152  5.7
S2        E[y]       1.0  FI           -0.004   0.051  94.8
S2        E[y]       0.5  CC            0.130   0.136  7.7
S2        E[y]       0.5  FI           -0.008   0.086  86.2
S2        E[y]       0.1  CC            0.127   0.133  9.4
S2        E[y]       0.1  FI           -0.007   0.105  92.4
S2        β          1.0  FI            0.008   0.148  95.4
S2        β          0.5  FI            0.044   0.365  100
S2        β          0.1  FI            0.033   0.448  100
S3        E[y]       -    CC            0.100   0.102  0.3
S3        E[y]       -    FI            0.001   0.022  95.3
S3        β          -    FI           -0.023   0.279  95.0
S4        E[y]       -    CC(Logistic)  0.341   0.355  5.4
S4        E[y]       -    FI(Logistic)  0.005   0.079  95.4
S4        E[y]       -    CC(Cauchy)    0.296   0.312  10.7
S4        E[y]       -    FI(Cauchy)    0.007   0.080  94.3
S4        β          -    FI(Logistic)  0.006   0.050  94.7
S4        β          -    FI(Cauchy)    0.011   0.063  93.8

5 Real data analysis

We analyzed a dataset of 2139 HIV-positive patients enrolled in AIDS Clinical Trials Group Study 175 (ACTG175; Hammer et al. (1996)). In this analysis, we restrict attention to the 532 patients who received zidovudine (ZDV) monotherapy. Let $y$, $x_{1}$, and $x_{2}$ be the CD4 cell counts at $96\pm 5$ weeks, at baseline, and at $20\pm 5$ weeks, respectively, let $x_{3}$ be the CD8 cell count at baseline, and let $z$ be sex. The outcome was subject to missingness with a 60.34% observation rate, while all covariates were fully observed. To make estimation stable and easy, we standardized all the data. We expect $z$ (sex) to be a reasonable choice of instrumental variable because it is a biological characteristic that affects the CD4 cell count but has little effect on the response probability.

Patients with milder HIV conditions tend to have higher CD4 cell counts; thus, one may suspect that missingness of the outcome is related to more serious conditions and that the missing outcomes would correspond to lower CD4 cell counts than those of the respondents. We therefore considered five MNAR response models:

P(δ=1x1,x2,x3,y)=Ψ(α0+α1x1+α2x2+α3x3+βy),\displaystyle P(\delta=1\mid x_{1},x_{2},x_{3},y)=\Psi(\alpha_{0}+\alpha_{1}x_{1}+\alpha_{2}x_{2}+\alpha_{3}x_{3}+\beta y),

where $\Psi$ represents either the Logistic function or the distribution function of the Cauchy or $t$ distribution with degrees of freedom $v\,(=2,5,10)$. Theorem 3.5 and Proposition 3.8 ensure that the models with these five response mechanisms are all identifiable, even though the instrumental variable $z$ is discrete. From the above conjecture on missing values, the sign of $\beta$ is expected to be negative. We assumed that the respondents' outcome follows a normal distribution with a nonparametric mean structure, estimated by the 'crs' R package as in Scenario S4 of Section 4. The residual plots shown in Figure 1 and the computed $R^{2}$ value $(=0.453)$ indicate that the assumed distribution for the respondents' outcome fits well. Table 2 reports the estimated parameters and their standard errors calculated from 1,000 bootstrap samples. The results of the five response models were very similar, which suggests that the estimates are robust to the choice of response model. Although we cannot determine whether the mechanism is MNAR or MAR because the estimated standard error of $\beta$ is large, the point estimate is negative, as expected. This result is consistent with that of Zhao et al. (2021).

Figure 1: Residual plots of respondents’ outcome in ACTG175 data.
Table 2: Estimated parameters: estimates and standard errors (SE) for the target parameters are reported. Logistic and Cauchy denote fractional imputation using the Logistic and Cauchy distribution functions for the response mechanism. $T_{v}$: $t$ distribution function with degrees of freedom $v\,(=2,5,10)$.
Parameter  Model     Estimate  SE       Parameter  Model     Estimate  SE
α0         Logistic   0.464    0.104    α1         Logistic    0.125   0.156
α0         Cauchy     0.417    0.260    α1         Cauchy      0.108   0.139
α0         T2         0.341    0.081    α1         T2          0.091   0.113
α0         T5         0.306    0.069    α1         T5          0.082   0.102
α0         T10        0.295    0.066    α1         T10         0.080   0.099
α2         Logistic   0.255    0.192    α3         Logistic    0.093   0.107
α2         Cauchy     0.244    0.207    α3         Cauchy      0.083   0.097
α2         T2         0.196    0.148    α3         T2          0.069   0.079
α2         T5         0.169    0.126    α3         T5          0.062   0.070
α2         T10        0.160    0.120    α3         T10         0.060   0.068
β          Logistic  -0.032    0.314    E[y]       Logistic  276.70   13.476
β          Cauchy    -0.030    0.387    E[y]       Cauchy    276.51   14.107
β          T2        -0.027    0.235    E[y]       T2        276.57   13.437
β          T5        -0.021    0.203    E[y]       T5        276.61   13.271
β          T10       -0.019    0.194    E[y]       T10       276.63   13.217

6 Conclusion

In this paper, we proposed a new identification condition for models specified through the respondents' outcome model and the response model. Although our method requires the specification of these two models, they can be very general with the help of an instrument. As considered in Scenario S4 in Section 4, the mean function in the respondents' outcome model can be nonparametric, and the response model can be based on any strictly monotone distribution function, not only the Logistic model. Our condition guarantees model identifiability even when the instrument is categorical, a case not covered by previous conditions. Another advantage of our method is that the identification condition is easy to verify with observed data.

However, our method has some limitations. First, the respondents' outcome model must have the monotone-likelihood ratio property required by Condition (C5); for example, we cannot handle mixture models in our framework. Second, an instrument must be specified in advance. Some methods for finding instruments have been proposed (Zhao et al., 2021), but there is still no gold standard.

Funding

Research by the second author was supported by MEXT Project for Seismology toward Research Innovation with Data of Earthquake (STAR-E) Grant Number JPJ010217.

References

  • Beppu et al. (2021) Beppu, K., Morikawa, K., and Im, J. (2021), “Imputation with verifiable identification condition for nonignorable missing outcomes,” arXiv preprint arXiv:2204.10508, .
  • Cui et al. (2017) Cui, X., Guo, J., and Yang, G. (2017), “On the identifiability and estimation of generalized linear models with parametric nonignorable missing data mechanism,” Computational Statistics & Data Analysis, 107, 64–80.
  • D’Haultfœuille (2010) D’Haultfœuille, X. (2010), “A new instrumental method for dealing with endogenous selection,” Journal of Econometrics, 154(1), 1–15.
  • D’Haultfœuille (2011) D’Haultfœuille, X. (2011), “On the completeness condition in nonparametric instrumental problems,” Econometric Theory, 27(3), 460–471.
  • Hammer et al. (1996) Hammer, S. M., Katzenstein, D. A., Hughes, M. D., Gundacker, H., Schooley, R. T., Haubrich, R. H., Henry, W. K., Lederman, M. M., Phair, J. P., Niu, M. et al. (1996), “A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter,” New England Journal of Medicine, 335(15), 1081–1090.
  • Hu et al. (2022) Hu, W., Wang, R., Li, W., and Miao, W. (2022), “Paradoxes and resolutions for semiparametric fusion of individual and summary data,” arXiv preprint arXiv:2210.00200, .
  • Ibrahim et al. (2001) Ibrahim, J. G., Lipsitz, S. R., and Horton, N. (2001), “Using auxiliary data for parameter estimation with non-ignorably missing outcomes,” Journal of the Royal Statistical Society: Series C (Applied Statistics), 50(3), 361–373.
  • Imbens and Rubin (2015) Imbens, G. W., and Rubin, D. B. (2015), Causal inference in statistics, social, and biomedical sciences Cambridge University Press.
  • Kim (2011) Kim, J. K. (2011), “Parametric fractional imputation for missing data analysis,” Biometrika, 98(1), 119–132.
  • Kim and Shao (2021) Kim, J. K., and Shao, J. (2021), Statistical methods for handling incomplete data CRC press.
  • Little and Rubin (2019) Little, R. J., and Rubin, D. B. (2019), Statistical analysis with missing data, Vol. 793 John Wiley & Sons.
  • Miao et al. (2016) Miao, W., Ding, P., and Geng, Z. (2016), “Identifiability of normal and normal mixture models with nonignorable missing data,” Journal of the American Statistical Association, 111(516), 1673–1683.
  • Miao et al. (2019) Miao, W., Liu, L., Tchetgen, E. T., and Geng, Z. (2019), “Identification, doubly robust estimation, and semiparametric efficiency theory of nonignorable missing data with a shadow variable,” arXiv preprint arXiv:1509.02556, .
  • Miao and Tchetgen (2018) Miao, W., and Tchetgen, E. T. (2018), “Identification and inference with nonignorable missing covariate data,” Statistica Sinica, 28(4), 2049.
  • Miao and Tchetgen Tchetgen (2016) Miao, W., and Tchetgen Tchetgen, E. J. (2016), “On varieties of doubly robust estimators under missingness not at random with a shadow variable,” Biometrika, 103(2), 475–482.
  • Morikawa and Kim (2021) Morikawa, K., and Kim, J. K. (2021), “Semiparametric optimal estimation with nonignorable nonresponse data,” The Annals of Statistics, 49(5), 2991–3014.
  • Nie and Racine (2012) Nie, Z., and Racine, J. S. (2012), “The crs Package: Nonparametric Regression Splines for Continuous and Categorical Predictors.,” R Journal, 4(2).
  • Riddles et al. (2016) Riddles, M. K., Kim, J. K., and Im, J. (2016), “A propensity-score-adjustment method for nonignorable nonresponse,” Journal of Survey Statistics and Methodology, 4(2), 215–245.
  • Robins et al. (1994) Robins, J. M., Rotnitzky, A., and Zhao, L. P. (1994), “Estimation of regression coefficients when some regressors are not always observed,” Journal of the American statistical Association, 89(427), 846–866.
  • Tang et al. (2003) Tang, G., Little, R. J., and Raghunathan, T. E. (2003), “Analysis of multivariate missing data with nonignorable nonresponse,” Biometrika, 90(4), 747–764.
  • Wang et al. (2014) Wang, S., Shao, J., and Kim, J. K. (2014), “An instrumental variable approach for identification and estimation with nonignorable nonresponse,” Statistica Sinica, 24, 1097–1116.
  • Yang and Kim (2020) Yang, S., and Kim, J. K. (2020), “Statistical data integration in survey sampling: A review,” Japanese Journal of Statistics and Data Science, 3, 625–650.
  • Yang et al. (2019) Yang, S., Wang, L., and Ding, P. (2019), “Causal inference with confounders missing not at random,” Biometrika, 106(4), 875–888.
  • Zhao and Ma (2022) Zhao, J., and Ma, Y. (2022), “A versatile estimation procedure without estimating the nonignorable missingness mechanism,” Journal of the American Statistical Association, 117(540), 1916–1930.
  • Zhao et al. (2021) Zhao, P., Wang, L., and Shao, J. (2021), “Sufficient dimension reduction and instrument search for data with nonignorable nonresponse,” Bernoulli, 27, 930–945.

Appendix A Technical Proofs

We first provide a technical result to prove Theorem 3.4.

Lemma A.1.

Let aa, bb, and cc be any positive real numbers. Assume that r1r_{1} and r2r_{2} are positive real numbers satisfying

aba+b<r12r22r12r22<c.\displaystyle-\frac{ab}{a+b}<\frac{r_{1}^{2}-r_{2}^{2}}{r_{1}^{2}r_{2}^{2}}<c. (5)

Then, there exist 0<πj(k)<1(j=1,2,3;k=1,2)0<\pi^{(k)}_{j}<1\,(j=1,2,3;k=1,2) such that

j=13πj(1)=r12,j=13πj(2)=r22,\displaystyle\sum_{j=1}^{3}\pi^{(1)}_{j}=r_{1}^{2},\quad\sum_{j=1}^{3}\pi^{(2)}_{j}=r_{2}^{2}, (6)

and

1π1(1)1π1(2)=a,1π2(1)1π2(2)=b,1π3(1)1π3(2)=c.\displaystyle\begin{split}\frac{1}{\pi^{(1)}_{1}}-\frac{1}{\pi^{(2)}_{1}}=a,\quad\frac{1}{\pi^{(1)}_{2}}-\frac{1}{\pi^{(2)}_{2}}=b,\quad\frac{1}{\pi^{(1)}_{3}}-\frac{1}{\pi^{(2)}_{3}}=-c.\end{split} (7)
Proof of Lemma A.1.

By using a polar coordinate system, we transform $\pi^{(k)}_{j}\,(j=1,2,3;k=1,2)$ into

\displaystyle(\sqrt{\pi^{(1)}_{1}},\sqrt{\pi^{(1)}_{2}},\sqrt{\pi^{(1)}_{3}})=r_{1}(\sin\phi_{1}\cos\phi_{2},\sin\phi_{1}\sin\phi_{2},\cos\phi_{1}),
\displaystyle(\sqrt{\pi^{(2)}_{1}},\sqrt{\pi^{(2)}_{2}},\sqrt{\pi^{(2)}_{3}})=r_{2}(\sin\psi_{1}\cos\psi_{2},\sin\psi_{1}\sin\psi_{2},\cos\psi_{1}),

where $0<\phi_{1},\phi_{2},\psi_{1},\psi_{2}<\pi/2$, which ensures that $\pi^{(k)}_{j}\,(j=1,2,3;k=1,2)$ satisfy (6). It follows from (7) and the double-angle formulas that we have

r12(1ω1)(1+ω2)r22(1ω3)(1+ω4)=ar12r224(1ω1)(1+ω2)(1ω3)(1+ω4),\displaystyle\begin{split}&r_{1}^{2}(1-\omega_{1})(1+\omega_{2})-r_{2}^{2}(1-\omega_{3})(1+\omega_{4})\\ &=-\frac{ar_{1}^{2}r_{2}^{2}}{4}(1-\omega_{1})(1+\omega_{2})(1-\omega_{3})(1+\omega_{4}),\end{split} (8)
r12(1ω1)(1ω2)r22(1ω3)(1ω4)=br12r224(1ω1)(1ω2)(1ω3)(1ω4),\displaystyle\begin{split}&r_{1}^{2}(1-\omega_{1})(1-\omega_{2})-r_{2}^{2}(1-\omega_{3})(1-\omega_{4})\\ &=-\frac{br_{1}^{2}r_{2}^{2}}{4}(1-\omega_{1})(1-\omega_{2})(1-\omega_{3})(1-\omega_{4}),\end{split} (9)
r12(1+ω1)r22(1+ω3)=cr12r222(1+ω1)(1+ω3),\displaystyle r_{1}^{2}(1+\omega_{1})-r_{2}^{2}(1+\omega_{3})=\frac{cr_{1}^{2}r_{2}^{2}}{2}(1+\omega_{1})(1+\omega_{3}), (10)

where $\omega_{1}=\cos 2\phi_{1}$, $\omega_{2}=\cos 2\phi_{2}$, $\omega_{3}=\cos 2\psi_{1}$, and $\omega_{4}=\cos 2\psi_{2}$. Setting $\omega_{2}=\omega_{4}$ in equations (8) and (9) yields

r12(1ω1)r22(1ω3)\displaystyle r_{1}^{2}(1-\omega_{1})-r_{2}^{2}(1-\omega_{3}) =ar12r224(1ω1)(1+ω2)(1ω3),\displaystyle=-\frac{ar_{1}^{2}r_{2}^{2}}{4}(1-\omega_{1})(1+\omega_{2})(1-\omega_{3}),
r12(1ω1)r22(1ω3)\displaystyle r_{1}^{2}(1-\omega_{1})-r_{2}^{2}(1-\omega_{3}) =br12r224(1ω1)(1ω2)(1ω3).\displaystyle=-\frac{br_{1}^{2}r_{2}^{2}}{4}(1-\omega_{1})(1-\omega_{2})(1-\omega_{3}).

Fixing ω2=12a/(a+b)\omega_{2}=1-2a/(a+b) reduces the above equations to the one common equation

r12(1ω1)r22(1ω3)=r12r22ab2(a+b)(1ω1)(1ω3),\displaystyle r_{1}^{2}(1-\omega_{1})-r_{2}^{2}(1-\omega_{3})=-\frac{r_{1}^{2}r_{2}^{2}ab}{2(a+b)}(1-\omega_{1})(1-\omega_{3}), (11)

maintaining the condition $-1<\omega_{2}<1$. It remains to show that there exists $-1<\omega_{3}<1$ satisfying (10) and (11). Solving equation (11) with respect to $\omega_{1}$, we have

ω1=1r22(1ω3)r12+r12r22ab(1ω3)/{2(a+b)}.\displaystyle\omega_{1}=1-\frac{r_{2}^{2}(1-\omega_{3})}{r_{1}^{2}+r_{1}^{2}r_{2}^{2}ab(1-\omega_{3})/\{2(a+b)\}}. (12)

Substituting (12) into (10) leads to the following quadratic equation with respect to $\omega_{3}$:

\displaystyle f(\omega_{3})=\left(\frac{r_{1}^{2}r_{2}^{4}ab+cr_{1}^{4}r_{2}^{4}ab}{2(a+b)}-\frac{cr_{1}^{2}r_{2}^{4}}{2}\right)\omega_{3}^{2}-\left(\frac{r_{1}^{4}r_{2}^{2}ab}{a+b}+cr_{1}^{4}r_{2}^{2}\right)\omega_{3}
\displaystyle\qquad+\left(\frac{r_{1}^{2}r_{2}^{2}ab\left(2r_{1}^{2}-r_{2}^{2}-cr_{1}^{2}r_{2}^{2}\right)}{2(a+b)}+\frac{cr_{1}^{2}r_{2}^{4}}{2}+2r_{1}^{4}-2r_{1}^{2}r_{2}^{2}-cr_{1}^{4}r_{2}^{2}\right)=0.

It follows from (5) that

f(1)=r12(2r122r222cr12r22)<0,f(1)=2r12(r12r22+r12r22aba+b)>0,\displaystyle f(1)=r_{1}^{2}\left(2r_{1}^{2}-2r_{2}^{2}-2cr_{1}^{2}r_{2}^{2}\right)<0,\quad f(-1)=2r_{1}^{2}\left(r_{1}^{2}-r_{2}^{2}+\frac{r_{1}^{2}r_{2}^{2}ab}{a+b}\right)>0,

which implies that there is at least one solution of ω3\omega_{3} to the equation f(ω3)=0f(\omega_{3})=0 in the open interval (1,1)(-1,1) .

Finally, we prove Theorem 3.4 with the help of Lemma A.1.

Proof of Theorem 3.4.

Without loss of generality, we fix the value of $\bm{u}$ because the following proof holds for each $\bm{u}$. Let the categorical variables $y$ and $z$ take values in $\{1,2,\ldots,p\}$ and $\{1,2,\ldots,q\}$, respectively. Because the "if" part has already been established by Lemma 3.2, it remains to show that model identifiability implies the completeness condition (C2); we address three cases separately: (i) $p=2$, (ii) $p=3$, and (iii) $p\geq 4$.

When $p=2$, condition (C1) implies that the $q\times 2$ matrix whose $(i,j)$-th element is $p(y=j\mid\delta=1,z=i)$ $(i=1,\dots,q;\ j=1,2)$ has rank $2$. Hence, identifiable models always satisfy the completeness condition (C2).

For cases where p3p\geq 3, we must show that the model becomes unidentifiable when the completeness condition is violated. The breach of the completeness condition indicates the existence of a non-zero vector (h1,,hp)(h_{1},\dots,h_{p}) such that for z=1,,qz=1,\ldots,q, we have

E[hyδ=1,z]=y=1phyp(yδ=1,z)=0.\displaystyle E[h_{y}\mid\delta=1,z]=\sum_{y=1}^{p}h_{y}p(y\mid\delta=1,z)=0. (13)

The elements of $(h_{1},\cdots,h_{p})$ do not all share the same sign, and multiplying this vector by any constant does not affect the above equation. Recall that the model is unidentifiable if there exist response probabilities with $\pi^{(1)}_{y}\neq\pi^{(2)}_{y}$ for some $y\in\{1,\dots,p\}$ satisfying $\sum_{y=1}^{p}p(y\mid\delta=1,z)/\pi^{(1)}_{y}=\sum_{y=1}^{p}p(y\mid\delta=1,z)/\pi^{(2)}_{y}$ for all $z$. We now construct such an unidentifiable model when the completeness condition is violated.

When $p=3$, without loss of generality, we assume that $h_{1}>0$, $h_{2}>0$, and $h_{3}<0$ satisfy $\sum_{y=1}^{3}h_{y}p(y\mid\delta=1,z)=0$ for all $z\in\{1,\dots,q\}$. Employing Lemma A.1 with $a=h_{1}$, $b=h_{2}$, $c=-h_{3}$, and $r_{1}=r_{2}=1$, we derive

1π1(1)1π1(2)=h1,1π2(1)1π2(2)=h2,1π3(1)1π3(2)=h3,\displaystyle\begin{split}\frac{1}{\pi^{(1)}_{1}}-\frac{1}{\pi^{(2)}_{1}}=h_{1},\quad\frac{1}{\pi^{(1)}_{2}}-\frac{1}{\pi^{(2)}_{2}}=h_{2},\quad\frac{1}{\pi^{(1)}_{3}}-\frac{1}{\pi^{(2)}_{3}}=h_{3},\end{split}

where $\sum_{j=1}^{3}\pi^{(1)}_{j}=\sum_{j=1}^{3}\pi^{(2)}_{j}=1$. Combining these equalities with $\sum_{y=1}^{3}h_{y}p(y\mid\delta=1,z)=0$ shows that $\sum_{y=1}^{3}p(y\mid\delta=1,z)/\pi^{(1)}_{y}=\sum_{y=1}^{3}p(y\mid\delta=1,z)/\pi^{(2)}_{y}$ for all $z$; hence the model is unidentifiable.

Lastly, we consider the case of p4p\geq 4. Suppose hy(y=1,,p)h_{y}\,(y=1,\dots,p) satisfies (13). Within (h1,,hp)(h_{1},\cdots,h_{p}), we select three elements with signs as positive, positive, and negative, respectively, and define them as aa, bb, and c-c where a,b,c>0a,b,c>0, and λ\lambda is set to be sufficiently large to ensure that

λ>2max{a+bab,1c}.\displaystyle\lambda>2\max\left\{\frac{a+b}{ab},~{}\frac{1}{c}\right\}. (14)

For ease of notation, we denote (h1,,hp)=(h1,,hp3,a,b,c)(h_{1},\cdots,h_{p})=(h_{1},\cdots,h_{p-3},a,b,-c). The remaining part of the proof is similar when the combination of the signs is negative, negative, and positive. With the selected λ\lambda, 0<πy(k)<1(y=1,,p3;k=1,2)0<\pi_{y}^{(k)}<1\,(y=1,\ldots,p-3;k=1,2) are determined to be sufficiently small to satisfy

(1y=1p3πy(1))(1y=1p3πy(2))12,y=1p3πy(1)<1,y=1p3πy(2)<1,\displaystyle\left(1-\sum_{y=1}^{p-3}\pi_{y}^{(1)}\right)\left(1-\sum_{y=1}^{p-3}\pi_{y}^{(2)}\right)\geq\frac{1}{2},\quad\sum_{y=1}^{p-3}\pi_{y}^{(1)}<1,\quad\sum_{y=1}^{p-3}\pi_{y}^{(2)}<1, (15)
1πy(1)1πy(2)=λhy,fory=1,,p3.\displaystyle\frac{1}{\pi^{(1)}_{y}}-\frac{1}{\pi^{(2)}_{y}}=\lambda h_{y},\quad\mathrm{for}~{}y=1,\ldots,p-3.

Furthermore, we define r1r_{1} and r2r_{2} as

r12=1y=1p3πy(1),r22=1y=1p3πy(2).\displaystyle r_{1}^{2}=1-\sum_{y=1}^{p-3}\pi_{y}^{(1)},\ r_{2}^{2}=1-\sum_{y=1}^{p-3}\pi_{y}^{(2)}. (16)

By determining the variables through these steps, it follows from (14), (15), and (16) that condition (5) with a=λaa=\lambda a, b=λbb=\lambda b, and c=λcc=\lambda c is fulfilled:

r12r22r12r222(r12r22)2cc<(λc),\displaystyle\frac{r_{1}^{2}-r_{2}^{2}}{r_{1}^{2}r_{2}^{2}}\leq 2(r_{1}^{2}-r_{2}^{2})\leq\frac{2}{c}c<(\lambda c),
(λa)(λb)(λa)+(λb)<aba+b2(a+b)ab=2r12r221r12r221r12r22<r12r22r12r22.\displaystyle-\frac{(\lambda a)(\lambda b)}{(\lambda a)+(\lambda b)}<-\frac{ab}{a+b}\frac{2(a+b)}{ab}=-2r_{1}^{2}r_{2}^{2}\frac{1}{r_{1}^{2}r_{2}^{2}}\leq-\frac{1}{r_{1}^{2}r_{2}^{2}}<\frac{r_{1}^{2}-r_{2}^{2}}{r_{1}^{2}r_{2}^{2}}.

Therefore, by applying Lemma A.1, we demonstrate that there exist πp2(k)\pi^{(k)}_{p-2}, πp1(k)\pi^{(k)}_{p-1}, and πp(k)(k=1,2)\pi^{(k)}_{p}\,(k=1,2) such that

y=p2pπy(1)=r12,y=p2pπy(2)=r22,\displaystyle\sum_{y=p-2}^{p}\pi_{y}^{(1)}=r_{1}^{2},\ \sum_{y=p-2}^{p}\pi_{y}^{(2)}=r_{2}^{2},
1πp2(1)1πp2(2)=λa,1πp1(1)1πp1(2)=λb,1πp(1)1πp(2)=λc.\displaystyle\frac{1}{\pi_{p-2}^{(1)}}-\frac{1}{\pi_{p-2}^{(2)}}=\lambda a,\quad\frac{1}{\pi_{p-1}^{(1)}}-\frac{1}{\pi_{p-1}^{(2)}}=\lambda b,\quad\frac{1}{\pi_{p}^{(1)}}-\frac{1}{\pi_{p}^{(2)}}=-\lambda c.

The condition (13) suggests that the constructed πy(k)(y=1,,p;k=1,2)\pi^{(k)}_{y}\,(y=1,\dots,p;k=1,2) satisfy y=1pπy(k)=1\sum_{y=1}^{p}\pi^{(k)}_{y}=1 for k=1,2k=1,2 and, for any z{1,,q}z\in\{1,\dots,q\},

y=1p(1πy(1)1πy(2))p(yδ=1,z)=λy=1phyp(yδ=1,z)=0.\sum_{y=1}^{p}\left(\frac{1}{\pi^{(1)}_{y}}-\frac{1}{\pi^{(2)}_{y}}\right)p(y\mid\delta=1,z)=\lambda\sum_{y=1}^{p}h_{y}p(y\mid\delta=1,z)=0.

Therefore, the model is unidentifiable.

Proof of Theorem 3.5.

We consider the case where $y$ is continuous; when $y$ is discrete, the integrals are simply replaced by sums. To simplify the discussion, we consider the case where $\mathcal{S}_{y}=\mathbb{R}$. Fix a value of $\bm{u}$. Because $h$ and $g$ are injective functions, it is sufficient to prove the identifiability of $\alpha:=h(\bm{u};\bm{\alpha})$ and $\beta:=g(\bm{u};\bm{\beta})$. Therefore, our goal is to prove that

p(y𝒙,δ=1;𝜸)p(y𝒙,δ=1;𝜸)Ψ{α+βm(y)}1𝑑y=p(y𝒙,δ=1;𝜸)p(y𝒙,δ=1;𝜸)Ψ{α+βm(y)}1𝑑y,\displaystyle\frac{p(y\mid\bm{x},\delta=1;\bm{\gamma})}{\int p(y\mid\bm{x},\delta=1;\bm{\gamma})\Psi\{\alpha+\beta m(y)\}^{-1}dy}=\frac{p(y\mid\bm{x},\delta=1;\bm{\gamma^{\prime}})}{\int p(y\mid\bm{x},\delta=1;\bm{\gamma^{\prime}})\Psi\{\alpha^{\prime}+\beta^{\prime}m(y)\}^{-1}dy},

implies $\alpha=\alpha^{\prime}$, $\beta=\beta^{\prime}$, and $\bm{\gamma}=\bm{\gamma^{\prime}}$. Integrating both sides of the above equation with respect to $y$ yields the equality of the denominators. Thus, we have $p(y\mid\bm{x},\delta=1;\bm{\gamma})=p(y\mid\bm{x},\delta=1;\bm{\gamma^{\prime}})$, which implies $\bm{\gamma}=\bm{\gamma^{\prime}}$ by (C4).

Next, we consider the identification of β\beta. Taking 𝒛1\bm{z}_{1} and 𝒛2\bm{z}_{2} such that they satisfy (C5), we show that

p(y𝒖,𝒛1,δ=1;𝜸)Ψ{α+βm(y)}𝑑y\displaystyle\int\frac{p(y\mid\bm{u},\bm{z}_{1},\delta=1;\bm{\gamma})}{\Psi\{\alpha+\beta m(y)\}}dy =p(y𝒖,𝒛1,δ=1;𝜸)Ψ{α+βm(y)}𝑑y,\displaystyle=\int\frac{p(y\mid\bm{u},\bm{z}_{1},\delta=1;\bm{\gamma})}{\Psi\{\alpha^{\prime}+\beta^{\prime}m(y)\}}dy, (17)
p(y𝒖,𝒛2,δ=1;𝜸)Ψ{α+βm(y)}𝑑y\displaystyle\int\frac{p(y\mid\bm{u},\bm{z}_{2},\delta=1;\bm{\gamma})}{\Psi\{\alpha+\beta m(y)\}}dy =p(y𝒖,𝒛2,δ=1;𝜸)Ψ{α+βm(y)}𝑑y,\displaystyle=\int\frac{p(y\mid\bm{u},\bm{z}_{2},\delta=1;\bm{\gamma})}{\Psi\{\alpha^{\prime}+\beta^{\prime}m(y)\}}dy, (18)

implies β=β\beta=\beta^{\prime}. It follows from (17) and (18) that

K(y;α,α,β,β)p(y𝒖,𝒛1,δ=1;𝜸)𝑑y\displaystyle\int K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})p(y\mid\bm{u},\bm{z}_{1},\delta=1;\bm{\gamma})dy
=K(y;α,α,β,β)p(y𝒖,𝒛2,δ=1;𝜸)𝑑y=0,\displaystyle=\int K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})p(y\mid\bm{u},\bm{z}_{2},\delta=1;\bm{\gamma})dy=0, (19)

where K(y;α,α,β,β)=Ψ1{α+βm(y)}Ψ1{α+βm(y)}K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})=\Psi^{-1}\{\alpha+\beta m(y)\}-\Psi^{-1}\{\alpha^{\prime}+\beta^{\prime}m(y)\}. It remains to show that (19) implies β=β\beta=\beta^{\prime} in the following two steps:

Step I. We prove that the function $K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})$ has a single change of sign when $\beta\neq\beta^{\prime}$. Assume that $\beta\neq\beta^{\prime}$. The equation $K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})=0$ has only one solution $y^{*}\in\mathcal{S}_{y}$, satisfying $m(y^{*})=(\alpha-\alpha^{\prime})/(\beta^{\prime}-\beta)$, because of the injectivity of $m(\cdot)$ and $\Psi(\cdot)$. This implies that $K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})$ has a single change of sign.

Step II. We prove that equation (19) cannot hold when $\beta\neq\beta^{\prime}$. Without loss of generality, by Step I, we consider the case where $K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})<0$ for $y<y^{*}$ and $K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})>0$ for $y>y^{*}$, and $p(y\mid\bm{u},\bm{z}_{2},\delta=1)/p(y\mid\bm{u},\bm{z}_{1},\delta=1)$ is monotone increasing. Let $c$ be the supremum of the density ratio

c:=supy<yp(y𝒖,𝒛2,δ=1)p(y𝒖,𝒛1,δ=1).\displaystyle c:=\sup_{y<y^{*}}\frac{p(y\mid\bm{u},\bm{z}_{2},\delta=1)}{p(y\mid\bm{u},\bm{z}_{1},\delta=1)}.

By a property on K(y;α,α,β,β)K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime}) shown in (19), we have

0\displaystyle 0 =K(y;α,α,β,β)p(y𝒖,𝒛2,δ=1)𝑑y\displaystyle=\int K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})p(y\mid\bm{u},\bm{z}_{2},\delta=1)dy
=yK(y;α,α,β,β)p(y𝒖,𝒛2,δ=1)p(y𝒖,𝒛1,δ=1)p(y𝒖,𝒛1,δ=1)𝑑y\displaystyle=\int_{-\infty}^{y^{*}}K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})\frac{p(y\mid\bm{u},\bm{z}_{2},\delta=1)}{p(y\mid\bm{u},\bm{z}_{1},\delta=1)}p(y\mid\bm{u},\bm{z}_{1},\delta=1)dy
+yK(y;α,α,β,β)p(y𝒖,𝒛2,δ=1)p(y𝒖,𝒛1,δ=1)p(y𝒖,𝒛1,δ=1)𝑑y\displaystyle\hskip 20.00003pt+\int_{y^{*}}^{\infty}K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})\frac{p(y\mid\bm{u},\bm{z}_{2},\delta=1)}{p(y\mid\bm{u},\bm{z}_{1},\delta=1)}p(y\mid\bm{u},\bm{z}_{1},\delta=1)dy
ycK(y;α,α,β,β)p(y𝒖,𝒛1,δ=1)𝑑y+ycK(y;α,α,β,β)p(y𝒖,𝒛1,δ=1)𝑑y\displaystyle\geq\int_{-\infty}^{y^{*}}cK(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})p(y\mid\bm{u},\bm{z}_{1},\delta=1)dy+\int_{y^{*}}^{\infty}cK(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})p(y\mid\bm{u},\bm{z}_{1},\delta=1)dy
=cK(y;α,α,β,β)p(y𝒖,𝒛1,δ=1)𝑑y=0,\displaystyle=c\int K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})p(y\mid\bm{u},\bm{z}_{1},\delta=1)dy=0,

where the inequality follows from the definition of $c$. Since equality must hold throughout, the density ratio $p(y\mid\bm{u},\bm{z}_{2},\delta=1)/p(y\mid\bm{u},\bm{z}_{1},\delta=1)$ is constant on $\mathcal{S}_{y}$; hence $p(y\mid\bm{u},\bm{z}_{2},\delta=1)=p(y\mid\bm{u},\bm{z}_{1},\delta=1)$ on $\mathcal{S}_{y}$. This contradicts (C5); thus $\beta=\beta^{\prime}$.

Finally, from the strict monotonicity of Ψ\Psi, it follows that the integration

p(y𝒖,𝒛1,δ=1;𝜸)Ψ{α+βm(y)}𝑑y,\displaystyle\int\frac{p(y\mid\bm{u},\bm{z}_{1},\delta=1;\bm{\gamma})}{\Psi\{\alpha+\beta m(y)\}}dy,

is injective with respect to α\alpha. Therefore, equation (17) implies that α=α\alpha=\alpha^{\prime}.

Proof of Proposition 3.8.

It follows from the assumption (4) that there exist M,C>0M,C>0 such that

p(y𝒙,δ=1;𝜸)Ψ{h(𝒖;𝜶)+g(𝒖;𝜷)m(y)}𝑑y\displaystyle\int\frac{p(y\mid\bm{x},\delta=1;\bm{\gamma})}{\Psi\{h(\bm{u};\bm{\alpha})+g(\bm{u};\bm{\beta})m(y)\}}dy
exp{12(yh(𝒖;𝜶)βμ(𝒙,𝜿))2β2σ2}1Ψ(y)exp(|y|s)exp(|y|s)𝑑y\displaystyle\propto\int_{-\infty}^{\infty}\exp\left\{-\frac{1}{2}\frac{(y-h(\bm{u};\bm{\alpha})-\beta\mu(\bm{x},\bm{\kappa}))^{2}}{\beta^{2}\sigma^{2}}\right\}\frac{1}{\Psi(y)\exp(|y|^{s})}\exp(|y|^{s})dy
Mexp{12(yh(𝒖;𝜶)βμ(𝒙,𝜿))2β2σ2}Cexp(|y|s)𝑑y+C<,\displaystyle\leq\int_{-\infty}^{-M}\exp\left\{-\frac{1}{2}\frac{(y-h(\bm{u};\bm{\alpha})-\beta\mu(\bm{x},\bm{\kappa}))^{2}}{\beta^{2}\sigma^{2}}\right\}C\exp(|y|^{s})dy+C<\infty,

where $0<s<2$. The bound on the first term of the last line follows from condition (4), and the bound on the second term follows from the monotonicity of $\Psi$. ∎