
Verifiable identification condition for nonignorable nonresponse data with categorical instrumental variables

Kenji Beppu and Kosuke Morikawa
Graduate School of Engineering Science, Osaka University, Osaka, Japan
Email: [email protected]
Abstract

We consider a model identification problem in which an outcome variable contains nonignorable missing values. Statistical inference requires model identifiability to obtain estimators with desirable theoretical properties such as consistency and asymptotic normality. Recently, instrumental or shadow variables, combined with the completeness condition on the outcome model, have been highlighted as a way to make a model identifiable. In this paper, we elucidate the relationship between the completeness condition and model identifiability when the instrumental variable is categorical. We first show that when both the outcome and instrumental variables are categorical, the two conditions are equivalent. However, when either the outcome or the instrumental variable is continuous, the completeness condition may fail to hold, even for simple models. Consequently, we provide a sufficient condition that guarantees the identifiability of models exhibiting a monotone-likelihood property, which is particularly useful when establishing the completeness condition is challenging. We demonstrate that the proposed conditions are easy to check with observed data for many practical models and illustrate their usefulness in numerical experiments and real data analysis.

keywords:
missing not at random; nonignorable missingness; identification; instrumental variable; exponential family

1 Introduction

There has been a rapidly growing movement to utilize all available data that may explicitly or implicitly contain missing values, for example in causal inference (Imbens and Rubin, 2015) and data integration (Yang and Kim, 2020; Hu et al., 2022). For such datasets, appropriate analysis of missing data is indispensable to correct the selection bias caused by the missingness. In recent years, the analysis of missing data under the missing at random (MAR) assumption (Little and Rubin, 2019) has gradually matured (Robins et al., 1994; Kim and Shao, 2021). Model identifiability is one of the most fundamental conditions for constructing asymptotic theory, but removing the MAR assumption makes statistical inference drastically more difficult, especially with respect to model identification (Miao et al., 2016). Estimation with unidentifiable models may yield multiple solutions that fit the data equally well. Several researchers have therefore proposed sufficient conditions for model identification under missing not at random (MNAR).

The observed likelihood is constructed from two components: (R) the response mechanism and (O) the outcome distribution (Kim and Shao, 2021). Miao et al. (2016) considered identification conditions with Logistic, Probit, and Robit (cumulative distribution function of the $t$-distribution) models for (R) and normal and $t$ (mixture) distributions for (O). Cui et al. (2017) assumed Logistic, Probit, and complementary log-log models for (R) and generalized linear models for (O). These studies depend heavily on the model specification of both (R) and (O). Wang et al. (2014) introduced a covariate called an instrument or shadow variable and demonstrated that the use of the instrument considerably relaxes the conditions on (R) and (O). For example, (O) requires only the monotone-likelihood property, which covers a variety of models, including the generalized linear model. Tang et al. (2003) and Miao and Tchetgen (2018) derived conditions for model identifiability without postulating any assumptions on (R) with the help of the instrument. Miao et al. (2019) further relaxed the assumption on (O) under a condition referred to as the completeness condition (D'Haultfœuille, 2010, 2011). For example, the generalized linear model with continuous covariates satisfies the completeness condition. To the best of our knowledge, this combination of an instrument on (R) and completeness on (O) is the most general condition for model identification and has been adopted in numerous studies (Zhao and Ma, 2022; Yang et al., 2019).

Generally, assumptions on (O) concern the distribution of the complete data, which is untestable from the observed data. Recently, modeling (O'), the respondents' (observed) outcome distribution, instead of (O) has been used to avoid this untestable assumption (Miao et al., 2019; Riddles et al., 2016). However, the observed likelihood with (R) and (O') involves an integral that makes the identification problem intractable. Morikawa and Kim (2021) and Beppu et al. (2021) established that the integral can be computed explicitly with Logistic models for (R) and generalized linear models for (O') and derived identification conditions. For general response mechanisms and respondents' outcome distributions, model identification remains an open question. Furthermore, when the instrument is categorical, such as smoking history or sex, the completeness condition can fail. For example, Ibrahim et al. (2001) considered a study on the mental health of children in Connecticut and used the parents' report of the psychopathology of the child as a binary instrument.

In this paper, we consider an identification problem with an instrument for (R) and a model for (O') that satisfies the monotone-likelihood ratio property. Although our model setup is similar to that of Wang et al. (2014), we can check the validity of (O') with observed data, for example, by using information criteria such as AIC and BIC. Furthermore, we can use semiparametric/nonparametric methods for modeling both (O') and (R).

The rest of this paper is organized as follows. Section 2 introduces the notation and defines model identifiability. Section 3 derives the proposed identification condition. We demonstrate the effects of identifiability via a limited numerical study in Section 4. Moreover, application to real data is presented in Section 5. Finally, concluding remarks are summarized in Section 6. All the technical proofs are relegated to the Appendix.

2 Basic setup

2.1 Observed likelihood

Let $\{\bm{x}_{i},y_{i},\delta_{i}\}_{i=1}^{n}$ be independent and identically distributed samples from the distribution of $(\bm{x},y,\delta)$, where $\bm{x}$ is a fully observed covariate vector, $y$ is an outcome variable subject to missingness, and $\delta$ is the response indicator of $y$, equal to $1$ if $y$ is observed and $0$ if it is missing. We use the generic notation $p(\cdot)$ and $p(\cdot\mid\cdot)$ for marginal and conditional densities, respectively. For example, $p(\bm{x})$ is the marginal density of $\bm{x}$, and $p(y\mid\bm{x})$ is the conditional density of $y$ given $\bm{x}$. We model the MNAR response mechanism $P(\delta=1\mid\bm{x},y)$ and consider its identification. The observed likelihood is defined as

\displaystyle\prod_{i:\delta_{i}=1}P(\delta_{i}=1\mid y_{i},\bm{x}_{i})p(y_{i}\mid\bm{x}_{i})\prod_{i:\delta_{i}=0}\int\left\{1-P(\delta_{i}=1\mid y,\bm{x}_{i})\right\}p(y\mid\bm{x}_{i})dy. (1)

We say that this model is identifiable if the parameters in (1) are identified, which is equivalent to the parameters in $P(\delta=1\mid y,\bm{x})p(y\mid\bm{x})$ being identified. This identification condition is essential even for semiparametric models, such as estimators defined by moment conditions (Morikawa and Kim, 2021). However, even simple models can be unidentifiable. For example, Example 1 in Wang et al. (2014) presents an unidentifiable model in which the outcome model is normal and the response mechanism is Logistic.

There is an alternative way to express the relationship between $y$ and $\bm{x}$. A disadvantage of modeling $p(y\mid\bm{x})$ is that it is a subjective assumption on the distribution of the complete data, not of the observed data. In other words, if we made assumptions about $p(y\mid\bm{x})$ and ensured its identifiability, we could not verify those assumptions with the observed data. By contrast, this issue can be overcome by modeling $p(y\mid\bm{x},\delta=1)$, because $p(y\mid\bm{x},\delta=1)$ is the outcome model for the observed data, and we can check its validity using ordinary information criteria such as AIC and BIC. Therefore, we model $p(y\mid\bm{x},\delta=1)$ and consider its identification condition in Section 3. Hereafter, we assume two parametric models $p(y\mid\bm{x},\delta=1;\bm{\gamma})$ and $P(\delta=1\mid\bm{x},y;\bm{\phi})$, where $\bm{\gamma}$ and $\bm{\phi}$ are the parameters of the outcome and response models, respectively. Although our method requires two parametric models, the class of identifiable models is very large. For example, it can include semiparametric outcome models for $p(y\mid\bm{x},\delta=1;\bm{\gamma})$ and general response models $P(\delta=1\mid\bm{x},y;\bm{\phi})$ other than Logistic models, as discussed in Example 3.7.
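Because $p(y\mid\bm{x},\delta=1;\bm{\gamma})$ describes only the respondents, its specification can be compared across candidates with standard model-selection tools. The snippet below is a minimal sketch of such a check on simulated data; the data-generating values, the two candidate normal-linear specifications, and the function names are illustrative assumptions, not part of the paper.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 2000
u = rng.normal(size=n)
z = rng.binomial(1, 0.5, size=n)
y = 0.3 + 0.4 * u + 0.8 * z + rng.normal(scale=0.7, size=n)   # hypothetical complete data
delta = rng.binomial(1, 1.0 / (1.0 + np.exp(-(0.7 - 0.2 * u + 0.3 * y))))  # MNAR response

# Fit candidate models for p(y | x, delta = 1) using the respondents only.
yo, uo, zo = y[delta == 1], u[delta == 1], z[delta == 1]

def gaussian_aic(design, yobs):
    """AIC of a normal linear model fitted by least squares."""
    beta, *_ = np.linalg.lstsq(design, yobs, rcond=None)
    resid = yobs - design @ beta
    sigma2 = resid @ resid / len(yobs)
    loglik = stats.norm.logpdf(resid, scale=np.sqrt(sigma2)).sum()
    k = design.shape[1] + 1               # regression coefficients plus the variance
    return 2 * k - 2 * loglik

X1 = np.column_stack([np.ones_like(yo), uo])       # candidate mean without the instrument z
X2 = np.column_stack([np.ones_like(yo), uo, zo])   # candidate mean including z
print("AIC without z:", gaussian_aic(X1, yo))
print("AIC with z   :", gaussian_aic(X2, yo))
```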

2.2 Estimation

We present a procedure for parameter estimation based on the parametric models $p(y\mid\bm{x},\delta=1;\bm{\gamma})$ and $P(\delta=1\mid\bm{x},y;\bm{\phi})$. Let $\hat{\bm{\gamma}}$ be the maximum likelihood estimator of $\bm{\gamma}$. The observed likelihood (1) yields the mean score equation for $\bm{\phi}$ (Kim and Shao, 2021):

\displaystyle\sum_{i=1}^{n}\left\{\delta_{i}\frac{\partial\log\pi(\bm{x}_{i},y_{i};\bm{\phi})}{\partial\bm{\phi}}-(1-\delta_{i})\frac{\int\partial\pi(\bm{x}_{i},y;\bm{\phi})/\partial\bm{\phi}\cdot p(y\mid\bm{x}_{i})dy}{\int\{1-\pi(\bm{x}_{i},y;\bm{\phi})\}p(y\mid\bm{x}_{i})dy}\right\}=0,

where $\pi(\bm{x},y;\bm{\phi})=P(\delta=1\mid\bm{x},y;\bm{\phi})$. By using Bayes' formula $p(y\mid\bm{x})\propto p(y\mid\bm{x},\delta=1)/\pi(\bm{x},y;\bm{\phi})$, the mean score can be written as

\displaystyle\sum_{i=1}^{n}\left\{\delta_{i}s_{1}(\bm{x}_{i},y_{i};\bm{\phi})+(1-\delta_{i})s_{0}(\bm{x}_{i};\bm{\phi})\right\}=0,

where

\displaystyle s_{1}(\bm{x},y;\bm{\phi})=\frac{\partial\log\pi(\bm{x},y;\bm{\phi})}{\partial\bm{\phi}},\quad s_{0}(\bm{x};\bm{\phi})=-\frac{\int s_{1}(\bm{x},y;\bm{\phi})p(y\mid\bm{x},\delta=1)dy}{\int\left\{1/\pi(\bm{x},y;\bm{\phi})-1\right\}p(y\mid\bm{x},\delta=1)dy}.

To compute the two integrals in $s_{0}(\cdot)$, we can use fractional imputation (Kim, 2011). As described in Riddles et al. (2016), the EM algorithm is also applicable.
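As an illustration of this step, the following sketch approximates the two integrals in $s_{0}(\cdot)$ by Monte Carlo draws from a fitted respondents' model and evaluates the mean score. The Gaussian respondents' model and the Logistic response model used here are illustrative assumptions; the framework of the paper allows much more general choices.

```python
import numpy as np
from scipy import optimize

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def mean_score(phi, u, y, delta, gamma_hat, sigma_hat, M=200, seed=1):
    """Mean score for a Logistic response model pi = expit(phi0 + phi1*u + phi2*y).
    s0 is approximated by M Monte Carlo draws from the fitted respondents' model
    y | (u, delta = 1) ~ N(gamma0 + gamma1*u, sigma^2) (illustrative choices)."""
    rng = np.random.default_rng(seed)      # common random numbers across evaluations
    total = np.zeros(3)
    for ui, yi, di in zip(u, y, delta):
        if di == 1:
            pi = expit(phi[0] + phi[1] * ui + phi[2] * yi)
            total += (1.0 - pi) * np.array([1.0, ui, yi])            # s1(u_i, y_i; phi)
        else:
            ystar = rng.normal(gamma_hat[0] + gamma_hat[1] * ui, sigma_hat, size=M)
            pi = expit(phi[0] + phi[1] * ui + phi[2] * ystar)
            s1 = (1.0 - pi)[:, None] * np.column_stack(
                [np.ones(M), np.full(M, ui), ystar])
            total += -s1.mean(axis=0) / np.mean(1.0 / pi - 1.0)      # s0(u_i; phi)
    return total

# phi_hat = optimize.root(mean_score, x0=np.zeros(3),
#                         args=(u, y, delta, gamma_hat, sigma_hat)).x
```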

3 Identifiability

3.1 Definition of identification

Recall that the identification condition for (1) concerns the parameters in $P(\delta=1\mid y,\bm{x})p(y\mid\bm{x})$. As seen in Section 2.2, the conditional density $p(y\mid\bm{x})$ is represented by $p(y\mid\bm{x},\delta=1;\bm{\gamma})$ and $P(\delta=1\mid\bm{x},y;\bm{\phi})$ through Bayes' formula. Thus, identification of these models reduces to identification of the parameters in $\varphi(y,\bm{x};\bm{\phi},\bm{\gamma})$, where

\displaystyle\varphi(y,\bm{x};\bm{\phi},\bm{\gamma})=\frac{p(y\mid\bm{x},\delta=1;\bm{\gamma})}{\int p(y\mid\bm{x},\delta=1;\bm{\gamma})/\pi(\bm{x},y;\bm{\phi})dy}. (2)

Strictly speaking, the identification condition is that $\varphi(y,\bm{x};\bm{\phi},\bm{\gamma})=\varphi(y,\bm{x};\bm{\phi}^{\prime},\bm{\gamma}^{\prime})$ with probability $1$ implies $(\bm{\phi}^{\top},\bm{\gamma}^{\top})=({\bm{\phi}^{\prime}}^{\top},{\bm{\gamma}^{\prime}}^{\top})$. Generally, the integral in the denominator of (2) does not have a closed form, which makes deriving a sufficient condition for identifiability quite challenging. Morikawa and Kim (2021) showed that the combination of a Logistic response model and a normal outcome model admits a closed form of the integral and derived a sufficient condition for model identifiability. Beppu et al. (2021) extended the result to the case where the outcome model belongs to the exponential family while the response model is still Logistic. However, when the response mechanism is general, even simple outcome models such as the normal distribution can be unidentifiable.

Example 3.1.

Suppose that the respondents' outcome model is $y\mid(\delta=1,x)\sim N(\gamma_{0}+\gamma_{1}x,1)$ and the response model is $P(\delta=1\mid x,y)=\Psi(\alpha_{0}+\alpha_{1}x+\beta y)$, where $\Psi$ is a known distribution function such that the integral in (2) exists; then this model is unidentifiable. For example, the two parametrizations $(\alpha_{0},\alpha_{1},\beta,\gamma_{0},\gamma_{1})=(0,1,1,0,1)$ and $(\alpha_{0}^{\prime},\alpha_{1}^{\prime},\beta^{\prime},\gamma_{0}^{\prime},\gamma_{1}^{\prime})=(0,3,-1,0,1)$ yield the same value of the observed likelihood.
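The equality of the two observed likelihoods in Example 3.1 can be confirmed numerically: both parametrizations share the same respondents' density, so it suffices to compare the normalizing integrals in (2). The sketch below does so with $\Psi$ taken to be the Logistic function, an illustrative choice; any $\Psi$ for which the integral exists leads to the same conclusion.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

def expit(t):
    return 1.0 / (1.0 + np.exp(-t))

def denominator(x, a0, a1, b, g0, g1):
    """Integral in the denominator of (2) with a Logistic Psi (illustrative choice)."""
    integrand = lambda y: stats.norm.pdf(y, loc=g0 + g1 * x) / expit(a0 + a1 * x + b * y)
    value, _ = quad(integrand, -np.inf, np.inf)
    return value

# The respondents' density is identical under both parametrizations, so comparing
# the denominators is enough to compare the induced observed-data distributions.
for x in (-1.0, 0.0, 0.7, 2.0):
    d1 = denominator(x, 0.0, 1.0, 1.0, 0.0, 1.0)    # (alpha0, alpha1, beta, gamma0, gamma1) = (0, 1, 1, 0, 1)
    d2 = denominator(x, 0.0, 3.0, -1.0, 0.0, 1.0)   # (0, 3, -1, 0, 1)
    print(f"x = {x:5.2f}:  {d1:.6f}  vs  {d2:.6f}")
```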

Recently, widely applicable sufficient conditions have been proposed. Assume that a covariate 𝒙\bm{x} has two components, 𝒙=(𝒖,𝒛)\bm{x}=(\bm{u}^{\top},\bm{z}^{\top})^{\top}, such that

  • (C1)

    𝒛δ(𝒖,y)\bm{z}\perp\!\!\!\perp\delta\mid(\bm{u},y) and 𝒛⟂̸y(δ=1,𝒖).\bm{z}\not\perp\!\!\!\perp y\mid(\delta=1,\bm{u}).

The covariate 𝒛\bm{z} is called an instrument (D’Haultfœuille, 2010) or a shadow variable (Miao and Tchetgen Tchetgen, 2016). Miao et al. (2019) derived sufficient conditions for model identifiability by combining the instrument and the completeness condition:

  • (C2)

    For any square-integrable function $h(\bm{u},y)$, $E[h(\bm{u},y)\mid\delta=1,\bm{u},\bm{z}]=0$ almost surely implies $h(\bm{u},y)=0$ almost surely.

Lemma 3.2 (Identification condition by Miao et al. (2019)).

Under the conditions (C1) and (C2), the joint distribution p(y,𝐮,𝐳,δ)p(y,\bm{u},\bm{z},\delta) is identifiable.

Although the completeness condition is useful and applicable to general models, even a simple model with a categorical instrument may fail to satisfy it.

Example 3.3 (Violating completeness with categorical instrument).

Suppose $y\mid(\delta=1,u,z)$ follows the normal distribution $N(u+z,1)$ and the instrument $z$ is binary, taking the value $0$ or $1$. This distribution does not satisfy the completeness condition because $E[h(u,y)\mid\delta=1,u,z]=0$ for the nonzero function $h(u,y)=1+y-u-(y-u)^{2}$.
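The moment calculation behind Example 3.3 is short: $E[h(u,y)\mid\delta=1,u,z]=1+(u+z)-u-(1+z^{2})=z-z^{2}=0$ for $z\in\{0,1\}$. A minimal Monte Carlo confirmation, with $u$ fixed at an arbitrary value, is sketched below.

```python
import numpy as np

rng = np.random.default_rng(0)
u = 0.4                                   # any fixed value of u
for z in (0, 1):
    y = rng.normal(u + z, 1.0, size=1_000_000)
    h = 1 + y - u - (y - u) ** 2          # h(u, y) from Example 3.3
    print(f"z = {z}:  mean of h(u, y) over respondents ~ {h.mean():.4f}")
```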

A vital implication of Example 3.3 is that an instrument alone no longer guarantees model identification when it is categorical. Developing identification conditions for models with discrete instruments is therefore important in applications (Ibrahim et al., 2001). We separately discuss two cases: (i) both $y$ and $\bm{z}$ are categorical; (ii) the respondents' outcome model has the monotone-likelihood ratio property.

When all variables, yy and 𝒛\bm{z}, are categorical, the model can be fully nonparametric. Theorem 3.4 demonstrates that, under these conditions, the completeness and identifiability conditions are equivalent. See Appendix 2 in Riddles et al. (2016) for the estimation of such fully nonparametric models.

Theorem 3.4.

When both yy and 𝐳\bm{z} are categorical, under condition (C1), the joint distribution p(y,𝐮,𝐳,δ)p(y,\bm{u},\bm{z},\delta) is identifiable if and only if condition (C2) holds.

As evidenced in Lemma 3.2, condition (C2) is generally sufficient for model identifiability, but Theorem 3.4 also reveals that it is necessary when yy and 𝒛\bm{z} are categorical.

Next, we consider the identification condition for the other case (ii). Let 𝒮y\mathcal{S}_{y} be the support of the random variable yy. We assume the following four conditions:

  • (C3)

    The response mechanism is

    \displaystyle P(\delta=1\mid y,\bm{x};\bm{\phi})=P(\delta=1\mid y,\bm{u};\bm{\phi})=\Psi\{h(\bm{u};\bm{\alpha})+g(\bm{u};\bm{\beta})m(y)\}, (3)

    where $\bm{\phi}=(\bm{\alpha}^{\top},\bm{\beta}^{\top})^{\top}$, $m:\mathcal{S}_{y}\to\mathbb{R}$ and $\Psi:\mathbb{R}\to(0,1]$ are known continuous strictly monotone functions, and $h(\bm{u};\bm{\alpha})$ and $g(\bm{u};\bm{\beta})$ are known injective functions of $\bm{\alpha}$ and $\bm{\beta}$, respectively.

  • (C4)

    The density or mass function p(y𝒙,δ=1;𝜸)p(y\mid\bm{x},\delta=1;\bm{\gamma}) is identifiable, and its support does not depend on 𝒙\bm{x}.

  • (C5)

    For all 𝒖𝒮𝒖\bm{u}\in\mathcal{S}_{\bm{u}}, there exist 𝒛1\bm{z}_{1} and 𝒛2\bm{z}_{2}, such that p(y𝒖,𝒛1,δ=1)p(y𝒖,𝒛2,δ=1)p(y\mid\bm{u},\bm{z}_{1},\delta=1)\neq p(y\mid\bm{u},\bm{z}_{2},\delta=1), and p(y𝒖,𝒛1,δ=1)/p(y𝒖,𝒛2,δ=1)p(y\mid\bm{u},\bm{z}_{1},\delta=1)/p(y\mid\bm{u},\bm{z}_{2},\delta=1) is monotone.

  • (C6)
    \displaystyle\int\frac{p(y\mid\bm{x},\delta=1;\bm{\gamma})}{\Psi\{h(\bm{u};\bm{\alpha})+g(\bm{u};\bm{\beta})m(y)\}}dy<\infty\ \ \ \mathrm{a.s.}

Condition (C3) means that the random variable $\bm{z}$ plays the role of an instrument. Condition (C4) is the identifiability of $p(y\mid\bm{x},\delta=1;\bm{\gamma})$, which is testable from the observed data. Condition (C5) assumes a monotone-likelihood ratio property on the respondents' outcome model, analogous to the condition used in Wang et al. (2014) for the complete data. Condition (C6) is necessary for (1) to be well defined; it is essentially the same condition as (I1) of Theorem 3.1 in Morikawa and Kim (2021). This condition always holds when the support of $y$ is finite, but it must be verified carefully when $y$ is continuous. See Proposition 3.8 below for useful sufficient conditions when the respondents' outcome model is a normal distribution.

Under conditions (C3)–(C6), we obtain the desired identification condition.

Theorem 3.5.

The parameter (ϕ,𝛄)(\bm{\phi}^{\top},\bm{\gamma}^{\top})^{\top} is identifiable if the conditions (C1) and (C3)–(C6) hold.

We provide an example of outcome models satisfying the condition (C5).

Example 3.6 (Model satisfying (C5)).

Let the density function belong to the exponential family,

\displaystyle p(y\mid\bm{x},\delta=1;\bm{\gamma})=\exp\left(\frac{y\theta-b(\theta)}{\tau}+c(y;\tau)\right),

where $\theta=\theta(\eta)$, $\eta=\sum_{l=1}^{L}\eta_{l}(\bm{x})\kappa_{l}$, $\bm{\kappa}=(\kappa_{1},\ldots,\kappa_{L})^{\top}$, and $\bm{\gamma}=(\tau,\bm{\kappa}^{\top})^{\top}$. Then the density ratio becomes

\displaystyle\frac{p(y\mid\bm{u},\bm{z}_{1},\delta=1)}{p(y\mid\bm{u},\bm{z}_{2},\delta=1)}\propto\exp\left(\frac{\theta_{1}-\theta_{2}}{\tau}y\right),

where $\bm{x}_{i}=(\bm{u},\bm{z}_{i})$ and $\theta_{i}=\theta\{\sum_{l=1}^{L}\eta_{l}(\bm{x}_{i})\kappa_{l}\}$ for $i=1,2$. Therefore, the density ratio is monotone in $y$.
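As a concrete instance, consider the normal respondents' model used in Scenarios S1 and S2 of Section 4, with mean $\kappa_{0}+\kappa_{1}u+\kappa_{2}z$ and variance $\sigma^{2}$; the following display is a direct specialization of the ratio above, not an additional assumption:

\displaystyle\frac{p(y\mid u,z_{1},\delta=1)}{p(y\mid u,z_{2},\delta=1)}=\exp\left\{-\frac{(y-\kappa_{0}-\kappa_{1}u-\kappa_{2}z_{1})^{2}-(y-\kappa_{0}-\kappa_{1}u-\kappa_{2}z_{2})^{2}}{2\sigma^{2}}\right\}\propto\exp\left\{\frac{\kappa_{2}(z_{1}-z_{2})}{\sigma^{2}}y\right\},

which is monotone in $y$ whenever $\kappa_{2}\neq 0$; this is why $\kappa_{2}$ controls the strength of identification in the simulations of Section 4.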

Example 3.7 (Model satisfying (C6)).

In applications, it is often reasonable to assume a normal distribution for the respondents' outcome model. By examining the tail behavior of the integrand in (C6), we provide a sufficient condition that is easy to check for general response mechanisms.

Proposition 3.8.

Suppose that the respondents' outcome distribution $p(y\mid\bm{x},\delta=1)$ is the normal distribution $N(\mu(\bm{x};\bm{\kappa}),\sigma^{2})$, the response mechanism is (3) with $m(y)=y$ and $g(\bm{u};\bm{\beta})=\beta$, and the strictly monotone increasing function $\Psi$ meets the following condition:

{}^{\exists}s\in(0,2)\ \mathrm{s.t.}\ \liminf_{z\to-\infty}\Psi(z)\exp(|z|^{s})>0. (4)

Then, this model satisfies (C6).

Condition (4) is easy to check. For example, it holds for the Logistic and Robit functions but not for the Probit function. Under Proposition 3.8, it is possible to estimate $\mu(\bm{x};\bm{\kappa})$ from observed data using splines or other nonparametric methods, which allows very flexible models. Furthermore, we can also estimate the response mechanism nonparametrically because Proposition 3.8 imposes no restriction on the functional form of $h(\bm{u};\bm{\alpha})$.
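A rough numerical illustration of condition (4) is sketched below: it evaluates $\Psi(z)\exp(|z|^{s})$ far in the left tail for the Logistic, Robit, and Probit links. The exponent $s=1$ and the use of the $t$ distribution with $3$ degrees of freedom for the Robit link are illustrative choices; under (4) the product should stay bounded away from zero, which fails for the Probit link because its tail decays like $\exp(-z^{2}/2)$.

```python
import numpy as np
from scipy import stats

s = 1.0                                     # any exponent in (0, 2)
z = -np.array([5.0, 10.0, 20.0, 30.0])
links = {
    "Logistic": stats.logistic.cdf,
    "Robit(3)": stats.t(df=3).cdf,          # t distribution with 3 degrees of freedom
    "Probit": stats.norm.cdf,
}
for name, cdf in links.items():
    vals = cdf(z) * np.exp(np.abs(z) ** s)  # bounded away from 0 under condition (4)
    print(f"{name:>9}:", np.array2string(vals, precision=3))
```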

4 Numerical experiment

We illustrate the effect of identifiability in numerical experiments by comparing weakly and strongly identifiable models. We prepared four scenarios, S1–S4:

  • S1:

    (Outcome: Normal, Response: Logistic)
    [yu,z,δ=1]N(κ0+κ1u+κ2z,σ2)[y\mid u,z,\delta=1]\sim N(\kappa_{0}+\kappa_{1}u+\kappa_{2}z,\sigma^{2}), logit{P(δ=1u,y;𝜶,β)}=α0+α1u+βy\text{logit}\{P(\delta=1\mid u,y;\bm{\alpha},\beta)\}=\alpha_{0}+\alpha_{1}u+\beta y, uN(0,12)u\sim N(0,1^{2}), and zB(1,0.5)z\sim B(1,0.5), where (κ0,κ1,σ2)=(0.3,0.4,1/22)(\kappa_{0},\kappa_{1},\sigma^{2})^{\top}=(0.3,0.4,1/{\sqrt{2}^{2}})^{\top} and (α0,α1,β)=(0.7,0.2,0.29)(\alpha_{0},\alpha_{1},\beta)^{\top}=(0.7,-0.2,0.29)^{\top}.

  • S2:

    (Outcome: Normal, Response: Cauchy)
    [yu,z,δ=1]N(κ0+κ1u+κ2z,σ2)[y\mid u,z,\delta=1]\sim N(\kappa_{0}+\kappa_{1}u+\kappa_{2}z,\sigma^{2}), P(δ=1u,y;𝜶,β)=Ψ(α0+α1u+βy)P(\delta=1\mid u,y;\bm{\alpha},\beta)=\Psi(\alpha_{0}+\alpha_{1}u+\beta y), uUnif(1,1)u\sim\mathrm{Unif}(-1,1), and zB(1,0.7)z\sim B(1,0.7), where (κ0,κ1,σ2)=(0.36,0.59,1/22)(\kappa_{0},\kappa_{1},\sigma^{2})^{\top}=(-0.36,0.59,1/{\sqrt{2}^{2}})^{\top}, (α0,α1,β)=(0.24,0.1,0.42)(\alpha_{0},\alpha_{1},\beta)^{\top}=(0.24,-0.1,0.42)^{\top}, and Ψ\Psi is the cumulative distribution function of the Cauchy distribution.

  • S3:

    (Outcome: Bernoulli, Response: Probit)
    $[y\mid u,z,\delta=1]\sim B(1,p(u,z;\bm{\kappa}))$, $P(\delta=1\mid u,y;\bm{\alpha},\beta)=\Psi(\alpha_{0}+\alpha_{1}u+\beta y)$, $u\sim N(0,1^{2})$, and $z\sim N(0,1^{2})$, where $p(u,z;\bm{\kappa})=1/\{1+\exp(-\kappa_{0}-\kappa_{1}u-\kappa_{2}z)\}$, $(\kappa_{0},\kappa_{1},\kappa_{2})^{\top}=(-0.21,3.8,1.0)^{\top}$, $(\alpha_{0},\alpha_{1},\beta)^{\top}=(0.4,0.39,0.3)^{\top}$, and $\Psi$ is the cumulative distribution function of the standard normal.

  • S4:

    (Outcome: Normal+nonlinear mean structure, Response: Cauchy or Logistic)
    [yu,z,δ=1]N(μ(𝒙),0.52)[y\mid u,z,\delta=1]\sim N(\mu(\bm{x}),0.5^{2}), P(δ=1u,y;𝜶,β)=Ψ(α0+α1u+βy)P(\delta=1\mid u,y;\bm{\alpha},\beta)=\Psi(\alpha_{0}+\alpha_{1}u+\beta y), uUnif(1,1)u\sim\mathrm{Unif}(-1,1), and zB(1,0.5)z\sim B(1,0.5), where μ(𝒙)=z+cos(2πu)+exp(z+u)\mu(\bm{x})=z+\cos(2\pi u)+\exp(z+u), (α0,α1,β)=(0.1,0.2,0.3)(\alpha_{0},\alpha_{1},\beta)^{\top}=(0.1,-0.2,0.3)^{\top}, and Ψ\Psi is the cumulative distribution function of the Cauchy or Logistic distribution.

In S1 and S2, the strength of identification can be adjusted by changing the parameter $\kappa_{2}$, because $\kappa_{2}=0$ makes the model unidentifiable by Example 3.1. On the other hand, the models in S3 and S4 are identifiable by Theorem 3.5. For example, in S4, checking (C3) and (C4) is straightforward from the model setting, while (C5) and (C6) hold by Example 3.6 and Proposition 3.8, respectively. S3 and S4 confirm that inference succeeds even with a discrete outcome and a complex mean structure, respectively.

We generated 1,000 independent Monte Carlo samples and computed estimators of $E[y]$ and $\beta$ with two methods: the fractional imputation (FI) estimator and the complete case (CC) estimator, which uses only completely observed data. The estimator of $E[y]$ is computed by the standard inverse probability weighting method with the estimated response model (Riddles et al., 2016). We used correctly specified models for Scenarios S1–S3 but nonparametric models for Scenario S4 because it is unrealistic to assume that the complicated mean structure is known. The R package 'crs', which implements nonparametric spline regression with a mixture of categorical and continuous covariates (Nie and Racine, 2012), is used to estimate the respondents' outcome model. Response models are estimated by the method discussed in Section 2.2.
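For reference, a minimal sketch of the weighting step for $E[y]$ is given below. The normalized (Hajek-type) form and the Logistic expression for the estimated response probability are illustrative choices, not necessarily the exact implementation used in the simulations.

```python
import numpy as np

def ipw_mean(y, u, delta, phi_hat):
    """Normalized inverse probability weighting estimator of E[y],
    assuming a Logistic response model for illustration."""
    resp = delta == 1
    eta = phi_hat[0] + phi_hat[1] * u[resp] + phi_hat[2] * y[resp]
    weights = 1.0 + np.exp(-eta)            # 1 / pi_hat for the respondents
    return np.sum(weights * y[resp]) / np.sum(weights)
```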

Bias, root mean squared error (RMSE), and coverage rates of 95% confidence intervals for S1–S4 are reported in Table 1. In all scenarios, the CC estimators have significant bias and coverage rates far from 95%, while the FI estimators work well when the model is clearly identifiable. When $\kappa_{2}$ is small in S1 and S2, the variance estimation of FI performs poorly, as expected, although the point estimates remain acceptable. The results for S4 indicate that the model is identifiable even with a nonparametric mean structure, and the estimates are almost identical between the two response models.

Table 1: Results of S1–S4: Bias, root mean square error (RMSE), and coverage rate (CR, %) of 95% confidence intervals are reported. CC: complete case; FI: fractional imputation.
Scenario  Parameter  κ2   Method        Bias    RMSE   CR
S1        E[y]       1.0  CC            0.053   0.066  73.5
S1        E[y]       1.0  FI            0.000   0.043  95.4
S1        E[y]       0.5  CC            0.039   0.053  80.9
S1        E[y]       0.5  FI           -0.001   0.059  97.1
S1        E[y]       0.1  CC            0.034   0.049  83.0
S1        E[y]       0.1  FI            0.021   0.136  99.8
S1        β          1.0  FI            0.001   0.163  95.2
S1        β          0.5  FI            0.003   0.330  98.6
S1        β          0.1  FI           -0.146   0.865  100
S2        E[y]       1.0  CC            0.146   0.152  5.7
S2        E[y]       1.0  FI           -0.004   0.051  94.8
S2        E[y]       0.5  CC            0.130   0.136  7.7
S2        E[y]       0.5  FI           -0.008   0.086  86.2
S2        E[y]       0.1  CC            0.127   0.133  9.4
S2        E[y]       0.1  FI           -0.007   0.105  92.4
S2        β          1.0  FI            0.008   0.148  95.4
S2        β          0.5  FI            0.044   0.365  100
S2        β          0.1  FI            0.033   0.448  100
S3        E[y]       -    CC            0.100   0.102  0.3
S3        E[y]       -    FI            0.001   0.022  95.3
S3        β          -    FI           -0.023   0.279  95.0
S4        E[y]       -    CC(Logistic)  0.341   0.355  5.4
S4        E[y]       -    FI(Logistic)  0.005   0.079  95.4
S4        E[y]       -    CC(Cauchy)    0.296   0.312  10.7
S4        E[y]       -    FI(Cauchy)    0.007   0.080  94.3
S4        β          -    FI(Logistic)  0.006   0.050  94.7
S4        β          -    FI(Cauchy)    0.011   0.063  93.8

5 Real data analysis

We analyzed a dataset of 2139 HIV-positive patients enrolled in AIDS Clinical Trials Group Study 175 (ACTG175; Hammer et al. (1996)). In this analysis, we restrict attention to the 532 patients who received zidovudine (ZDV) monotherapy. Let $y$, $x_{1}$, and $x_{2}$ be the CD4 cell counts at $96\pm 5$ weeks, at baseline, and at $20\pm 5$ weeks, respectively, let $x_{3}$ be the CD8 cell count at baseline, and let $z$ be sex. The outcome was subject to missingness with a 60.34% observation rate, while all covariates were fully observed. To make estimation stable and easy, we standardized all the data. We expect $z$ (sex) to be a reasonable choice of instrumental variable because it is a biological characteristic that affects the CD4 cell count but has little effect on the response probability.

Patients with milder HIV conditions tend to have higher CD4 cell counts; thus, one may suspect that missingness of the outcome is related to more serious conditions and that the missing outcomes would correspond to lower CD4 cell counts than those of the respondents. We therefore considered five MNAR response models:

P(δ=1x1,x2,x3,y)=Ψ(α0+α1x1+α2x2+α3x3+βy),\displaystyle P(\delta=1\mid x_{1},x_{2},x_{3},y)=\Psi(\alpha_{0}+\alpha_{1}x_{1}+\alpha_{2}x_{2}+\alpha_{3}x_{3}+\beta y),

where $\Psi$ represents either the Logistic function or the distribution function of the Cauchy or $t$ distribution with degrees of freedom $v\,(=2,5,10)$. Theorem 3.5 and Proposition 3.8 ensure that the models with these five response mechanisms are all identifiable, even though the instrumental variable $z$ is discrete. From the above conjecture on missing values, the sign of $\beta$ is expected to be negative. We assumed that the respondents' outcome follows a normal distribution with a nonparametric mean structure, estimated by the 'crs' R package as in Scenario S4 of Section 4. The residual plots shown in Figure 1 and the computed $R^{2}$ value $(=0.453)$ indicate that the assumed distribution for the respondents' outcome fits well. Table 2 reports the estimated parameters and their standard errors calculated from 1,000 bootstrap samples. The results of the five response models were very similar, which suggests that the estimates are robust to the choice of response model. Although we cannot determine whether the mechanism is MNAR or MAR because the estimated standard error of $\beta$ is large, the point estimate is negative, as expected. This result is consistent with that of Zhao et al. (2021).

Figure 1: Residual plots of respondents’ outcome in ACTG175 data.
Table 2: Estimated parameters: estimates and standard errors (SE) for the target parameters are reported. Logistic and Cauchy denote fractional imputation using the Logistic and Cauchy distribution functions for the response mechanism. $T_{v}$: $t$ distribution function with degrees of freedom $v\,(=2,5,10)$.
Parameter  Model     Estimate  SE       Parameter  Model     Estimate  SE
α0         Logistic   0.464    0.104    α1         Logistic    0.125   0.156
α0         Cauchy     0.417    0.260    α1         Cauchy      0.108   0.139
α0         T2         0.341    0.081    α1         T2          0.091   0.113
α0         T5         0.306    0.069    α1         T5          0.082   0.102
α0         T10        0.295    0.066    α1         T10         0.080   0.099
α2         Logistic   0.255    0.192    α3         Logistic    0.093   0.107
α2         Cauchy     0.244    0.207    α3         Cauchy      0.083   0.097
α2         T2         0.196    0.148    α3         T2          0.069   0.079
α2         T5         0.169    0.126    α3         T5          0.062   0.070
α2         T10        0.160    0.120    α3         T10         0.060   0.068
β          Logistic  -0.032    0.314    E[y]       Logistic  276.70   13.476
β          Cauchy    -0.030    0.387    E[y]       Cauchy    276.51   14.107
β          T2        -0.027    0.235    E[y]       T2        276.57   13.437
β          T5        -0.021    0.203    E[y]       T5        276.61   13.271
β          T10       -0.019    0.194    E[y]       T10       276.63   13.217

6 Conclusion

In this paper, we proposed a new identification condition for models specified through the respondents' outcome model and the response model. Although our method requires the specification of these two models, they can be very general with the help of an instrument. As considered in Scenario S4 in Section 4, the mean function in the respondents' outcome model can be nonparametric, and the response model can be based on any strictly monotone distribution function, not only the Logistic model. Our condition guarantees model identifiability even when the instrument is categorical, a case not covered by previous conditions. Another advantage of our method is that the identification condition is easy to verify with observed data.

However, our method has some limitations. First, the respondents' outcome model must have the monotone-likelihood ratio property required by Condition (C5); for example, we cannot handle mixture models in our framework. Second, an instrument must be specified in advance. Some methods for finding instruments have been proposed (Zhao et al., 2021), but there is still no gold standard.

Funding

Research by the second author was supported by MEXT Project for Seismology toward Research Innovation with Data of Earthquake (STAR-E) Grant Number JPJ010217.

References

  • Beppu et al. (2021) Beppu, K., Morikawa, K., and Im, J. (2021), “Imputation with verifiable identification condition for nonignorable missing outcomes,” arXiv preprint arXiv:2204.10508, .
  • Cui et al. (2017) Cui, X., Guo, J., and Yang, G. (2017), “On the identifiability and estimation of generalized linear models with parametric nonignorable missing data mechanism,” Computational Statistics & Data Analysis, 107, 64–80.
  • D’Haultfœuille (2010) D’Haultfœuille, X. (2010), “A new instrumental method for dealing with endogenous selection,” Journal of Econometrics, 154(1), 1–15.
  • D’Haultfœuille (2011) D’Haultfœuille, X. (2011), “On the completeness condition in nonparametric instrumental problems,” Econometric Theory, 27(3), 460–471.
  • Hammer et al. (1996) Hammer, S. M., Katzenstein, D. A., Hughes, M. D., Gundacker, H., Schooley, R. T., Haubrich, R. H., Henry, W. K., Lederman, M. M., Phair, J. P., Niu, M. et al. (1996), “A trial comparing nucleoside monotherapy with combination therapy in HIV-infected adults with CD4 cell counts from 200 to 500 per cubic millimeter,” New England Journal of Medicine, 335(15), 1081–1090.
  • Hu et al. (2022) Hu, W., Wang, R., Li, W., and Miao, W. (2022), “Paradoxes and resolutions for semiparametric fusion of individual and summary data,” arXiv preprint arXiv:2210.00200, .
  • Ibrahim et al. (2001) Ibrahim, J. G., Lipsitz, S. R., and Horton, N. (2001), “Using auxiliary data for parameter estimation with non-ignorably missing outcomes,” Journal of the Royal Statistical Society: Series C (Applied Statistics), 50(3), 361–373.
  • Imbens and Rubin (2015) Imbens, G. W., and Rubin, D. B. (2015), Causal inference in statistics, social, and biomedical sciences Cambridge University Press.
  • Kim (2011) Kim, J. K. (2011), “Parametric fractional imputation for missing data analysis,” Biometrika, 98(1), 119–132.
  • Kim and Shao (2021) Kim, J. K., and Shao, J. (2021), Statistical methods for handling incomplete data CRC press.
  • Little and Rubin (2019) Little, R. J., and Rubin, D. B. (2019), Statistical analysis with missing data, Vol. 793 John Wiley & Sons.
  • Miao et al. (2016) Miao, W., Ding, P., and Geng, Z. (2016), “Identifiability of normal and normal mixture models with nonignorable missing data,” Journal of the American Statistical Association, 111(516), 1673–1683.
  • Miao et al. (2019) Miao, W., Liu, L., Tchetgen, E. T., and Geng, Z. (2019), “Identification, doubly robust estimation, and semiparametric efficiency theory of nonignorable missing data with a shadow variable,” arXiv preprint arXiv:1509.02556, .
  • Miao and Tchetgen (2018) Miao, W., and Tchetgen, E. T. (2018), “Identification and inference with nonignorable missing covariate data,” Statistica Sinica, 28(4), 2049.
  • Miao and Tchetgen Tchetgen (2016) Miao, W., and Tchetgen Tchetgen, E. J. (2016), “On varieties of doubly robust estimators under missingness not at random with a shadow variable,” Biometrika, 103(2), 475–482.
  • Morikawa and Kim (2021) Morikawa, K., and Kim, J. K. (2021), “Semiparametric optimal estimation with nonignorable nonresponse data,” The Annals of Statistics, 49(5), 2991–3014.
  • Nie and Racine (2012) Nie, Z., and Racine, J. S. (2012), “The crs Package: Nonparametric Regression Splines for Continuous and Categorical Predictors.,” R Journal, 4(2).
  • Riddles et al. (2016) Riddles, M. K., Kim, J. K., and Im, J. (2016), “A propensity-score-adjustment method for nonignorable nonresponse,” Journal of Survey Statistics and Methodology, 4(2), 215–245.
  • Robins et al. (1994) Robins, J. M., Rotnitzky, A., and Zhao, L. P. (1994), “Estimation of regression coefficients when some regressors are not always observed,” Journal of the American statistical Association, 89(427), 846–866.
  • Tang et al. (2003) Tang, G., Little, R. J., and Raghunathan, T. E. (2003), “Analysis of multivariate missing data with nonignorable nonresponse,” Biometrika, 90(4), 747–764.
  • Wang et al. (2014) Wang, S., Shao, J., and Kim, J. K. (2014), “An instrumental variable approach for identification and estimation with nonignorable nonresponse,” Statistica Sinica, 24, 1097–1116.
  • Yang and Kim (2020) Yang, S., and Kim, J. K. (2020), “Statistical data integration in survey sampling: A review,” Japanese Journal of Statistics and Data Science, 3, 625–650.
  • Yang et al. (2019) Yang, S., Wang, L., and Ding, P. (2019), “Causal inference with confounders missing not at random,” Biometrika, 106(4), 875–888.
  • Zhao and Ma (2022) Zhao, J., and Ma, Y. (2022), “A versatile estimation procedure without estimating the nonignorable missingness mechanism,” Journal of the American Statistical Association, 117(540), 1916–1930.
  • Zhao et al. (2021) Zhao, P., Wang, L., and Shao, J. (2021), “Sufficient dimension reduction and instrument search for data with nonignorable nonresponse,” Bernoulli, 27, 930–945.

Appendix A Technical Proofs

We first provide a technical result to prove Theorem 3.4.

Lemma A.1.

Let aa, bb, and cc be any positive real numbers. Assume that r1r_{1} and r2r_{2} are positive real numbers satisfying

aba+b<r12r22r12r22<c.\displaystyle-\frac{ab}{a+b}<\frac{r_{1}^{2}-r_{2}^{2}}{r_{1}^{2}r_{2}^{2}}<c. (5)

Then, there exist 0<πj(k)<1(j=1,2,3;k=1,2)0<\pi^{(k)}_{j}<1\,(j=1,2,3;k=1,2) such that

j=13πj(1)=r12,j=13πj(2)=r22,\displaystyle\sum_{j=1}^{3}\pi^{(1)}_{j}=r_{1}^{2},\quad\sum_{j=1}^{3}\pi^{(2)}_{j}=r_{2}^{2}, (6)

and

1π1(1)1π1(2)=a,1π2(1)1π2(2)=b,1π3(1)1π3(2)=c.\displaystyle\begin{split}\frac{1}{\pi^{(1)}_{1}}-\frac{1}{\pi^{(2)}_{1}}=a,\quad\frac{1}{\pi^{(1)}_{2}}-\frac{1}{\pi^{(2)}_{2}}=b,\quad\frac{1}{\pi^{(1)}_{3}}-\frac{1}{\pi^{(2)}_{3}}=-c.\end{split} (7)
Proof of Lemma A.1.

By using a polar coordinate system, we transform $\pi^{(k)}_{j}\,(j=1,2,3;k=1,2)$ into

\displaystyle(\sqrt{\pi^{(1)}_{1}},\sqrt{\pi^{(1)}_{2}},\sqrt{\pi^{(1)}_{3}})=r_{1}(\sin\phi_{1}\cos\phi_{2},\sin\phi_{1}\sin\phi_{2},\cos\phi_{1}),
\displaystyle(\sqrt{\pi^{(2)}_{1}},\sqrt{\pi^{(2)}_{2}},\sqrt{\pi^{(2)}_{3}})=r_{2}(\sin\psi_{1}\cos\psi_{2},\sin\psi_{1}\sin\psi_{2},\cos\psi_{1}),

where $0<\phi_{1},\phi_{2},\psi_{1},\psi_{2}<\pi/2$, which ensures that $\pi^{(k)}_{j}\,(j=1,2,3;k=1,2)$ satisfy (6). It follows from (7) and the double-angle formulas that we have

r12(1ω1)(1+ω2)r22(1ω3)(1+ω4)=ar12r224(1ω1)(1+ω2)(1ω3)(1+ω4),\displaystyle\begin{split}&r_{1}^{2}(1-\omega_{1})(1+\omega_{2})-r_{2}^{2}(1-\omega_{3})(1+\omega_{4})\\ &=-\frac{ar_{1}^{2}r_{2}^{2}}{4}(1-\omega_{1})(1+\omega_{2})(1-\omega_{3})(1+\omega_{4}),\end{split} (8)
r12(1ω1)(1ω2)r22(1ω3)(1ω4)=br12r224(1ω1)(1ω2)(1ω3)(1ω4),\displaystyle\begin{split}&r_{1}^{2}(1-\omega_{1})(1-\omega_{2})-r_{2}^{2}(1-\omega_{3})(1-\omega_{4})\\ &=-\frac{br_{1}^{2}r_{2}^{2}}{4}(1-\omega_{1})(1-\omega_{2})(1-\omega_{3})(1-\omega_{4}),\end{split} (9)
r12(1+ω1)r22(1+ω3)=cr12r222(1+ω1)(1+ω3),\displaystyle r_{1}^{2}(1+\omega_{1})-r_{2}^{2}(1+\omega_{3})=\frac{cr_{1}^{2}r_{2}^{2}}{2}(1+\omega_{1})(1+\omega_{3}), (10)

where $\omega_{1}=\cos 2\phi_{1}$, $\omega_{2}=\cos 2\phi_{2}$, $\omega_{3}=\cos 2\psi_{1}$, and $\omega_{4}=\cos 2\psi_{2}$. Setting $\omega_{2}=\omega_{4}$ in equations (8) and (9) yields

r12(1ω1)r22(1ω3)\displaystyle r_{1}^{2}(1-\omega_{1})-r_{2}^{2}(1-\omega_{3}) =ar12r224(1ω1)(1+ω2)(1ω3),\displaystyle=-\frac{ar_{1}^{2}r_{2}^{2}}{4}(1-\omega_{1})(1+\omega_{2})(1-\omega_{3}),
r12(1ω1)r22(1ω3)\displaystyle r_{1}^{2}(1-\omega_{1})-r_{2}^{2}(1-\omega_{3}) =br12r224(1ω1)(1ω2)(1ω3).\displaystyle=-\frac{br_{1}^{2}r_{2}^{2}}{4}(1-\omega_{1})(1-\omega_{2})(1-\omega_{3}).

Fixing ω2=12a/(a+b)\omega_{2}=1-2a/(a+b) reduces the above equations to the one common equation

r12(1ω1)r22(1ω3)=r12r22ab2(a+b)(1ω1)(1ω3),\displaystyle r_{1}^{2}(1-\omega_{1})-r_{2}^{2}(1-\omega_{3})=-\frac{r_{1}^{2}r_{2}^{2}ab}{2(a+b)}(1-\omega_{1})(1-\omega_{3}), (11)

maintaining the condition $-1<\omega_{2}<1$. It remains to show that there exists $-1<\omega_{3}<1$ satisfying (10) and (11). Solving equation (11) with respect to $\omega_{1}$, we have

ω1=1r22(1ω3)r12+r12r22ab(1ω3)/{2(a+b)}.\displaystyle\omega_{1}=1-\frac{r_{2}^{2}(1-\omega_{3})}{r_{1}^{2}+r_{1}^{2}r_{2}^{2}ab(1-\omega_{3})/\{2(a+b)\}}. (12)

Substituting (12) into (10) leads to the following quadratic equation with respect to $\omega_{3}$:

\displaystyle f(\omega_{3})=\left(\frac{r_{1}^{2}r_{2}^{4}ab+cr_{1}^{4}r_{2}^{4}ab}{2(a+b)}-\frac{cr_{1}^{2}r_{2}^{4}}{2}\right)\omega_{3}^{2}-\left(\frac{r_{1}^{4}r_{2}^{2}ab}{a+b}+cr_{1}^{4}r_{2}^{2}\right)\omega_{3}
\displaystyle\qquad+\left(\frac{r_{1}^{2}r_{2}^{2}ab\left(2r_{1}^{2}-r_{2}^{2}-cr_{1}^{2}r_{2}^{2}\right)}{2(a+b)}+\frac{cr_{1}^{2}r_{2}^{4}}{2}+2r_{1}^{4}-2r_{1}^{2}r_{2}^{2}-cr_{1}^{4}r_{2}^{2}\right)=0.

It follows from (5) that

f(1)=r12(2r122r222cr12r22)<0,f(1)=2r12(r12r22+r12r22aba+b)>0,\displaystyle f(1)=r_{1}^{2}\left(2r_{1}^{2}-2r_{2}^{2}-2cr_{1}^{2}r_{2}^{2}\right)<0,\quad f(-1)=2r_{1}^{2}\left(r_{1}^{2}-r_{2}^{2}+\frac{r_{1}^{2}r_{2}^{2}ab}{a+b}\right)>0,

which implies that there is at least one solution of ω3\omega_{3} to the equation f(ω3)=0f(\omega_{3})=0 in the open interval (1,1)(-1,1) .

Finally, we prove Theorem 3.4 with the help of Lemma A.1.

Proof of Theorem 3.4.

Without loss of generality, we fix the value of $\bm{u}$ because the following proof holds for each $\bm{u}$. Let the categorical variables $y$ and $z$ take values in $\{1,2,\ldots,p\}$ and $\{1,2,\ldots,q\}$, respectively. Because the "if" part has already been established by Lemma 3.2, it remains to show that model identifiability implies the completeness condition (C2); we address three cases separately: (i) $p=2$, (ii) $p=3$, and (iii) $p\geq 4$.

When $p=2$, condition (C1) implies that the $q\times 2$ matrix whose $(i,j)$-th element is $p(y=j\mid\delta=1,z=i)$ $(i=1,\dots,q;\ j=1,2)$ has rank $2$. Hence, identifiable models always satisfy the completeness condition (C2).

For cases where p3p\geq 3, we must show that the model becomes unidentifiable when the completeness condition is violated. The breach of the completeness condition indicates the existence of a non-zero vector (h1,,hp)(h_{1},\dots,h_{p}) such that for z=1,,qz=1,\ldots,q, we have

E[hyδ=1,z]=y=1phyp(yδ=1,z)=0.\displaystyle E[h_{y}\mid\delta=1,z]=\sum_{y=1}^{p}h_{y}p(y\mid\delta=1,z)=0. (13)

The elements of $(h_{1},\cdots,h_{p})$ do not all share the same sign, and multiplying this vector by any constant does not affect the above equation. Recall that the model is unidentifiable if there exist response probabilities with $\pi^{(1)}_{y}\neq\pi^{(2)}_{y}$ for some $y\in\{1,\dots,p\}$ satisfying $\sum_{y=1}^{p}p(y\mid\delta=1,z)/\pi^{(1)}_{y}=\sum_{y=1}^{p}p(y\mid\delta=1,z)/\pi^{(2)}_{y}$ for all $z$. We now construct such an unidentifiable model when the completeness condition is violated.

When $p=3$, without loss of generality, we assume that $h_{1}>0$, $h_{2}>0$, and $h_{3}<0$ satisfy $\sum_{y=1}^{3}h_{y}p(y\mid\delta=1,z)=0$ for all $z\in\{1,\dots,q\}$. Employing Lemma A.1 with $a=h_{1}$, $b=h_{2}$, $c=-h_{3}$, and $r_{1}=r_{2}=1$, we derive

1π1(1)1π1(2)=h1,1π2(1)1π2(2)=h2,1π3(1)1π3(2)=h3,\displaystyle\begin{split}\frac{1}{\pi^{(1)}_{1}}-\frac{1}{\pi^{(2)}_{1}}=h_{1},\quad\frac{1}{\pi^{(1)}_{2}}-\frac{1}{\pi^{(2)}_{2}}=h_{2},\quad\frac{1}{\pi^{(1)}_{3}}-\frac{1}{\pi^{(2)}_{3}}=h_{3},\end{split}

where $\sum_{j=1}^{3}\pi^{(1)}_{j}=\sum_{j=1}^{3}\pi^{(2)}_{j}=1$. Combining these equalities with $\sum_{y=1}^{3}h_{y}p(y\mid\delta=1,z)=0$ shows that $\sum_{y=1}^{3}p(y\mid\delta=1,z)/\pi^{(1)}_{y}=\sum_{y=1}^{3}p(y\mid\delta=1,z)/\pi^{(2)}_{y}$ for all $z$; hence the model is unidentifiable.

Lastly, we consider the case of p4p\geq 4. Suppose hy(y=1,,p)h_{y}\,(y=1,\dots,p) satisfies (13). Within (h1,,hp)(h_{1},\cdots,h_{p}), we select three elements with signs as positive, positive, and negative, respectively, and define them as aa, bb, and c-c where a,b,c>0a,b,c>0, and λ\lambda is set to be sufficiently large to ensure that

λ>2max{a+bab,1c}.\displaystyle\lambda>2\max\left\{\frac{a+b}{ab},~{}\frac{1}{c}\right\}. (14)

For ease of notation, we denote (h1,,hp)=(h1,,hp3,a,b,c)(h_{1},\cdots,h_{p})=(h_{1},\cdots,h_{p-3},a,b,-c). The remaining part of the proof is similar when the combination of the signs is negative, negative, and positive. With the selected λ\lambda, 0<πy(k)<1(y=1,,p3;k=1,2)0<\pi_{y}^{(k)}<1\,(y=1,\ldots,p-3;k=1,2) are determined to be sufficiently small to satisfy

(1y=1p3πy(1))(1y=1p3πy(2))12,y=1p3πy(1)<1,y=1p3πy(2)<1,\displaystyle\left(1-\sum_{y=1}^{p-3}\pi_{y}^{(1)}\right)\left(1-\sum_{y=1}^{p-3}\pi_{y}^{(2)}\right)\geq\frac{1}{2},\quad\sum_{y=1}^{p-3}\pi_{y}^{(1)}<1,\quad\sum_{y=1}^{p-3}\pi_{y}^{(2)}<1, (15)
1πy(1)1πy(2)=λhy,fory=1,,p3.\displaystyle\frac{1}{\pi^{(1)}_{y}}-\frac{1}{\pi^{(2)}_{y}}=\lambda h_{y},\quad\mathrm{for}~{}y=1,\ldots,p-3.

Furthermore, we define r1r_{1} and r2r_{2} as

r12=1y=1p3πy(1),r22=1y=1p3πy(2).\displaystyle r_{1}^{2}=1-\sum_{y=1}^{p-3}\pi_{y}^{(1)},\ r_{2}^{2}=1-\sum_{y=1}^{p-3}\pi_{y}^{(2)}. (16)

By determining the variables through these steps, it follows from (14), (15), and (16) that condition (5) with a=λaa=\lambda a, b=λbb=\lambda b, and c=λcc=\lambda c is fulfilled:

r12r22r12r222(r12r22)2cc<(λc),\displaystyle\frac{r_{1}^{2}-r_{2}^{2}}{r_{1}^{2}r_{2}^{2}}\leq 2(r_{1}^{2}-r_{2}^{2})\leq\frac{2}{c}c<(\lambda c),
(λa)(λb)(λa)+(λb)<aba+b2(a+b)ab=2r12r221r12r221r12r22<r12r22r12r22.\displaystyle-\frac{(\lambda a)(\lambda b)}{(\lambda a)+(\lambda b)}<-\frac{ab}{a+b}\frac{2(a+b)}{ab}=-2r_{1}^{2}r_{2}^{2}\frac{1}{r_{1}^{2}r_{2}^{2}}\leq-\frac{1}{r_{1}^{2}r_{2}^{2}}<\frac{r_{1}^{2}-r_{2}^{2}}{r_{1}^{2}r_{2}^{2}}.

Therefore, by applying Lemma A.1, we demonstrate that there exist πp2(k)\pi^{(k)}_{p-2}, πp1(k)\pi^{(k)}_{p-1}, and πp(k)(k=1,2)\pi^{(k)}_{p}\,(k=1,2) such that

y=p2pπy(1)=r12,y=p2pπy(2)=r22,\displaystyle\sum_{y=p-2}^{p}\pi_{y}^{(1)}=r_{1}^{2},\ \sum_{y=p-2}^{p}\pi_{y}^{(2)}=r_{2}^{2},
1πp2(1)1πp2(2)=λa,1πp1(1)1πp1(2)=λb,1πp(1)1πp(2)=λc.\displaystyle\frac{1}{\pi_{p-2}^{(1)}}-\frac{1}{\pi_{p-2}^{(2)}}=\lambda a,\quad\frac{1}{\pi_{p-1}^{(1)}}-\frac{1}{\pi_{p-1}^{(2)}}=\lambda b,\quad\frac{1}{\pi_{p}^{(1)}}-\frac{1}{\pi_{p}^{(2)}}=-\lambda c.

The condition (13) suggests that the constructed πy(k)(y=1,,p;k=1,2)\pi^{(k)}_{y}\,(y=1,\dots,p;k=1,2) satisfy y=1pπy(k)=1\sum_{y=1}^{p}\pi^{(k)}_{y}=1 for k=1,2k=1,2 and, for any z{1,,q}z\in\{1,\dots,q\},

y=1p(1πy(1)1πy(2))p(yδ=1,z)=λy=1phyp(yδ=1,z)=0.\sum_{y=1}^{p}\left(\frac{1}{\pi^{(1)}_{y}}-\frac{1}{\pi^{(2)}_{y}}\right)p(y\mid\delta=1,z)=\lambda\sum_{y=1}^{p}h_{y}p(y\mid\delta=1,z)=0.

Therefore, the model is unidentifiable.

Proof of Theorem 3.5.

We consider the case where $y$ is continuous; when $y$ is discrete, the integrals are simply replaced by sums. To simplify the discussion, we consider the case where $\mathcal{S}_{y}=\mathbb{R}$. Fix a value of $\bm{u}$. Because $h$ and $g$ are injective functions, it is sufficient to prove the identifiability of $\alpha:=h(\bm{u};\bm{\alpha})$ and $\beta:=g(\bm{u};\bm{\beta})$. Therefore, our goal is to prove that

p(y𝒙,δ=1;𝜸)p(y𝒙,δ=1;𝜸)Ψ{α+βm(y)}1𝑑y=p(y𝒙,δ=1;𝜸)p(y𝒙,δ=1;𝜸)Ψ{α+βm(y)}1𝑑y,\displaystyle\frac{p(y\mid\bm{x},\delta=1;\bm{\gamma})}{\int p(y\mid\bm{x},\delta=1;\bm{\gamma})\Psi\{\alpha+\beta m(y)\}^{-1}dy}=\frac{p(y\mid\bm{x},\delta=1;\bm{\gamma^{\prime}})}{\int p(y\mid\bm{x},\delta=1;\bm{\gamma^{\prime}})\Psi\{\alpha^{\prime}+\beta^{\prime}m(y)\}^{-1}dy},

implies $\alpha=\alpha^{\prime}$, $\beta=\beta^{\prime}$, and $\bm{\gamma}=\bm{\gamma^{\prime}}$. Integrating both sides of the above equation with respect to $y$ yields the equality of the denominators. Thus, we have $p(y\mid\bm{x},\delta=1;\bm{\gamma})=p(y\mid\bm{x},\delta=1;\bm{\gamma^{\prime}})$, which implies $\bm{\gamma}=\bm{\gamma^{\prime}}$ by (C4).

Next, we consider the identification of β\beta. Taking 𝒛1\bm{z}_{1} and 𝒛2\bm{z}_{2} such that they satisfy (C5), we show that

p(y𝒖,𝒛1,δ=1;𝜸)Ψ{α+βm(y)}𝑑y\displaystyle\int\frac{p(y\mid\bm{u},\bm{z}_{1},\delta=1;\bm{\gamma})}{\Psi\{\alpha+\beta m(y)\}}dy =p(y𝒖,𝒛1,δ=1;𝜸)Ψ{α+βm(y)}𝑑y,\displaystyle=\int\frac{p(y\mid\bm{u},\bm{z}_{1},\delta=1;\bm{\gamma})}{\Psi\{\alpha^{\prime}+\beta^{\prime}m(y)\}}dy, (17)
p(y𝒖,𝒛2,δ=1;𝜸)Ψ{α+βm(y)}𝑑y\displaystyle\int\frac{p(y\mid\bm{u},\bm{z}_{2},\delta=1;\bm{\gamma})}{\Psi\{\alpha+\beta m(y)\}}dy =p(y𝒖,𝒛2,δ=1;𝜸)Ψ{α+βm(y)}𝑑y,\displaystyle=\int\frac{p(y\mid\bm{u},\bm{z}_{2},\delta=1;\bm{\gamma})}{\Psi\{\alpha^{\prime}+\beta^{\prime}m(y)\}}dy, (18)

implies β=β\beta=\beta^{\prime}. It follows from (17) and (18) that

K(y;α,α,β,β)p(y𝒖,𝒛1,δ=1;𝜸)𝑑y\displaystyle\int K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})p(y\mid\bm{u},\bm{z}_{1},\delta=1;\bm{\gamma})dy
=K(y;α,α,β,β)p(y𝒖,𝒛2,δ=1;𝜸)𝑑y=0,\displaystyle=\int K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})p(y\mid\bm{u},\bm{z}_{2},\delta=1;\bm{\gamma})dy=0, (19)

where K(y;α,α,β,β)=Ψ1{α+βm(y)}Ψ1{α+βm(y)}K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})=\Psi^{-1}\{\alpha+\beta m(y)\}-\Psi^{-1}\{\alpha^{\prime}+\beta^{\prime}m(y)\}. It remains to show that (19) implies β=β\beta=\beta^{\prime} in the following two steps:

Step I. We prove that the function $K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})$ has a single change of sign when $\beta\neq\beta^{\prime}$. Assume that $\beta\neq\beta^{\prime}$. The equation $K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})=0$ has only one solution $y^{*}\in\mathcal{S}_{y}$, satisfying $m(y^{*})=(\alpha-\alpha^{\prime})/(\beta^{\prime}-\beta)$, because of the injectivity of $m(\cdot)$ and $\Psi(\cdot)$. This implies that $K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})$ has a single change of sign.

Step II. We prove that equation (19) cannot hold when $\beta\neq\beta^{\prime}$. Without loss of generality, by Step I, we consider the case where $K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})<0$ for $y<y^{*}$ and $K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})>0$ for $y>y^{*}$, and $p(y\mid\bm{u},\bm{z}_{2},\delta=1)/p(y\mid\bm{u},\bm{z}_{1},\delta=1)$ is monotone increasing. Let $c$ be the supremum of the density ratio

c:=supy<yp(y𝒖,𝒛2,δ=1)p(y𝒖,𝒛1,δ=1).\displaystyle c:=\sup_{y<y^{*}}\frac{p(y\mid\bm{u},\bm{z}_{2},\delta=1)}{p(y\mid\bm{u},\bm{z}_{1},\delta=1)}.

By a property on K(y;α,α,β,β)K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime}) shown in (19), we have

0\displaystyle 0 =K(y;α,α,β,β)p(y𝒖,𝒛2,δ=1)𝑑y\displaystyle=\int K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})p(y\mid\bm{u},\bm{z}_{2},\delta=1)dy
=yK(y;α,α,β,β)p(y𝒖,𝒛2,δ=1)p(y𝒖,𝒛1,δ=1)p(y𝒖,𝒛1,δ=1)𝑑y\displaystyle=\int_{-\infty}^{y^{*}}K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})\frac{p(y\mid\bm{u},\bm{z}_{2},\delta=1)}{p(y\mid\bm{u},\bm{z}_{1},\delta=1)}p(y\mid\bm{u},\bm{z}_{1},\delta=1)dy
+yK(y;α,α,β,β)p(y𝒖,𝒛2,δ=1)p(y𝒖,𝒛1,δ=1)p(y𝒖,𝒛1,δ=1)𝑑y\displaystyle\hskip 20.00003pt+\int_{y^{*}}^{\infty}K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})\frac{p(y\mid\bm{u},\bm{z}_{2},\delta=1)}{p(y\mid\bm{u},\bm{z}_{1},\delta=1)}p(y\mid\bm{u},\bm{z}_{1},\delta=1)dy
ycK(y;α,α,β,β)p(y𝒖,𝒛1,δ=1)𝑑y+ycK(y;α,α,β,β)p(y𝒖,𝒛1,δ=1)𝑑y\displaystyle\geq\int_{-\infty}^{y^{*}}cK(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})p(y\mid\bm{u},\bm{z}_{1},\delta=1)dy+\int_{y^{*}}^{\infty}cK(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})p(y\mid\bm{u},\bm{z}_{1},\delta=1)dy
=cK(y;α,α,β,β)p(y𝒖,𝒛1,δ=1)𝑑y=0,\displaystyle=c\int K(y;\alpha,\alpha^{\prime},\beta,\beta^{\prime})p(y\mid\bm{u},\bm{z}_{1},\delta=1)dy=0,

where the inequality follows from the definition of $c$. Since equality must hold throughout, the density ratio $p(y\mid\bm{u},\bm{z}_{2},\delta=1)/p(y\mid\bm{u},\bm{z}_{1},\delta=1)$ is constant on $\mathcal{S}_{y}$; hence $p(y\mid\bm{u},\bm{z}_{2},\delta=1)=p(y\mid\bm{u},\bm{z}_{1},\delta=1)$ on $\mathcal{S}_{y}$. This contradicts (C5); thus $\beta=\beta^{\prime}$.

Finally, from the strict monotonicity of Ψ\Psi, it follows that the integration

p(y𝒖,𝒛1,δ=1;𝜸)Ψ{α+βm(y)}𝑑y,\displaystyle\int\frac{p(y\mid\bm{u},\bm{z}_{1},\delta=1;\bm{\gamma})}{\Psi\{\alpha+\beta m(y)\}}dy,

is injective with respect to α\alpha. Therefore, equation (17) implies that α=α\alpha=\alpha^{\prime}.

Proof of Proposition 3.8.

It follows from the assumption (4) that there exist M,C>0M,C>0 such that

p(y𝒙,δ=1;𝜸)Ψ{h(𝒖;𝜶)+g(𝒖;𝜷)m(y)}𝑑y\displaystyle\int\frac{p(y\mid\bm{x},\delta=1;\bm{\gamma})}{\Psi\{h(\bm{u};\bm{\alpha})+g(\bm{u};\bm{\beta})m(y)\}}dy
exp{12(yh(𝒖;𝜶)βμ(𝒙,𝜿))2β2σ2}1Ψ(y)exp(|y|s)exp(|y|s)𝑑y\displaystyle\propto\int_{-\infty}^{\infty}\exp\left\{-\frac{1}{2}\frac{(y-h(\bm{u};\bm{\alpha})-\beta\mu(\bm{x},\bm{\kappa}))^{2}}{\beta^{2}\sigma^{2}}\right\}\frac{1}{\Psi(y)\exp(|y|^{s})}\exp(|y|^{s})dy
Mexp{12(yh(𝒖;𝜶)βμ(𝒙,𝜿))2β2σ2}Cexp(|y|s)𝑑y+C<,\displaystyle\leq\int_{-\infty}^{-M}\exp\left\{-\frac{1}{2}\frac{(y-h(\bm{u};\bm{\alpha})-\beta\mu(\bm{x},\bm{\kappa}))^{2}}{\beta^{2}\sigma^{2}}\right\}C\exp(|y|^{s})dy+C<\infty,

where $0<s<2$. The bound on the first term of the last line follows from condition (4), and the bound on the second term follows from the monotonicity of $\Psi$. ∎