
Reyhaneh Hosseinpourkhoshkbari, Cognitive Informatics and Statistics Lab, School of Behavioral and Brain Sciences (GR4.302), University of Texas at Dallas, Richardson, TX. Email: [email protected]
Richard M. Golden, Cognitive Informatics and Statistics Lab, School of Behavioral and Brain Sciences (GR4.1), University of Texas at Dallas, Richardson, TX. Email: [email protected]

This project was partially funded by the University of Texas at Dallas Office of Research and Innovation through a SPARK program award to Richard Golden.

Assessment of Misspecification in CDMs using a Generalized Information Matrix Test

Reyhaneh Hosseinpourkhoshkbari and Richard M. Golden
Abstract

If the probability model is correctly specified, then the covariance matrix of the asymptotic maximum likelihood estimate distribution can be estimated using either the first or the second derivatives of the likelihood function. Consequently, if the determinants of these two different covariance matrix estimators differ, this indicates model misspecification. This misspecification detection strategy is the basis of the Determinant Generalized Information Matrix Test ($GIMT_{Det}$). To investigate the performance of the $GIMT_{Det}$, a Deterministic Input Noisy And gate (DINA) Cognitive Diagnostic Model (CDM) was fit to the Fraction-Subtraction data set. Next, various misspecified versions of the original DINA CDM were fit to bootstrap data sets generated by sampling from the original fitted DINA CDM. The $GIMT_{Det}$ showed good discrimination performance for larger levels of misspecification. In addition, the $GIMT_{Det}$ did not detect model misspecification when model misspecification was not present, and likewise did not detect model misspecification when the level of misspecification was very low. However, the discrimination performance of the $GIMT_{Det}$ was highly variable across different misspecification strategies when the misspecification level was moderately sized. The proposed new misspecification detection methodology is promising, but additional empirical studies are required to further characterize its strengths and limitations.

Keywords:
misspecification, information matrix test, cognitive diagnostic model

0.1 Introduction

Cognitive diagnostic models (CDMs) are a family of restricted latent class psychometric models designed to assess examinee skill mastery by incorporating prior knowledge of the relationship between latent skills and student responses (de la Torre, 2009). In addition, a CDM outputs the attribute distribution: the probability that an examinee has a particular set of skills given the examinee's exam performance. The characteristics of the latent skills in a CDM are defined through a Q-matrix that specifies which skills are relevant for answering a particular exam question. Q-matrix misspecification can affect parameter estimates and respondent classification accuracy (Rupp and Templin, 2008). Therefore, careful validation of the Q-matrix is crucial to ensure that the model accurately represents the underlying relationships between the test items and the measured attributes. Methods have been developed to estimate and validate Q-matrices derived from expert knowledge. However, despite best intentions, the possibility of CDM misspecification is always present.

The effects of model misspecification in CDMs have been investigated by directly comparing observed frequencies and predicted probabilities (Kunina-Habenicht, Rupp, and Wilhelm, 2012), by comparing a correctly specified model to a nested model using Wald test methodology (de la Torre, 2011), and by comparing selected first and second order observed and expected moments (Chen, Liu, Xin, and Cui, 2018). These methodologies often face the challenge of test statistics with poor statistical power due to the difficulty of reliably estimating parameters in fully saturated (highly flexible) models. For example, consider the problem of comparing the predicted probability of every pattern of responses to an exam with $d$ items using a probability model with $q$ free parameters. The degrees of freedom (and variance) of the Pearson Goodness-Of-Fit (GOF) chi-squared test statistic increase as an exponential function of the number of items $d$. The M2 statistic (Maydeu-Olivares and Joe, 2014) provides an improvement in statistical power by only examining the first and second moments, resulting in a chi-squared test statistic whose degrees of freedom increase as a quadratic function of the number of items $d$, as the worked comparison below illustrates.
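As a worked illustration (our own arithmetic, using the 15-item test length analyzed later in this paper), for a model with $q$ free parameters:

df_{GOF} = 2^{d}-1-q, \qquad d=15:\;\; 2^{15}-1-q = 32767-q

df_{M2} = d + d(d-1)/2 - q = d(d+1)/2 - q, \qquad d=15:\;\; 120-q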

Using a different approach, which detects model misspecification by comparing different covariance matrix estimators, White (1982) introduced the Information Matrix Test (IMT), a chi-squared test with $q(q+1)/2$ degrees of freedom. This test is based on comparing two covariance matrix estimators: $\hat{\mathbf{A}}^{-1}$, derived from the Hessian matrix, and $\hat{\mathbf{B}}^{-1}$, derived from the outer product gradient (OPG). These estimators are calculated from the second and first derivatives of the log-likelihood function, respectively. The White (1982) IMT test statistic for misspecification detection has a chi-squared distribution whose degrees of freedom increase as a quadratic function of the number of parameters $q$ in the model. Golden, Henley, White, and Kashner (2013, 2016) proposed a generalization of the White (1982) IMT framework called the Generalized Information Matrix Test (GIMT) framework. However, no systematic studies have been conducted to specifically investigate the performance of the GIMT in the context of model misspecification detection in CDMs.

Presnell and Boos (2004) developed and empirically evaluated a statistical test for comparing the "in-sample" (training data) log-likelihood model fit to the "out-of-sample" (test data) log-likelihood model fit. They referred to their test as the "in and out of sample" (IOS) test and showed that the test statistic has a chi-squared distribution with one degree of freedom regardless of model or data complexity. The IOS test may be interpreted as a type of GIMT as described by Golden et al. (2013, 2016), which we call the Determinant GIMT ($GIMT_{Det}$): $GIMT_{Det}=(1/q)\log\det(\hat{\mathbf{A}}_{n}^{-1}\hat{\mathbf{B}}_{n})$.
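For intuition, here is a minimal numerical sketch of the $GIMT_{Det}$ statistic (illustrative Python; the function and variable names are ours, not from the cited papers):

```python
# A minimal sketch of the GIMT_Det statistic (assumed names, not from the papers).
import numpy as np

def gimt_det(A_hat: np.ndarray, B_hat: np.ndarray) -> float:
    """(1/q) log det(A^{-1} B) for q x q positive-definite A_hat, B_hat.

    Computed as (1/q)(log det B - log det A) via slogdet for numerical
    stability; the two forms are algebraically identical.
    """
    q = A_hat.shape[0]
    sign_a, logdet_a = np.linalg.slogdet(A_hat)
    sign_b, logdet_b = np.linalg.slogdet(B_hat)
    assert sign_a > 0 and sign_b > 0, "A_hat and B_hat must be positive definite"
    return (logdet_b - logdet_a) / q

# Under correct specification A ~ B, so the statistic is near zero:
rng = np.random.default_rng(0)
M = rng.normal(size=(4, 4))
A = M @ M.T + 4 * np.eye(4)                # a positive-definite stand-in for A_hat
print(gimt_det(A, A))                      # exactly 0.0
print(gimt_det(A, A + 0.5 * np.eye(4)))    # drifts away from 0 when A != B
```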

More recently, Liu, Xin, Andersson, and Tian (2019) examined the performance of different methods in providing consistent standard errors (SEs) for item parameter estimates in situations where a CDM was misspecified or correctly specified. Liu et al. (2019) showed a difference among OPG, Hessian, and robust standard errors when the Q-matrix is misspecified. Although Liu et al. (2019) did not intend to develop a method for misspecification detection, their empirical results nevertheless support a type of GIMT for misspecification detection as introduced by Golden et al. (2013, 2016).

In this paper, we describe another GIMT methodology that focuses on a single statistic for comparing covariance matrices rather than a comparison based upon their diagonal elements. First, we sketch the mathematical derivation of the asymptotic distribution of the $GIMT_{Det}$ statistic for CDMs using the methods described by Golden et al. (2013, 2016). Second, we empirically investigate the asymptotic behavior of the $GIMT_{Det}$ for CDMs in a series of simulation studies. The simulation studies generate data sets from a known DINA CDM fit to the Tatsuoka (1984) Fraction-Subtraction data set. Next, the DINA CDM that generated the bootstrap data, as well as different misspecified versions of the original DINA CDM, are fit to the resulting bootstrap data sets. The discrimination performance of the $GIMT_{Det}$ is then reported to provide an empirical evaluation of its ability to detect model misspecification in a DINA CDM.

0.2 Mathematical Theory

0.2.1 Model Misspecification

In the statistical machine learning framework, the Data Generating Process (DGP) generates observable data directly from an underlying unobservable probability distribution, $p_{DGP}$. A probability model, $M$, is a collection of probability mass functions. If $p_{DGP}\in M$, then $M$ is correctly specified; otherwise, $M$ is misspecified with respect to $p_{DGP}$. Less formally, a model capable of representing the DGP is called a "correctly specified model."

0.2.2 Cognitive Diagnostic Model Specification 

Data Set

Consider a scenario where $N$ examinees are randomly selected to take a diagnostic test of $J$ items. The outcomes of this test are recorded in a binary matrix $\mathbf{X}$, which has $N$ rows and $J$ columns. Each element $x_{ij}$ represents the response of examinee $i$ $(1\leq i\leq N)$ to item $j$ $(1\leq j\leq J)$, with a value of $1$ indicating a correct response and $0$ indicating an incorrect response. The rows of the matrix $\mathbf{X}$, $\mathbf{x}_{1},\ldots,\mathbf{x}_{N}$, are assumed to be a realization of a sequence of independent and identically distributed random vectors with a common probability mass function $p_{DGP}(\mathbf{x})$.

Evidence Model

Let $\boldsymbol{\alpha}=(\alpha_{1},\ldots,\alpha_{K})$ represent the discrete attribute mastery profile for an examinee, where $\alpha_{k}$ is one if and only if that examinee demonstrates mastery of latent skill $k$. Let $\mathbf{q}_{j}=[q_{j1},\ldots,q_{jK}]$ represent the $j$th row of the $\mathbf{Q}$-matrix, where each element $q_{jk}$ equals one if the $k$th skill is required to answer question $j$ and equals zero otherwise. Let the logistic sigmoidal function $\mathcal{S}(\phi)$ be defined such that $\mathcal{S}(\phi)=1/(1+\exp(-\phi))$. Let $\boldsymbol{\beta}=[\boldsymbol{\beta}_{1},\ldots,\boldsymbol{\beta}_{J}]^{T}$ denote the item parameter vector, where $\boldsymbol{\beta}_{j}=[\beta_{j1},\beta_{j2}]^{T}$, $\beta_{j1}$ is the main and interaction effect parameter for question $j$, and $\beta_{j2}$ is the intercept parameter. The probability of a correct response to question $j$, given the mastery profile, the item parameters, and the skills required for question $j$ as specified by $\mathbf{q}_{j}$, is calculated using the following formula:

p_{ij}=p(x_{ij}=1|\boldsymbol{\alpha},\boldsymbol{\beta}_{j},\mathbf{q}_{j})=\mathcal{S}(\beta_{j1}\psi(\boldsymbol{\alpha},\mathbf{q}_{j})-\beta_{j2}),

where larger values of $\psi(\boldsymbol{\alpha},\mathbf{q}_{j})$ indicate an increased likelihood that item $j$ is correctly answered given the latent skill pattern $\boldsymbol{\alpha}$. We have implemented the DINA CDM by defining $\psi(\boldsymbol{\alpha},\mathbf{q}_{j})=\prod_{k=1}^{K}\alpha_{k}^{q_{jk}}\in\{0,1\}$ to calculate the expected response to item $j$.
Let:

p(x_{ij}|\boldsymbol{\alpha},\boldsymbol{\beta}_{j},\mathbf{q}_{j})=x_{ij}p_{ij}+(1-x_{ij})(1-p_{ij}).

The probability of all observed responses for the $i$th examinee, $\mathbf{x}_{i}$, given a specific pattern of latent skills, can be expressed as follows:

p(\mathbf{x}_{i}|\boldsymbol{\alpha},\boldsymbol{\beta})=\prod_{j=1}^{J}p(x_{ij}|\boldsymbol{\alpha},\boldsymbol{\beta}_{j},\mathbf{q}_{j}).
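The evidence model above can be expressed compactly in code. The following is a minimal sketch (illustrative Python with assumed names, not code from the paper) of the DINA item response probability and the conditional likelihood of one examinee's responses; with $\boldsymbol{\beta}_{j}=[1.2, 0.6]^{T}$ it reproduces the guess and slip probabilities of 0.354 used later in the simulation study:

```python
# Minimal sketch of the DINA evidence model (assumed names).
import numpy as np

def sigmoid(phi):
    return 1.0 / (1.0 + np.exp(-phi))

def p_correct(alpha, beta_j, q_j):
    """P(x_ij = 1 | alpha, beta_j, q_j) under the DINA model.

    alpha : (K,) 0/1 skill mastery profile
    beta_j: (2,) [beta_j1 (slope), beta_j2 (intercept)]
    q_j   : (K,) row j of the Q-matrix
    """
    # Conjunctive DINA rule: psi = 1 iff every skill required by item j is mastered.
    psi = np.prod(np.asarray(alpha) ** np.asarray(q_j))
    return sigmoid(beta_j[0] * psi - beta_j[1])

def likelihood_given_alpha(x_i, alpha, beta, Q):
    """p(x_i | alpha, beta) = prod_j [x_ij p_ij + (1 - x_ij)(1 - p_ij)]."""
    p = np.array([p_correct(alpha, beta[j], Q[j]) for j in range(len(Q))])
    return float(np.prod(x_i * p + (1 - x_i) * (1 - p)))

# guess = S(-0.6) ~= 0.354 when psi = 0; slip = 1 - S(0.6) ~= 0.354 when psi = 1
print(p_correct([0, 1], [1.2, 0.6], [1, 1]))  # ~0.354 (a required skill is missing)
print(p_correct([1, 1], [1.2, 0.6], [1, 1]))  # ~0.646 (all required skills mastered)
```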

Proficiency Model

After specifying the conditional distribution of student responses given latent skill profiles, the subsequent emphasis is on the joint distribution of attributes. The saturated joint attribute distribution model for all possible values that the $K$-dimensional binary vector $\boldsymbol{\alpha}$ can take on requires $2^{K}-1$ parameters, so when the number of attributes is moderately large, a more constrained joint attribute model might be desired to represent the joint distribution of $\boldsymbol{\alpha}$. In this paper, a Bernoulli latent skill attribute probability model (e.g., Maris, 1999) is assumed, where the latent skills are independent, so that the probability that the $k$th latent skill, $\alpha_{k}$, is present in the attribute pattern is given by the formula:

p(\alpha_{k})=\alpha_{k}\mathcal{S}(-\eta_{k})+(1-\alpha_{k})(1-\mathcal{S}(-\eta_{k})),

where $\eta_{k}$ can be a free parameter; however, in this paper, $\eta_{k}$ is a constant chosen such that the skill mastery probability $\mathcal{S}(-\eta_{k})$ is 0.354. The probability, $p(\boldsymbol{\alpha}|\boldsymbol{\eta})$, of a skill attribute profile $\boldsymbol{\alpha}=[\alpha_{1},\ldots,\alpha_{K}]$ for an examinee given $\boldsymbol{\eta}$ is given by the formula:

p(\boldsymbol{\alpha})=\prod_{k=1}^{K}p(\alpha_{k}).
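A small sketch of this proficiency model (illustrative Python, assumed names) makes the construction explicit; note that $\eta_{k}=\log(0.646/0.354)\approx 0.601$ yields $\mathcal{S}(-\eta_{k})=0.354$:

```python
# Sketch of the independent-Bernoulli attribute prior (assumed names).
import itertools
import numpy as np

def sigmoid(phi):
    return 1.0 / (1.0 + np.exp(-phi))

# eta_k chosen so that S(-eta_k) = 0.354
eta = np.log((1 - 0.354) / 0.354)           # ~0.601

def prior_alpha(alpha, eta_vec):
    """p(alpha) = prod_k p(alpha_k), with p(alpha_k = 1) = S(-eta_k)."""
    pk = sigmoid(-np.asarray(eta_vec))      # P(alpha_k = 1) for each skill
    a = np.asarray(alpha)
    return float(np.prod(a * pk + (1 - a) * (1 - pk)))

K = 5                                       # five skills, as in the Q-matrix used here
patterns = np.array(list(itertools.product([0, 1], repeat=K)))  # all 2^K profiles
total = sum(prior_alpha(a, np.full(K, eta)) for a in patterns)
print(total)                                # ~1.0: the prior sums to one over 2^K patterns
```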

0.2.3 Model Parameter Estimation 

The parameter prior for the two-dimensional parameter vector $\boldsymbol{\beta}_{j}$ associated with the $j$th question is represented by a bivariate Gaussian density, denoted as $p(\boldsymbol{\beta}_{j})$. This density has a two-dimensional mean vector $\boldsymbol{\mu}_{j}$ and a two-dimensional covariance matrix $\mathbf{C}_{\boldsymbol{\beta}}$ for $j=1,\ldots,J$. It is assumed that the mean vectors $\boldsymbol{\mu}_{1},\ldots,\boldsymbol{\mu}_{J}$ are known constants and that $\mathbf{C}_{\boldsymbol{\beta}}$ is positive definite. Let $\boldsymbol{\mu}_{\beta}=[\boldsymbol{\mu}_{1},\ldots,\boldsymbol{\mu}_{J}]$. The joint distribution of $\boldsymbol{\beta}=[\boldsymbol{\beta}_{1},\ldots,\boldsymbol{\beta}_{J}]$ is specified as the parameter prior:

p(\boldsymbol{\beta})=\prod_{j=1}^{J}p(\boldsymbol{\beta}_{j}). \quad (1)

The likelihood function of the response vector of examinee $i$, who is assumed to have attribute pattern $\boldsymbol{\alpha}$, is given by:

p(\mathbf{x}_{i},\boldsymbol{\alpha}|\boldsymbol{\beta})=p(\boldsymbol{\alpha})p(\mathbf{x}_{i}|\boldsymbol{\alpha},\boldsymbol{\beta})=p(\boldsymbol{\alpha})\prod_{j=1}^{J}p(x_{ij}|\boldsymbol{\alpha},\boldsymbol{\beta}_{j},\mathbf{q}_{j}), \quad (2)

yielding the MAP empirical risk function:

\hat{\ell}_{n}(\boldsymbol{\beta})=-(1/n)\log p(\boldsymbol{\beta})+(1/n)\sum_{i=1}^{n}c(\mathbf{x}_{i},\boldsymbol{\beta}), \quad (3)

where

c(\mathbf{x}_{i};\boldsymbol{\beta})=-\log p(\mathbf{x}_{i}|\boldsymbol{\beta})=-\log\sum_{\boldsymbol{\alpha}}p(\mathbf{x}_{i}|\boldsymbol{\alpha},\boldsymbol{\beta})p(\boldsymbol{\alpha}).
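To make the marginalization in $c(\mathbf{x}_{i};\boldsymbol{\beta})$ concrete, the sketch below (illustrative Python with our own assumed names, reusing `likelihood_given_alpha` and `prior_alpha` from the earlier sketches) computes the MAP empirical risk (3) by enumerating all $2^{K}$ attribute patterns, with `prior_weights[r] = prior_alpha(patterns[r], eta_vec)`:

```python
# Sketch of the MAP empirical risk (3) (assumed names; reuses the
# likelihood_given_alpha and prior_alpha sketches defined above).
import numpy as np

def c_i(x_i, beta, Q, patterns, prior_weights):
    """c(x_i; beta) = -log sum_alpha p(x_i | alpha, beta) p(alpha)."""
    marginal = sum(likelihood_given_alpha(x_i, a, beta, Q) * w
                   for a, w in zip(patterns, prior_weights))
    return -np.log(marginal)

def map_empirical_risk(X, beta, Q, patterns, prior_weights, mu_beta, sigma2):
    """Equation (3) with the isotropic Gaussian prior, up to an additive constant."""
    n = len(X)
    neg_log_prior = np.sum((np.asarray(beta) - mu_beta) ** 2) / (2 * sigma2)
    data_term = np.mean([c_i(x, beta, Q, patterns, prior_weights) for x in X])
    return neg_log_prior / n + data_term
```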

In this study, the MAP empirical risk function (3) involves summing over all possible latent skill attribute patterns. Assume $\hat{\ell}_{n}$ is a twice-continuously differentiable objective function. Once a critical point is reached, the Hessian of $\hat{\ell}_{n}$ can be evaluated at that point to check whether the critical point is a strict local minimizer (e.g., Golden, 2020). Here we assume that a parameter estimate $\hat{\boldsymbol{\beta}}_{n}$ has been obtained which is a strict local minimizer of $\hat{\ell}_{n}(\boldsymbol{\beta})$ in some (possibly very small) closed, bounded, and convex region of the parameter space $\Omega$ for all sufficiently large $n$. Furthermore, we assume that the expected value of $\hat{\ell}_{n}(\boldsymbol{\beta})$, denoted $\ell(\boldsymbol{\beta})$, has a strict global minimizer, $\boldsymbol{\beta}^{*}$, in the interior of $\Omega$. This setup thus allows for situations where $\ell$ has multiple minimizers, maximizers, and saddlepoints over the entire unrestricted parameter space. Given these assumptions, it can be shown (e.g., White, 1982; Golden, 2020) that $\hat{\boldsymbol{\beta}}_{n}$ is a consistent estimator of $\boldsymbol{\beta}^{*}$. In addition, for large sample sizes, the effects of the parameter prior become negligible and the MAP estimate has the same asymptotic distribution as the maximum likelihood estimate.

0.2.4 Information Matrix Test Methods for Detection of Model Misspecification

Determinant Generalized Information Matrix Test Statistical Theory

In this section, we present explicit details regarding the derivation of the asymptotic distribution of the $GIMT_{Det}$. Let $\mathbf{g}(\tilde{\mathbf{x}};\boldsymbol{\beta})\equiv-\nabla\log p(\tilde{\mathbf{x}};\boldsymbol{\beta})$. Let $\mathbf{A}^{*}$ denote the Hessian of $\ell$ evaluated at $\boldsymbol{\beta}^{*}$: $\mathbf{A}^{*}=\mathbf{A}(\boldsymbol{\beta}^{*})=-\nabla^{2}E\{\log p(\tilde{\mathbf{x}}_{i};\boldsymbol{\beta})\}$ evaluated at $\boldsymbol{\beta}^{*}$. Let $\mathbf{B}^{*}=\mathbf{B}(\boldsymbol{\beta}^{*})=E\{\mathbf{g}(\tilde{\mathbf{x}};\boldsymbol{\beta})\mathbf{g}(\tilde{\mathbf{x}};\boldsymbol{\beta})^{T}\}$ evaluated at $\boldsymbol{\beta}^{*}$. It is well known (e.g., White, 1982; Golden et al., 2013, 2016) that if $\mathbf{A}^{*}$ and $\mathbf{B}^{*}$ are positive definite, then the asymptotic distribution of $\hat{\boldsymbol{\beta}}_{n}$ is multivariate Gaussian with mean $\boldsymbol{\beta}^{*}$ and covariance matrix $(1/n)\mathbf{C}^{*}\equiv(1/n)(\mathbf{A}^{*})^{-1}\mathbf{B}^{*}(\mathbf{A}^{*})^{-1}$ as $n\to\infty$. In the special case where the model $M$ is correctly specified, in the sense that the observed data are i.i.d. with a common probability mass function such that $p(\mathbf{x}|\boldsymbol{\beta}^{*})=p_{DGP}(\mathbf{x})$, the covariance matrix $\mathbf{C}^{*}$ can be computed using either $(\mathbf{A}^{*})^{-1}$ or $(\mathbf{B}^{*})^{-1}$. It then follows that if $M$ is correctly specified with respect to $p_{DGP}$, then $\mathbf{A}^{*}=\mathbf{B}^{*}$. Consequently, if $\mathbf{A}^{*}\neq\mathbf{B}^{*}$, then $M$ is misspecified with respect to $p_{DGP}$.

Let $s:\mathcal{R}^{q\times q}\times\mathcal{R}^{q\times q}\rightarrow\mathcal{R}$ be a continuously differentiable function and assume $\nabla s$ evaluated at $(\mathbf{A}^{*},\mathbf{B}^{*})$ has full row rank. The function $s$ is called a GIMT Hypothesis Function if it has the property that $s(\mathbf{A},\mathbf{B})=0$ whenever $\mathbf{A}=\mathbf{B}$, for every pair of symmetric positive definite matrices $\mathbf{A}$ and $\mathbf{B}$. Let $s^{*}\equiv s(\mathbf{A}^{*},\mathbf{B}^{*})$, following Golden et al. (2013, 2016).

Let $\hat{\mathbf{A}}_{n}$ denote the Hessian of $\hat{\ell}_{n}$ evaluated at $\hat{\boldsymbol{\beta}}_{n}$. Let

\hat{\mathbf{B}}_{n}=(1/n)\sum_{i=1}^{n}\mathbf{g}(\tilde{\mathbf{x}}_{i},\hat{\boldsymbol{\beta}}_{n})\mathbf{g}(\tilde{\mathbf{x}}_{i},\hat{\boldsymbol{\beta}}_{n})^{T}.

A GIMT is defined as a test statistic $\hat{s}_{n}\equiv s(\hat{\mathbf{A}}_{n},\hat{\mathbf{B}}_{n})$ that evaluates the null hypothesis:

H_{0}: s^{*}=0.

The $GIMT_{Det}$ (Golden et al., 2013; also see Golden, 2020; Golden et al., 2016) is specified by the GIMT hypothesis function $s$ defined such that $s(\mathbf{A}^{*},\mathbf{B}^{*})=\frac{1}{q}\log\det((\mathbf{A}^{*})^{-1}\mathbf{B}^{*})$. Define the Wald test statistic:

\mathcal{W}_{n}=n(\hat{s}_{n})^{T}\hat{\mathbf{C}}_{n,s}^{-1}(\hat{s}_{n}),

where $\hat{s}_{n}=\frac{1}{q}\log\det(\hat{\mathbf{A}}_{n}^{-1}\hat{\mathbf{B}}_{n})$. If the null hypothesis $H_{0}:s^{*}=0$ holds, then $\mathcal{W}_{n}$ converges in distribution to a chi-squared random variable with one degree of freedom as $n\rightarrow\infty$ (see Theorem 7 of Golden et al., 2016). If the null hypothesis fails, then $\mathcal{W}_{n}\rightarrow\infty$ as $n\rightarrow\infty$ with probability one (see Theorem 7 of Golden et al., 2016). Here $\hat{\mathbf{C}}_{n,s}$ is an estimate of the asymptotic covariance matrix $\mathbf{C}_{s}^{*}$ of $n^{1/2}(\hat{s}_{n}-s^{*})$. Golden et al. (2013, 2016) show how this asymptotic covariance matrix can be estimated using the first, second, and third derivatives of the log-likelihood function. Notice that since the Wald statistic for the $GIMT_{Det}$ has one degree of freedom, the test statistic has a distribution that is the square of a normalized Gaussian random variable.
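A compact sketch of the full test follows (illustrative Python; all names are our own, and `c(x, beta)` is assumed to return the per-record negative log-likelihood from Equation (3)). Here $\hat{\mathbf{A}}_{n}$ and $\hat{\mathbf{B}}_{n}$ are approximated by finite differences, and the scalar variance estimate $\hat{C}_{n,s}$ is taken as given, since the analytic estimator of Golden et al. (2013, 2016) requires third derivatives of the log-likelihood:

```python
# Illustrative sketch, not the authors' implementation.
import numpy as np
from scipy import stats

def num_grad(f, beta, h=1e-5):
    """Central-difference gradient of scalar f at beta."""
    beta = np.asarray(beta, dtype=float)
    g = np.zeros_like(beta)
    for k in range(beta.size):
        e = np.zeros_like(beta); e[k] = h
        g[k] = (f(beta + e) - f(beta - e)) / (2 * h)
    return g

def information_matrices(X, beta_hat, c, h=1e-4):
    """A_hat_n (Hessian of the mean risk) and B_hat_n (OPG), by finite differences."""
    n, q = len(X), beta_hat.size
    G = np.array([num_grad(lambda b: c(x, b), beta_hat) for x in X])
    B_hat = G.T @ G / n                              # outer product of gradients
    mean_grad = lambda b: np.array(
        [num_grad(lambda bb: c(x, bb), b) for x in X]).mean(axis=0)
    A_hat = np.zeros((q, q))
    for k in range(q):
        e = np.zeros(q); e[k] = h
        A_hat[:, k] = (mean_grad(beta_hat + e) - mean_grad(beta_hat - e)) / (2 * h)
    return (A_hat + A_hat.T) / 2, B_hat              # symmetrize numerical noise

def gimt_det_wald(A_hat, B_hat, n, c_hat_ns):
    """s_hat_n = (1/q) log det(A^{-1} B); W_n = n s_hat^2 / C_hat; chi2(1) p-value."""
    q = A_hat.shape[0]
    s_hat = (np.linalg.slogdet(B_hat)[1] - np.linalg.slogdet(A_hat)[1]) / q
    w_n = n * s_hat ** 2 / c_hat_ns
    return s_hat, w_n, stats.chi2.sf(w_n, df=1)      # reject H0: s* = 0 for small p
```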

0.3 Simulation Study

0.3.1 Dataset

The simulation studies used CDMs with parameters estimated from the Tatsuoka (1984) fraction-subtraction data set. The data set consists of a set of math problems designed to assess students' ability to solve problems involving fraction subtraction. In this case, we utilized the 'Fraction.1' data set from the 'CDM' R package (George et al., 2016), containing the dichotomous responses of 536 middle school students on 15 fraction subtraction test items. The $\mathbf{Q}$-matrix specifying which of five distinct skills were required to answer a particular test item was based upon the $\mathbf{Q}$-matrix used by Tatsuoka (1984) and provided in the CDM package (George et al., 2016).

0.3.2 Methods

First, the described DINA CDM was fit to the data set with a sample size of $n=536$ using the previously described MAP estimation method (see Equation 3). A multivariate Gaussian prior with mean vector $\boldsymbol{\mu}_{\boldsymbol{\beta}}$ and covariance matrix $\mathbf{C}_{\boldsymbol{\beta}}$ was used for all items (see Equation 1). To ensure that the guess (g) and slip (s) probabilities both equaled 0.354 for a specific item, the Gaussian parameter prior mean vector for that item was set to $\boldsymbol{\mu}_{\boldsymbol{\beta}}=[1.2, 0.6]^{T}$. An uninformative Gaussian parameter prior covariance matrix $\mathbf{C}_{\boldsymbol{\beta}}=\sigma^{2}\mathbf{I}_{2}$ was chosen, where $\sigma^{2}=4500$. The fitted DINA CDM was taken as the Data Generating Process CDM ($DGP_{CDM}$). To create a scenario in which the CDM is correctly specified, we refit the original DINA CDM to data sets sampled from the original fitted DINA CDM. Misspecified versions of the original DINA CDM were created by flipping the elements of Tatsuoka's $\mathbf{Q}$-matrix with different probabilities (0%, 1%, 5%, 10%, 15%, and 20%), where the 0% misspecification level corresponds to the correctly specified case. For each level of $\mathbf{Q}$-matrix misspecification, we generated five different misspecified $\mathbf{Q}$-matrices at that level, allowing for variation in which elements are altered. We refer to a specific way of misspecifying the $\mathbf{Q}$-matrix at a given level as a "replication." We then fit each of the five replications of the $\mathbf{Q}$-matrix at a particular misspecification level to 50 bootstrap-generated data sets from the original DINA CDM.
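The misspecification manipulation might be implemented as follows (an illustrative Python sketch under our own assumptions; the stand-in $\mathbf{Q}$-matrix below is random rather than Tatsuoka's actual matrix):

```python
# Sketch of the Q-matrix perturbation and the bootstrap DGP (assumed names).
import numpy as np

def perturb_Q(Q, flip_prob, rng):
    """Flip each 0/1 entry of Q independently with probability flip_prob."""
    flips = rng.random(Q.shape) < flip_prob
    return np.where(flips, 1 - Q, Q)

def simulate_dina(n, Q, beta, eta, rng):
    """Draw n response vectors from the fitted DINA DGP."""
    J, K = Q.shape
    # Sample skill profiles from the independent-Bernoulli prior: P(alpha=1) = S(-eta).
    alpha = (rng.random((n, K)) < 1 / (1 + np.exp(eta))).astype(int)
    # psi_ij = prod_k alpha_ik^{q_jk}: 1 iff all required skills are mastered.
    psi = np.prod(np.where(Q[None, :, :] == 1, alpha[:, None, :], 1), axis=2)
    p = 1 / (1 + np.exp(-(beta[:, 0] * psi - beta[:, 1])))
    return (rng.random((n, J)) < p).astype(int)

rng = np.random.default_rng(1)
Q = rng.integers(0, 2, size=(15, 5))         # stand-in for Tatsuoka's Q-matrix
Q_mis = perturb_Q(Q, 0.10, rng)              # one 10% "replication"
beta = np.tile([1.2, 0.6], (15, 1))          # guess = slip = 0.354 for every item
X_boot = simulate_dina(536, Q, beta, 0.601, rng)
# A misspecified model would then be fit to X_boot using Q_mis instead of Q.
```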

We computed $GIMT_{Det}$ statistics as a function of misspecification level. We then evaluated the performance of the $GIMT_{Det}$ in classifying correctly specified and misspecified models using the Receiver Operating Characteristic (ROC) curve. The ROC curve plots the true positive rate against the false positive rate at different decision thresholds, allowing for an investigation of the discrimination performance of the $GIMT_{Det}$. Additionally, the area under the ROC curve (AUROC) provides a quantitative measure of the $GIMT_{Det}$'s discrimination, with a high AUROC value indicating that the $GIMT_{Det}$ effectively distinguishes between correctly specified and misspecified models. Conversely, an AUROC close to 0.5 indicates no discrimination performance.
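For reference, the AUROC can be computed directly from the two samples of test statistics via its Mann-Whitney (rank) form, as in this minimal sketch (illustrative Python; here the statistics of misspecified fits are treated as the positive class):

```python
# Rank-based AUROC sketch (assumed names).
import numpy as np

def auroc(stats_correct, stats_misspecified):
    """P(statistic from a misspecified fit exceeds one from a correct fit),
    with ties counted half: the Mann-Whitney estimate of the AUROC."""
    s0 = np.asarray(stats_correct)
    s1 = np.asarray(stats_misspecified)
    greater = (s1[:, None] > s0[None, :]).mean()
    ties = (s1[:, None] == s0[None, :]).mean()
    return greater + 0.5 * ties

# e.g., 50 bootstrap fits per condition:
rng = np.random.default_rng(2)
print(auroc(rng.normal(1.0, 1.0, 50), rng.normal(1.2, 1.0, 50)))  # near 0.5: weak separation
print(auroc(rng.normal(1.0, 1.0, 50), rng.normal(4.0, 1.0, 50)))  # near 1.0: strong separation
```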

0.3.3 Results and Discussion

Table 1 displays the AUROC values under various misspecification levels and shows that the AUROC values increase as the misspecification level increases. The results of the simulation study are also presented in Figure 1, which displays how the ROC curves evolve across the misspecification levels (0%, 1%, 5%, 10%, 15%, and 20%) with a constant sample size of $n=536$. At the 20% misspecification level, the ROC curve demonstrates good discrimination performance, closely approaching the upper-left corner.

Figure 1: Influence of misspecification level on ROC curves. Misspecification levels: 0%, 1%, 5%, 10%, 15%, and 20%; sample size $n=536$; five replications per level. The discrimination performance of the $GIMT_{Det}$ was effective in the 10%, 15%, and 20% cases, while there was no indication of discrimination performance in the 0% and 1% misspecification cases.

Misspecification level   | AUROC Mean | Lower CI | Upper CI
Correctly specified (0%) | 0.51       | 0.46     | 0.57
1% misspecification      | 0.46       | 0.38     | 0.54
5% misspecification      | 0.54       | 0.23     | 0.85
10% misspecification     | 0.87       | 0.75     | 0.99
15% misspecification     | 0.91       | 0.78     | 1.00
20% misspecification     | 0.92       | 0.82     | 1.00
Table 1: Average AUROC and confidence intervals at different levels of model misspecification, computed over five replications per level. The average AUROC increases with higher levels of misspecification. Notably, the 5% misspecification case exhibits the most variation across replications, while reasonably good discrimination performance is consistently achieved in the 10%, 15%, and 20% cases.

When the misspecification level was greater than 5%, the $GIMT_{Det}$ showed good discrimination performance regardless of the manner in which the $\mathbf{Q}$-matrix was perturbed. In addition, the $GIMT_{Det}$ did not detect model misspecification in the correctly specified case (0%) or in the slightly misspecified case (1%). For the moderate misspecification level (5%), however, only two of the five replications showed effective discrimination performance, and there was substantial variability across replications. We currently view the 5% level as a "transition region" and plan to investigate it further using additional replications. In summary, the new GIMT method for misspecification detection in CDMs appears promising. Further research will continue to explore the capabilities and limitations of this approach.

References

  • Chen, F., Liu, Y., Xin, T. & Cui, Y. Applying the M2 Statistic to Evaluate the Fit of Diagnostic Classification Models in the Presence of Attribute Hierarchies. Frontiers in Psychology. 9 (2018), https://doi.org/10.3389/fpsyg.2018.01875
  • de la Torre, J. DINA Model and Parameter Estimation: A Didactic. Journal of Educational and Behavioral Statistics. 34, 115-130 (2009), https://doi.org/10.3102/1076998607309474
  • de la Torre, J. The Generalized DINA Model Framework. Psychometrika. 76 (2011), https://doi.org/10.1007/s11336-011-9207-7
  • George, A. C., Robitzsch, A., Kiefer, T., Groß, J. & Ünlü, A. The R Package CDM for Cognitive Diagnosis Models. Journal of Statistical Software. 74(2), 1-24 (2016), https://doi.org/10.18637/jss.v074.i02
  • Golden, R. Statistical Machine Learning: A Unified Framework. (CRC Press, 2020)
  • Golden, R., Henley, S., White, H. & Kashner, T. New Directions in Information Matrix Testing: Eigenspectrum Tests. In Recent Advances and Future Directions in Causality, Prediction, and Specification Analysis: Essays in Honor of Halbert L. White Jr. pp. 145-177 (Springer, 2013)
  • Golden, R., Henley, S., White, H. & Kashner, T. Generalized Information Matrix Tests for Detecting Model Misspecification. Econometrics. 4(4), 46 (2016)
  • Kunina-Habenicht, O., Rupp, A. & Wilhelm, O. The Impact of Model Misspecification on Parameter Estimation and Item-Fit Assessment in Log-Linear Diagnostic Classification Models. Journal of Educational Measurement. 49, 59-81 (2012), https://doi.org/10.1111/j.1745-3984.2011.00160.x
  • Liu, Y., Xin, T., Andersson, B. & Tian, W. Information Matrix Estimation Procedures for Cognitive Diagnostic Models. British Journal of Mathematical and Statistical Psychology. 72(1), 18-37 (2019)
  • Maris, E. Estimating Multiple Classification Latent Class Models. Psychometrika. 64, 187-212 (1999)
  • Maydeu-Olivares, A. & Joe, H. Assessing Approximate Fit in Categorical Data Analysis. Multivariate Behavioral Research. 49(4), 305-328 (2014)
  • Presnell, B. & Boos, D. The IOS Test for Model Misspecification. Journal of the American Statistical Association. 99(465), 216-227 (2004)
  • Rupp, A. & Templin, J. The Effects of Q-Matrix Misspecification on Parameter Estimates and Classification Accuracy in the DINA Model. Educational and Psychological Measurement. 68, 78-96 (2008), https://doi.org/10.1177/0013164407301545
  • Tatsuoka, K. Analysis of Errors in Fraction Addition and Subtraction Problems. Final Report for NIE-G-81-0002. (1984)
  • White, H. Maximum Likelihood Estimation of Misspecified Models. Econometrica. 50(1), 1-25 (1982), https://doi.org/10.2307/1912526
