University of Texas at Dallas, Richardson, TX. Email: [email protected]
Richard M. Golden
Cognitive Informatics and Statistics Lab, School of Behavioral and Brain Sciences (GR4.1), University of Texas at Dallas, Richardson, TX. Email: [email protected]
This project was partially funded by the University of Texas at Dallas Office of Research and Innovation through an award to Richard Golden under the SPARK program.
Assessment of Misspecification in CDMs using a Generalized Information Matrix Test
Abstract
If the probability model is correctly specified, then the covariance matrix of the asymptotic maximum likelihood estimate distribution can be estimated using either the first or the second derivatives of the likelihood function. A discrepancy between the determinants of these two covariance matrix estimators therefore indicates model misspecification. This misspecification detection strategy is the basis of the Determinant Information Matrix Test. To investigate its performance, a Deterministic Input Noisy And gate (DINA) Cognitive Diagnostic Model (CDM) was fit to the Fraction-Subtraction dataset. Next, various misspecified versions of the original DINA CDM were fit to bootstrap data sets generated by sampling from the original fitted DINA CDM. The test showed good discrimination performance for larger levels of misspecification. In addition, the test did not detect model misspecification when misspecification was absent, and it likewise did not detect model misspecification when the level of misspecification was very low. However, discrimination performance was highly variable across different misspecification strategies when the misspecification level was moderately sized. The proposed new misspecification detection methodology is promising, but additional empirical studies are required to further characterize its strengths and limitations.
Keywords:
misspecification, information matrix test, cognitive diagnostic model

0.1 Introduction
Cognitive diagnostic models (CDMs) are a family of restricted latent class psychometric models designed to assess examinee skill mastery by incorporating prior knowledge of the relationship between latent skills and student responses (Torre2009 (17)). In addition, a CDM outputs the attribute distribution, or the probability that an examinee has a particular set of skills given the examinee's exam performance. Characteristics of latent skills in CDMs are defined through a Q-matrix that specifies which specific skills are relevant for answering a particular exam question. Q-matrix misspecification can affect parameter estimates and respondent classification accuracy (RuppTemplin2008 (28)). Therefore, careful validation of the Q-matrix is crucial to ensure that the model accurately represents the underlying relationships between the test items and the measured attributes. Methods have been developed to estimate the Q-matrix and to validate expert-specified Q-matrices. However, despite best intentions, the possibility of CDM misspecification is always present.
The effects of model misspecification in CDMs have been investigated by directly comparing observed frequencies and predicted probabilities (Kunina2012 (23)), by comparing a correctly specified model to a nested model using Wald test methodology (Torre2011 (18)), and by comparing selected first- and second-order observed and expected moments (Chen2018 (16)). These methodologies often face the challenge of test statistics with poor statistical power due to the difficulty of reliably estimating parameters in fully saturated (highly flexible) models. For example, consider the problem of comparing the predicted probability of every pattern of responses to an exam. The degrees of freedom (and variance) of the Pearson Goodness-Of-Fit (GOF) chi-squared test statistic increase as an exponential function of the number of items. The M2 statistic (maydeu2014 (26)) provides an improvement in statistical power by only examining the first and second moments, resulting in a chi-squared test statistic whose degrees of freedom increase as a quadratic function of the number of items.
Using a different approach, which detects model misspecification by comparing different covariance matrix estimators, Wh82 (30) introduced the Information Matrix Test (IMT). This test is based on comparing two inverse covariance matrix estimators: one derived from the Hessian matrix and one derived from the outer product gradient (OPG). These estimators are calculated from the second and first derivatives of the log-likelihood function, respectively. The Wh82 (30) IMT test statistic for misspecification detection has a chi-squared distribution whose degrees of freedom increase as a quadratic function of the number of parameters in the model. golden2013 (21, 22) proposed a generalization of the Wh82 (30) IMT framework called the Generalized Information Matrix Test (GIMT) framework. However, no systematic studies have been conducted to specifically investigate the performance of the GIMT in the context of model misspecification detection in CDMs.
presnell2004ios (27) developed and empirically evaluated a statistical test for comparing the "in-sample" (training data) log-likelihood model fit to the "out-of-sample" (test data) log-likelihood model fit. They referred to their test as the "in and out of sample" (IOS) test and showed that the test statistic has a chi-squared distribution with 1 degree of freedom regardless of model or data complexity. The IOS test may be interpreted as a type of GIMT as described by golden2013 (21, 22), which we call the Determinant GIMT.
More recently, liu2019 (24) examined the performance of different methods for providing consistent standard errors (SEs) of item parameter estimates in situations where a CDM was misspecified or correctly specified. liu2019 (24) showed a difference among OPG, Hessian, and robust standard errors when the Q-matrix is misspecified. Although liu2019 (24) did not intend to develop a method for misspecification detection, their empirical results nevertheless support a type of GIMT for misspecification detection as introduced by golden2013 (21, 22).
In this paper, we describe another GIMT methodology that focuses on a single statistic comparing covariance matrices as wholes rather than a comparison based upon their diagonal elements. First, we sketch the mathematical derivation of the asymptotic distribution of the Determinant GIMT statistic for CDMs using the methods described by golden2013 (21, 22). Second, we empirically investigate the asymptotic behavior of the statistic for CDMs in a series of simulation studies. These studies simulate data sets from a known DINA CDM fit to the Ta84 (29) Fraction-Subtraction data set. Next, the DINA CDM that generated the bootstrap data, as well as different misspecified versions of the original DINA CDM, are fit to the resulting bootstrapped data sets. The discrimination performance of the Determinant GIMT is then reported to provide an empirical evaluation of its ability to detect model misspecification in a DINA CDM.
0.2 Mathematical Theory
0.2.1 Model Misspecification
In the statistical machine learning framework, the Data Generating Process (DGP) generates observable data directly from an underlying unobservable probability distribution. A probability model is a collection of probability mass functions. If the model contains the DGP distribution, then the model is correctly specified; otherwise, it is misspecified with respect to the DGP. Less formally, a model capable of representing the DGP is called a "correctly specified model."
0.2.2 Cognitive Diagnostic Model Specification
Data Set
Consider a scenario where examinees are randomly selected to take a diagnostic test. The outcomes of this test are recorded in a binary matrix with one row per examinee and one column per test item. Each element represents the response of an examinee to an item, with a value of one indicating a correct response and zero indicating an incorrect response. The rows of the matrix are assumed to be a realization of a sequence of independent and identically distributed random vectors with a common probability mass function.
Evidence Model
Let a binary vector represent the discrete attribute mastery profile for an examinee, where each element is one if and only if that examinee demonstrates mastery of the corresponding latent skill. Let each row of the Q-matrix indicate, with ones and zeros, which skills are required to answer the corresponding question. Let the logistic sigmoid function be defined such that it maps a real number z to 1/(1 + exp(-z)). Each question has an item parameter vector consisting of an intercept parameter and a main-and-interaction effect parameter. The probability of a correct response to a question, given the mastery profile, the item parameters, and the skills required for the question as specified by the Q-matrix row, is calculated using the following formula:
where larger values of the argument indicate an increased likelihood that the item is correctly answered given the latent skill pattern. We implemented the DINA CDM by defining the mastery indicator for an item to equal one if and only if the examinee has mastered every skill the Q-matrix requires for that item, and using this indicator to calculate the expected response to the item.
Assuming the item responses are conditionally independent given the skill pattern, the probability of all observed responses for an examinee, given a specific pattern of latent skills, can be expressed as follows:
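The evidence model just described can be sketched in code. The following is a minimal Python illustration (the study itself used R; all function and variable names here are our own, and the two-parameter-per-item logistic layout follows the description above):

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid function."""
    return 1.0 / (1.0 + np.exp(-z))

def dina_item_prob(alpha, q_row, beta):
    """P(correct response) for one item under a logistic DINA parameterization.

    alpha : (K,) binary skill-mastery profile
    q_row : (K,) binary Q-matrix row for the item
    beta  : (2,) item parameters [intercept, main/interaction effect]
    """
    # eta = 1 iff the examinee masters every skill the Q-matrix row requires
    eta = int(np.all(alpha[q_row == 1] == 1))
    return sigmoid(beta[0] + beta[1] * eta)

def response_vector_prob(x, alpha, Q, Beta):
    """Probability of a full response vector x given skill pattern alpha,
    assuming conditional independence of items given alpha."""
    probs = np.array([dina_item_prob(alpha, Q[j], Beta[j])
                      for j in range(Q.shape[0])])
    return float(np.prod(np.where(x == 1, probs, 1.0 - probs)))
```

With an intercept of -0.6 and an effect of 1.2, the guess and slip probabilities both equal sigmoid(-0.6), which is approximately 0.354, consistent with the prior mean used later in the simulation study.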
Proficiency Model
After specifying the conditional distribution of student responses given latent skill profiles, we next consider the joint distribution of attributes. The saturated joint attribute distribution model over all possible values of the binary skill vector requires a number of parameters that grows exponentially with the number of attributes, so when the number of attributes is moderately large, a more constrained joint attribute model may be desired. In this paper, a Bernoulli latent skill attribute probability model (e.g., Maris1999 (25)) is assumed in which the latent skills are independent, so that the probability that a given latent skill is present in the attribute pattern is given by the formula:
where the skill probability can be a free parameter; in this paper, however, it is a constant chosen such that the guess probability is 0.354. The probability of a skill attribute profile for an examinee is then given by the formula:
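The independent-Bernoulli proficiency model can be sketched as follows (a Python illustration with assumed names, not the authors' code):

```python
import numpy as np
from itertools import product

def skill_profile_prob(alpha, p):
    """Probability of a binary skill profile alpha under independent
    Bernoulli skills with per-skill mastery probabilities p."""
    alpha = np.asarray(alpha)
    return float(np.prod(np.where(alpha == 1, p, 1.0 - p)))
```

As a sanity check, summing this probability over all 2^K skill profiles yields one, e.g. `sum(skill_profile_prob(a, p) for a in product([0, 1], repeat=K))`.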
0.2.3 Model Parameter Estimation
The parameter prior for the two-dimensional parameter vector associated with each question is represented by a bivariate Gaussian density with a two-dimensional mean vector and a two-dimensional covariance matrix. It is assumed that these constants are known and that the prior variance is a positive number. The joint distribution over all item parameters is specified as the parameter prior:
(1)
The likelihood function of the response vector of an examinee who is assumed to have a given attribute pattern is:
(2)
yielding the MAP empirical risk function:
(3)
where
In this study, the MAP empirical risk function (3) involves summing over all possible latent skill attribute patterns. Assume the risk function is twice continuously differentiable. Once a critical point is reached, the Hessian can be evaluated at that point to check whether the critical point is a strict local minimizer (e.g., Golden2020 (20)). Here we assume that a parameter estimate has been obtained that is a strict local minimizer of the risk function in some (possibly very small) closed, bounded, and convex region of the parameter space for all sufficiently large sample sizes. Furthermore, we assume that the expected value of the risk function has a strict global minimizer in the interior of that region. This setup thus allows for situations where the risk function has multiple minimizers, maximizers, and saddlepoints over the entire unrestricted parameter space. Given these assumptions, it can be shown (e.g., Wh82 (30, 20)) that the estimator is consistent. In addition, for large sample sizes, the effects of the parameter prior become negligible and the MAP estimate has the same asymptotic distribution as the maximum likelihood estimate.
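A direct (unoptimized) Python sketch of the MAP empirical risk in Equation 3, marginalizing over all 2^K skill patterns, is shown below. It assumes the logistic DINA evidence model and independent-Bernoulli skill prior described above, with the Gaussian parameter prior entering as a quadratic penalty; all names are our own:

```python
import numpy as np
from itertools import product

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def map_risk(X, Q, Beta, p_skill, mu, sigma2):
    """Average negative log posterior (MAP empirical risk) for a logistic DINA CDM.

    X: (n, J) binary responses; Q: (J, K) Q-matrix;
    Beta: (J, 2) per-item [intercept, effect]; p_skill: (K,) skill probabilities;
    mu: (2,) Gaussian prior mean; sigma2: spherical prior variance.
    """
    n, J = X.shape
    K = Q.shape[1]
    nll = 0.0
    for i in range(n):
        # Marginal likelihood of examinee i: sum over all 2^K skill patterns
        marginal = 0.0
        for alpha in product([0, 1], repeat=K):
            a = np.array(alpha)
            prior = np.prod(np.where(a == 1, p_skill, 1.0 - p_skill))
            # Per-item mastery indicator: all required skills present
            eta = (Q @ a == Q.sum(axis=1)).astype(float)
            pj = sigmoid(Beta[:, 0] + Beta[:, 1] * eta)
            lik = np.prod(np.where(X[i] == 1, pj, 1.0 - pj))
            marginal += prior * lik
        nll -= np.log(marginal)
    # Gaussian parameter prior contributes a quadratic penalty (up to a constant)
    penalty = np.sum((Beta - mu) ** 2) / (2.0 * sigma2)
    return (nll + penalty) / n
```

This enumeration costs O(2^K) per examinee, which is feasible for the five-skill model considered here but motivates more constrained proficiency models when K is large.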
0.2.4 Information Matrix Test Methods for Detection of Model Misspecification
Determinant Generalized Information Matrix Test Statistical Theory
In this section, we present explicit details regarding the derivation of the asymptotic distribution of the Determinant GIMT statistic. Let the Hessian-based matrix be constructed from the second derivatives of the objective function evaluated at the parameter estimate, and let the OPG matrix be constructed from the outer products of the per-examinee score vectors evaluated at the same point. It is well known (e.g., Wh82 (30, 21, 22)) that if both matrices are positive definite, then the asymptotic distribution of the parameter estimate is multivariate Gaussian with a covariance matrix constructed from them as the sample size grows. In the special case where the model is correctly specified, in the sense that the observed data are i.i.d. with a common probability mass function contained in the model, the asymptotic covariance matrix can be computed using either the Hessian-based or the OPG construction. It then follows that if the model is correctly specified with respect to the DGP, the two matrices are equal; consequently, if they are unequal, the model is misspecified with respect to the DGP.
Following golden2013 (21, 22), let a continuously differentiable GIMT hypothesis function be given whose Jacobian, evaluated at the two asymptotic covariance matrix constructions, has full row rank. A GIMT hypothesis function has the defining property that it equals zero whenever its two arguments are the same symmetric positive definite matrix. A GIMT is then defined as a test statistic that evaluates the null hypothesis that the hypothesis function, evaluated at the Hessian-based and OPG matrices, equals zero.
The Determinant GIMT (golden2013 (21); also see Golden2020 (20) and Golden2016 (22)) is specified by the hypothesis function equal to the difference of the log determinants of its two matrix arguments, and a Wald test statistic is constructed from the sample analogue of this difference. If the null hypothesis holds, the Wald statistic converges in distribution to a chi-squared random variable with one degree of freedom as the sample size grows (see Theorem 7 of Golden2016 (22)). If the null hypothesis fails, the statistic diverges to infinity with probability one (see Theorem 7 of Golden2016 (22)). The required estimator of the asymptotic variance of the hypothesis-function statistic can be computed using the first, second, and third derivatives of the log-likelihood function (golden2013 (21, 22)). Notice that because the degrees of freedom of the Wald statistic for the Determinant GIMT equal 1, the test statistic is distributed as the square of a normalized Gaussian random variable.
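The core comparison underlying the Determinant GIMT can be sketched as follows. This Python illustration (names assumed) computes only the log-determinant difference; the variance estimator required for the full Wald statistic, which involves third derivatives of the log-likelihood, is omitted:

```python
import numpy as np

def log_det_difference(score_vectors, avg_hessian):
    """log|A| - log|B|, where A is the average Hessian of the negative
    log-likelihood and B is the outer-product-of-gradients (OPG) estimator
    built from per-observation score vectors of shape (n, p).

    Values near zero are consistent with correct specification; large
    magnitudes suggest misspecification."""
    n = score_vectors.shape[0]
    B = score_vectors.T @ score_vectors / n  # OPG estimator
    sign_a, logdet_a = np.linalg.slogdet(avg_hessian)
    sign_b, logdet_b = np.linalg.slogdet(B)
    if sign_a <= 0 or sign_b <= 0:
        raise ValueError("both matrices must be positive definite")
    return logdet_a - logdet_b
```

Using `slogdet` rather than `det` avoids overflow and underflow when the number of parameters is large, which matters because determinants scale multiplicatively in the matrix dimension.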
0.3 Simulation Study
0.3.1 Dataset
The simulation studies used CDMs with parameters estimated from the Ta84 (29) fraction-subtraction data set, a set of math problems designed to assess students' ability to solve fraction subtraction problems. We used the 'Fraction.1' dataset from the 'CDM' R package (cdmR (19)), containing the dichotomous responses of 536 middle school students to 15 fraction subtraction test items. The Q-matrix, specifying which of five distinct skills were required to answer a particular test item, was based upon the Q-matrix used by Ta84 (29) and provided in cdmR (19).
0.3.2 Methods
First, the described DINA CDM was fit to the data set (sample size n = 536) using the previously described MAP estimation method (see Equation 3). A multivariate Gaussian prior was used for all items (see Equation 1). To ensure the guess (g) and slip (s) probabilities both equaled 0.354 for a specific item, the Gaussian parameter prior mean vector for that item was set accordingly, and an uninformative (large-variance) Gaussian parameter prior covariance matrix was chosen. The fitted DINA CDM was treated as the Data Generating Process CDM. To create a scenario in which the CDM is correctly specified, we refit the original DINA CDM to data sampled from the original fitted DINA CDM. Misspecified versions of the original DINA CDM were created by flipping Tatsuoka's Q-matrix elements with probabilities of 1%, 5%, 10%, 15%, and 20%; the 0% misspecification level corresponds to the correctly specified case. For each level of Q-matrix misspecification, we generated 5 different misspecified Q-matrices at that level, allowing for variations in which elements are altered. We refer to a specific way of misspecifying the Q-matrix at a given level as a "replication." We then fit each of the 5 replications of the Q-matrix at a particular misspecification level to 50 bootstrap-generated datasets from the original DINA CDM.
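The Q-matrix perturbation step can be sketched as follows (a hypothetical Python helper, not the authors' code; the study generated 5 replications per level and 50 bootstrap datasets per replication):

```python
import numpy as np

def flip_q_matrix(Q, flip_prob, rng):
    """Independently flip each Q-matrix entry (0 <-> 1) with probability flip_prob."""
    Q = np.asarray(Q)
    mask = rng.random(Q.shape) < flip_prob
    return np.where(mask, 1 - Q, Q)
```

For example, the five replications at a given level could be generated with `[flip_q_matrix(Q, 0.05, np.random.default_rng(seed)) for seed in range(5)]`.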
We computed the Determinant GIMT statistic as a function of misspecification level. Then we evaluated the performance of the statistic in classifying correctly specified and misspecified models using the Receiver Operating Characteristic (ROC) curve. The ROC curve plots the true positive rate against the false positive rate at different decision thresholds, allowing for an investigation of the discrimination performance of the test. Additionally, the area under the ROC curve (AUROC) provides a quantitative measure of discrimination, with a high AUROC value indicating that the test effectively distinguishes between correctly specified and misspecified models. Conversely, an AUROC close to 0.5 indicates no discrimination performance.
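AUROC has the Mann-Whitney interpretation: the probability that the statistic for a randomly chosen misspecified model exceeds the statistic for a randomly chosen correctly specified model, with ties counted as one half. A minimal Python sketch (names assumed):

```python
import numpy as np

def auroc(misspecified_stats, correct_stats):
    """AUROC via pairwise comparison: P(misspecified > correct), ties count 1/2."""
    m = np.asarray(misspecified_stats, dtype=float)[:, None]
    c = np.asarray(correct_stats, dtype=float)[None, :]
    return float((m > c).mean() + 0.5 * (m == c).mean())
```

A value of 1.0 means the two groups of statistics are perfectly separated; 0.5 means the statistic carries no information about misspecification.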
0.3.3 Results and Discussion
Table 1 displays the AUROC values under the various misspecification levels; the AUROC values increase as the misspecification level increases. The results of the simulation study are also presented in Figure 1, which displays how the ROC curves evolve at the various misspecification levels (0%, 1%, 5%, 10%, 15%, and 20%) with a constant sample size of n = 536. At the 20% misspecification level, the ROC curve demonstrates good discrimination performance, closely approaching the upper-left corner.
[Figure 1: ROC curves at misspecification levels 0%, 1%, 5%, 10%, 15%, and 20%; n = 536]
Table 1. AUROC by misspecification level.

| Misspecification level | AUROC Mean | Lower CI | Upper CI |
|---|---|---|---|
| Correctly Specified | 0.51 | 0.46 | 0.57 |
| 1% Misspecification | 0.46 | 0.38 | 0.54 |
| 5% Misspecification | 0.54 | 0.23 | 0.85 |
| 10% Misspecification | 0.87 | 0.75 | 0.99 |
| 15% Misspecification | 0.91 | 0.78 | 1.00 |
| 20% Misspecification | 0.92 | 0.82 | 1.00 |
When the misspecification level was greater than 5%, the Determinant GIMT showed good discrimination performance regardless of the manner in which the Q-matrix was perturbed. In addition, it did not detect model misspecification in the correctly specified case (0%) or in the slightly misspecified case (1%). At the moderate misspecification level (5%), however, only two of the five replications showed effective discrimination performance, and there was substantial variability across replications. Currently, we view the 5% level as a "transition region" and plan to investigate it further using additional replications. In summary, the new GIMT method for misspecification detection in CDMs appears promising. Further research will continue to explore the capabilities and limitations of this approach.
References
- (1) Chen, F., Liu, Y., Xin, T. & Cui, Y. Applying the M2 Statistic to Evaluate the Fit of Diagnostic Classification Models in the Presence of Attribute Hierarchies. Frontiers In Psychology. 9 (2018,10), https://www.frontiersin.org/article/10.3389/fpsyg.2018.01875/full
- (2) de la Torre, J. DINA Model and Parameter Estimation: A Didactic. Journal Of Educational And Behavioral Statistics. 34, 115-130 (2009,3), http://journals.sagepub.com/doi/10.3102/1076998607309474
- (3) de la Torre, J. The Generalized DINA Model Framework. Psychometrika. 76 (2011)
- (4) George, A., Robitzsch, A., Kiefer, T., Groß, J. & Ünlü, A. The R Package CDM for Cognitive Diagnosis Models. Journal Of Statistical Software. 74, 24 (2016), https://www.jstatsoft.org/v074/i02
- (5) Golden, R. Statistical Machine Learning: A unified framework. (CRC Press,2020)
- (6) Golden, R., Henley, S., White, H. & Kashner, T. New Directions in Information Matrix Testing: Eigenspectrum Tests. Recent Advances And Future Directions In Causality, Prediction, And Specification Analysis: Essays In Honor Of Halbert L. White Jr. pp. 145-177 (2013)
- (7) Golden, R., Henley, S., White, H. & Kashner, T. Generalized Information Matrix Tests for Detecting Model Misspecification. Econometrics. 4, 46 (2016)
- (8) Kunina-Habenicht, O., Rupp, A., Wilhelm, O. The Impact of Model Misspecification on Parameter Estimation and Item-Fit Assessment in Log-Linear Diagnostic Classification Models. Journal Of Educational Measurement. 49, 59-81 (2012,3), https://onlinelibrary.wiley.com/doi/10.1111/j.1745-3984.2011.00160.x
- (9) Liu, Y., Xin, T., Andersson, B. & Tian, W. Information matrix estimation procedures for cognitive diagnostic models. British Journal of Mathematical and Statistical Psychology. 72, 18-37 (2019)
- (10) Maris, E. Estimating multiple classification latent class models. Psychometrika. 64 pp. 187-212 (1999)
- (11) Maydeu-Olivares, A. & Joe, H. Assessing Approximate Fit in Categorical Data Analysis. Multivariate Behavioral Research. 49, 305-328 (2014)
- (12) Presnell, B. & Boos, D. The IOS Test for Model Misspecification. Journal Of The American Statistical Association. 99, 216-227 (2004)
- (13) Rupp, A. & Templin, J. The Effects of Q-Matrix Misspecification on Parameter Estimates and Classification Accuracy in the DINA Model. Educational And Psychological Measurement. 68, 78-96 (2008,2), http://journals.sagepub.com/doi/10.1177/0013164407301545
- (14) Tatsuoka, K. Analysis of Errors in Fraction Addition and Subtraction Problems. Final Report for NIE-G-81-0002. (1984)
- (15) White, H. Maximum Likelihood Estimation of Misspecified Models. Econometrica. 50, 1-25 (1982), http://www.jstor.org/stable/1912526