Identifiability of the Multinomial Processing Tree-IRT model for the Philadelphia Naming Test
Technical Report
2 Portland State University, Department of Mathematics and Statistics
3 Portland State University, Department of Speech and Hearing Sciences
4 Geriatric Research, Education, and Clinical Center, VA Pittsburgh Healthcare System
5 Audiology and Speech Pathology Service, VA Pittsburgh Healthcare System
6 Department of Communication Science and Disorders, University of Pittsburgh
1 Abstract
Naming tests represent an essential tool in gauging the severity of aphasia and monitoring the trajectory of recovery for individuals afflicted with this debilitating condition. In these assessments, patients are presented with images corresponding to common nouns, and their responses are evaluated for accuracy. The Philadelphia Naming Test (PNT) stands as a paragon in this domain, offering nuanced insights into the type of errors made in responses. In a groundbreaking advancement, Walker et al. (2018) introduced a model rooted in Item Response Theory and multinomial processing trees (IRT-MPT). This innovative approach sought to unravel the intricate mechanisms underlying the various errors patients make when responding to an item, and to pinpoint the specific stage of word production at which a patient's capability falters. However, given the sophisticated nature of the IRT-MPT model proposed by Walker et al. (2018), it is imperative to scrutinize both its conceptual and its statistical validity. Our endeavor here is to closely examine the model's formulation to ensure its parameters are identifiable as a first step in evaluating its validity.
2 Understanding the Problem
Naming tests represent an essential tool in gauging the severity of aphasia and monitoring the trajectory of recovery for individuals afflicted with this debilitating condition. In these assessments, patients are presented with images corresponding to common nouns, and their responses are meticulously evaluated for accuracy. The Philadelphia Naming Test (PNT) stands as a paragon in this domain, offering nuanced insights into the type of errors made in responses. In a groundbreaking advancement, Walker et al. (2018) introduced a model rooted in Item Response Theory and multinomial processing trees (IRT-MPT). This innovative approach sought to unravel the intricate mechanisms underlying the various errors patients make when responding to an item. By acknowledging the heterogeneity of both items and individuals, the model estimates multiple latent parameters per patient in order to determine more precisely at which step of word production a patient's ability has been affected. These latent parameters aim to represent the theoretical cognitive steps taken in responding to an item, as shown in Figure 1. Given the complexity of the model proposed by Walker et al. (2018), here we investigate the identifiability and estimability of its parameters.
The probabilities for an edge traversal to an observable node in Figure 1 can be global, local to the respondent, local to the item, or local to both. These probabilities arise from IRT models and so usually take the form of some link function evaluated at the difference between respondent ability and item difficulty. Redundancies in the probabilities mean that the model uses only eight probabilities to describe the sixteen path probabilities. The model is completed with the specification of a distribution on the respondent abilities, the item difficulties, and the global probabilities. These distributions all take the form of a fixed distribution without any hyperparameters. The specification of the model proposed by Walker et al. (2018) is therefore that of a fixed-effects model, given that there is no sharing of information across parameters within or between individuals or items. The prior imposes regularization, which renders all parameters estimable in the sense that a proper posterior distribution is achieved. However, this does not mean that the parameters are identifiable. There are eight parameters but only eight categorical observables, whose cell probabilities carry just seven degrees of freedom, which suggests an over-parameterized and unidentifiable model.
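For a single respondent-item cell, this dimension-counting heuristic can be made precise; the short observation below is a standard topological fact stated in our own notation and is not taken from Walker et al. (2018).

```latex
% One respondent-item cell: the eight edge-traversal probabilities determine the eight
% category probabilities, which sum to one and hence lie in the 7-dimensional simplex.
\[
  \pi \colon (0,1)^{8} \longrightarrow \Delta^{7},
  \qquad \dim (0,1)^{8} = 8 > 7 = \dim \Delta^{7}.
\]
% By invariance of domain, a continuous injective map from an open subset of R^8 into a
% 7-dimensional manifold cannot exist, so within a single cell the edge probabilities
% cannot be recovered from the category distribution alone; any identification must come
% from pooling cells that share parameters.
```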
Given that the observable nodes link directly to only some of the possible latent processes, and that several outcomes link to latent processes higher in the process hierarchy, it is reasonable to be concerned about potential identifiability issues. Below we reproduce in Figure 1 the original diagram of the IRT-MPT model, and in Figure 2 we provide a modified version of the diagram that better demonstrates the potential for identifiability issues. In Figure 2 we collapse the multiplicities of the observable nodes that appear in Figure 1 while conveying the same information as a directed acyclic graph (DAG). Instead of appearing as leaves of a tree as they do in Figure 1, the observable nodes are terminal nodes of paths in the DAG of Figure 2. This DAG has a unique root node (Attempt) and eight terminal nodes (the observable categories). Four observables (NA, S, M, C) have unique directed paths from the root, one observable (AN) has two paths from the root, two observables (U, N) have three paths from the root, and the remaining terminal node (F) has four paths from the root, for sixteen root-to-terminal paths in total.
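To make the path bookkeeping concrete, the sketch below computes terminal-category probabilities as sums of products of edge probabilities along root-to-terminal paths. The edge structure, the categories attached to each path, and the numerical values are hypothetical placeholders chosen only to illustrate the computation; they are not the PNT tree of Figure 1.

```python
# Sketch: response-category probabilities as sums over root-to-terminal paths in an
# MPT-style tree. The edge structure and values below are HYPOTHETICAL placeholders;
# the actual PNT tree is the one shown in Figure 1.

def category_probs(paths):
    """paths maps category -> list of paths; each path is a list of (theta, succeeded)
    pairs, and an edge contributes theta if succeeded, otherwise 1 - theta."""
    probs = {}
    for cat, cat_paths in paths.items():
        total = 0.0
        for path in cat_paths:
            p = 1.0
            for theta, succeeded in path:
                p *= theta if succeeded else 1.0 - theta
            total += p
        probs[cat] = total
    return probs

# Hypothetical process probabilities for one respondent-item pair.
attempt, lexsem, lexphon = 0.95, 0.80, 0.70

toy_paths = {
    "NA": [[(attempt, False)]],                                   # one path
    "C":  [[(attempt, True), (lexsem, True), (lexphon, True)]],   # one path
    "S":  [[(attempt, True), (lexsem, False), (lexphon, True)]],  # one path
    "N":  [[(attempt, True), (lexsem, True), (lexphon, False)],   # two paths
           [(attempt, True), (lexsem, False), (lexphon, False)]],
}

probs = category_probs(toy_paths)
print(probs, "sum =", sum(probs.values()))  # the toy paths exhaust the tree, so the sum is 1
```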
Identifiability and estimability in the context of this model need to be addressed individually. Identifiability requires that different parameter settings lead to different distributions on the sample space of observable data. It is an inherent trait of the model and is tied to whether the parameters could be perfectly reconstructed from an infinite number of independent and identical draws from an instance (a particular choice of parameter values) of the process. Estimability is tied more closely to the observed data: a parameter is estimable from the observed data if estimates derived from those data are unique. This is also strongly connected to what is being estimated and to the estimation methodology. In a purely likelihood context, estimability is usually associated with the uniqueness of the maximum likelihood estimator. In the context of Bayes estimators, prior assumptions often penalize points on a likelihood ridge differently, which leads to unique estimators due solely to prior influence. Because of this, and the general arbitrariness of prior assumptions, it is often best to assess Bayes estimability in an objective prior framework (for instance, under a Jeffreys prior). However, this often reduces to the same problem one would address when investigating estimability using likelihoods and maximum likelihood estimators.
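As a minimal numerical illustration of this distinction, consider a toy model in which only the difference between an ability b and a difficulty d enters the likelihood (the notation and the logistic link here are ours, chosen only for the example): the maximum likelihood estimate is not unique because the likelihood has a ridge, whereas a proper prior yields a unique estimate whose position along the ridge is determined entirely by the prior.

```python
# Sketch: non-identifiability vs. Bayes estimability in a toy model where only the
# difference (b - d) enters the likelihood. Any (b + c, d + c) has the same likelihood
# (a ridge), but a proper prior gives a unique posterior mode.

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)
b_true, d_true, n = 1.0, 0.3, 200
y = rng.binomial(1, expit(b_true - d_true), size=n)   # n Bernoulli responses

def neg_log_lik(params):
    b, d = params
    p = expit(b - d)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def neg_log_post(params):
    # standard normal priors on b and d break the ridge
    return neg_log_lik(params) + 0.5 * np.sum(np.asarray(params) ** 2)

for start in ([0.0, 0.0], [5.0, 5.0]):
    mle = minimize(neg_log_lik, start).x
    mapp = minimize(neg_log_post, start).x
    print("start", start, "MLE", mle.round(2), "MAP", mapp.round(2))
# The two MLE runs land at different (b, d) with the same b - d; the MAP estimates agree.
```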
3 Model Architecture
Figure 2 shows the structure of the MPT model; there the probability of successfully performing each of the processes is denoted by $\theta_{s,p,i}$, where $s$ indexes the cognitive process, $p$ the respondent, and $i$ the word item. The probabilities for $s = 1,\dots,5$ (Attempt, LexSem, LexPhon, LexSel, and Phon, respectively) are defined in terms of respondent skills and item difficulties through the IRT link described above and made explicit in (1) below. For $P$ respondents evaluated on a set of $I$ common items, this leads to a parameter space whose dimension grows linearly in $P$ and $I$ (five process-specific parameters and one additional parameter per respondent, five and one per item, and one global parameter). Given that the number of observed responses, $P \times I$, exceeds the number of parameters for realistic numbers of respondents and items, this suggests an identifiable model so long as there are no structural identification issues in the model.
One structural issue that is common in IRT models is that adding a constant to all respondent skills and all item difficulties for a given process $s$ leads to identical latent probabilities. This is usually ameliorated by forcing the average of either the respondent skills or the item difficulties to be 0 for each process $s$. The observed outcome depends on which processes the subject performs correctly and corresponds to one of eight possible response categories: Correct (C), Semantic (S), Formal (F), Mixed (M), Unrelated (U), Neologism (N), Abstruse Neologism (AN), and Non-naming Attempt (NA).
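A minimal numerical sketch of the location aliasing described at the start of this paragraph, and of the sum-to-zero fix, follows; the logistic link, the symbols, and the dimensions are assumptions made only for the illustration.

```python
# Sketch: the additive aliasing between respondent skills b and item difficulties d
# for one process, and the usual sum-to-zero identification constraint.
# The logistic link is an assumption made for illustration only.

import numpy as np
from scipy.special import expit

rng = np.random.default_rng(1)
b = rng.normal(size=6)          # skills for 6 respondents on one process
d = rng.normal(size=10)         # difficulties for 10 items on the same process

theta = expit(b[:, None] - d[None, :])                  # original probabilities
c = 2.7
theta_shifted = expit((b + c)[:, None] - (d + c)[None, :])
print(np.allclose(theta, theta_shifted))                # True: the shift is invisible

# Identification by centering: force the respondent skills to average to zero
# and absorb the shift into the item difficulties.
b_centered = b - b.mean()
d_centered = d - b.mean()
print(np.allclose(theta, expit(b_centered[:, None] - d_centered[None, :])))  # True
```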
In the model, the respondent-item dependent probabilities are represented in terms of the respondent ability $b_{s,p}$, the item difficulty $d_{s,i}$, and an intercept term $\mu_s$, such that $\theta_{s,p,i}$ is given by

$\theta_{s,p,i} = g\!\left(\mu_s + b_{s,p} - d_{s,i}\right)$,    (1)

where $g$ is the link function, and we specify two points $(\mu_s, b_{s,\cdot}, d_{s,\cdot})$ and $(\mu_s', b_{s,\cdot}', d_{s,\cdot}')$ in the parameter space to be equivalent if

$\left(\mu_s',\, b_{s,p}',\, d_{s,i}'\right) = \left(\mu_s - c_b + c_d,\; b_{s,p} + c_b,\; d_{s,i} + c_d\right)$ for all $p$ and $i$,    (2)

for any constants $c_b$ and $c_d$, since equivalent points yield identical probabilities in (1). Furthermore, we impose two linear restrictions on the parameters of each process. The standard choice for representing the equivalence class is to make the restrictions

$\sum_{p=1}^{P} b_{s,p} = 0$ and $\sum_{i=1}^{I} d_{s,i} = 0$    (3)

for each process $s$, so that for each $s$ we have a $(P + I - 1)$-dimensional parameter. This provides $5(P + I - 1)$ parameters for the $\theta_{s,p,i}$ that depend on both $p$ and $i$; the per-respondent, per-item, and global parameters that enter the remaining probabilities bring the model to its total number of parameters.
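The following sketch maps an arbitrary parameter point to the representative of its equivalence class satisfying the restrictions in (3) and verifies that the probabilities in (1) are unchanged; the logistic choice for the link $g$ is an assumption made only for this illustration.

```python
# Sketch: projecting (mu, b, d) onto the representative of its equivalence class that
# satisfies the sum-to-zero restrictions, leaving the probabilities in (1) unchanged.
# The logistic link is assumed only for this illustration.

import numpy as np
from scipy.special import expit

def probs(mu, b, d):
    return expit(mu + b[:, None] - d[None, :])

def to_canonical(mu, b, d):
    c_b, c_d = -b.mean(), -d.mean()
    return mu - c_b + c_d, b + c_b, d + c_d

rng = np.random.default_rng(3)
mu, b, d = 0.4, rng.normal(size=8), rng.normal(size=12)

mu_c, b_c, d_c = to_canonical(mu, b, d)
print(np.allclose(probs(mu, b, d), probs(mu_c, b_c, d_c)))   # True: same probabilities
print(b_c.mean(), d_c.mean())                                # both ~0: restrictions hold
```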
4 Basic probability equations to use
The basic probability equations are collected in three groups, (4a)–(4g), (5a)–(5g), and (6a)–(6d).
5 Identifiability analysis at a high level
The first thing to notice is that the mapping from the restricted IRT parameters to the process probabilities $\theta_{s,p,i}$ is injective. So if we get equalities of the probabilities under two parameter settings, then we get equalities of the corresponding restricted IRT parameters, and it suffices to carry out the analysis at the level of the probabilities.
From (7a), we get for all and so .
From (7b), we get for all and so .
Define so that and so .
From (7g), we get that
for all . We get
Another restriction on that is necessary is for all .
From (7f), we get that
(8)
for all . Another restriction on that is for all .
The last equality is equivalent to
We plug this into (7e) and get
(9)
This tells us that
This is always between zero and one, and so no new restriction on is needed.
Plugging these into (7d), we get
(10)
We get a further restriction on
Using (7c), we get
(11)
We get one final restriction on
We get a range of values for that could possibly admit a transformation
6 The details of parameter restrictions
From (7f), we get that
(12)
for all .
We want to see whether we can reach a contradiction of the existence of this mapping or obtain a restriction on the parameter values that can admit such a mapping. The former will provide identifiability, and the latter will provide identifiability for parameter values in a set of full Lebesgue measure.
(12) is equivalent to
and so
(13)
For two different respondents doing the same item, this means that
(14)
where the right-hand side cannot depend on the item because the left-hand side does not depend on the item.
For the same respondent on different items, we have
(15)
where the right-hand side cannot depend on the respondent because the left-hand side does not depend on the respondent.
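The step just taken, and taken again several times below, is a separation-of-variables argument; its generic form is recorded here for reference, with symbols $A_p$, $B_i$, $C_p$, $D_i$, and $\lambda$ that are placeholders of ours rather than the quantities appearing in (13)–(15).

```latex
% Generic form of the separation argument (notation ours): if positive quantities
% indexed by respondent p and item i satisfy a factorized equality on both sides,
\[
  A_p \, B_i \;=\; C_p \, D_i \quad \text{for all } p, i
  \;\;\Longrightarrow\;\;
  \frac{C_p}{A_p} \;=\; \frac{B_i}{D_i} \;=\; \lambda
  \quad \text{for a single constant } \lambda > 0,
\]
% so that C_p = \lambda A_p for every p and B_i = \lambda D_i for every i.
```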
What are the implications of (14) and (15)?
First, one way for (14) to hold for all is for and (14) reduces to . This means that in
(13) we have
(16)
We must have
and
These definitions provide (15) in this case and so we apparently have a possible mapping.
Second, if we do not assume that , then we can instead assume that . This provides (15) and
(17)
from (13). We must have
and
These definitions provide (14) in this case and so we apparently have a possible mapping.
Third, let us assume that neither of the above cases holds. Then we should be able to reach a contradiction.
Let and be such that and and be such that . Then we must have
and so
which is an equation of the form
isolating we get
and so requires and requires . Because and were arbitrary, we must be in one of the two cases already discussed.
6.1 Assuming and
We work through the implications for other parts of the parameter space that follow from the assumption of this subcase and the existence of an identifiability issue. For this analysis, we will let and because neither can depend on .
6.1.1 Implications for
From (7e), we have
For different values and , we have to have
So we have
Taking the geometric average over , we get
and so for all .
Now, we have
and taking the geometric average over provides
and so we also have
Finally, we get
and so, as expected, we get
and we have described the mapping on the parameter space that allows this to happen. Notice that the mapping is all about fixing the parameter that changes with and allowing the parameters that depend on to change in a way that is consistent with the mapping.
6.1.2 Implications for
From (7c), we have
which is
and so
depends only on . Rewriting a bit we get
and the right hand side can’t depend on and so we need . We get
and we get
and
Taking geometric average over , we get
and so we also get
Of course, we could have just noted that and and get
6.1.3 Implications for
From (7d) we get
which is
The left hand side does not depend on , and so for and we have
and this gives
and so for any and any and we must have
We get that and we could just as easily call and . The implication is that
Because the left hand side does not depend on and the right hand side does not depend on , there must be a constant such that
for all and . Now, this makes a restriction on in terms of , , and .
and so
and of course
Now, we need
which gives
We also need
and so
6.1.4 Pulling this all together
If an identifiability issue exists, then we must have room for some , which means
There must be some that provides
The way to generate such a parameter vector is to generate , , , and that depend only on ; that depends only on ; and that depend on and ; generate a in the range
where we have taken in the range upper bound calculation so that we can define in terms of , , and ; generate in the range
and finally compute
As such, this indicates that the model is not identifiable. Although there exist two additional cases, namely when and and when and , their analysis is analogous to that of the case presented above and is therefore omitted from this report for succinctness.
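A useful numerical complement to an analysis of this kind is a local identifiability check: evaluate the Jacobian of the map from parameters to category probabilities at many randomly drawn parameter points and inspect its rank, with a persistent rank deficiency flagging a non-identifiable direction. The sketch below illustrates the procedure on a placeholder map; the function model_probs, its toy three-parameter form, and all numerical values are illustrative assumptions rather than the PNT IRT-MPT category probabilities.

```python
# Sketch: numerical local-identifiability check via the rank of the Jacobian of the
# parameter-to-category-probability map. `model_probs` is a PLACEHOLDER; in practice it
# would implement the actual category probabilities for all respondent-item cells.

import numpy as np

def model_probs(params):
    # Placeholder map from a parameter vector to a vector of category probabilities.
    # Toy 3-parameter, 4-category MPT-style map in which only params[0] and the
    # product params[1] * params[2] enter, so the rank is at most 2 by construction.
    a, b, c = params
    return np.array([a * b * c, a * (1 - b * c), (1 - a) * 0.5, (1 - a) * 0.5])

def numerical_jacobian(f, x, eps=1e-6):
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(len(x)):
        step = np.zeros_like(x)
        step[j] = eps
        cols.append((f(x + step) - f(x - step)) / (2 * eps))
    return np.column_stack(cols)

rng = np.random.default_rng(2)
ranks = []
for _ in range(50):
    x = rng.uniform(0.1, 0.9, size=3)
    J = numerical_jacobian(model_probs, x)
    ranks.append(np.linalg.matrix_rank(J, tol=1e-8))
print(set(ranks))  # {2}: fewer than 3 everywhere, flagging a non-identifiable direction
```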
Acknowledgements
AW, DTR, GF, and WH were partially supported by NIH award 5R01DC018813-04. DTR was also partially supported by NSF RTG DMS award 2136228.
References
- Fergadiotis, G., Kellough, S., and Hula, W. D. (2015). Item response theory modeling of the Philadelphia Naming Test. Journal of Speech, Language, and Hearing Research, 58(3):865–877.
- Walker, G. M., Hickok, G., and Fridriksson, J. (2018). A cognitive psychometric model for assessment of picture naming abilities in aphasia. Psychological Assessment, 30(6):809.