Identifiability of the Multinomial Processing Tree-IRT model for the Philadelphia Naming Test
Technical Report
2 Portland State University, Department of Mathematics and Statistics
3 Portland State University, Department of Speech and Hearing Sciences
4 Geriatric Research, Education, and Clinical Center, VA Pittsburgh Healthcare System
5 Audiology and Speech Pathology Service, VA Pittsburgh Healthcare System
6 Department of Communication Science and Disorders, University of Pittsburgh
1 Abstract
Naming tests represent an essential tool in gauging the severity of aphasia and monitoring the trajectory of recovery for individuals afflicted with this debilitating condition. In these assessments, patients are presented with images corresponding to common nouns, and their responses are evaluated for accuracy. The Philadelphia Naming Test (PNT) stands as a paragon in this domain, offering nuanced insights into the type of errors made in responses. In a groundbreaking advancement, Walker et al. (2018) introduced a model rooted in Item Response Theory and multinomial processing trees (IRT-MPT). This innovative approach sought to unravel the intricate mechanisms underlying the various errors patients make when responding to an item, and to pinpoint the specific stage of word production at which a patient's capability falters. However, given the sophisticated nature of the IRT-MPT model proposed by Walker et al. (2018), it is imperative to scrutinize both its conceptual and its statistical validity. Our endeavor here is to closely examine the model's formulation to ensure its parameters are identifiable as a first step in evaluating its validity.
2 Understanding the Problem
Naming tests represent an essential tool in gauging the severity of aphasia and monitoring the trajectory of recovery for individuals afflicted with this debilitating condition. In these assessments, patients are presented with images corresponding to common nouns, and their responses are meticulously evaluated for accuracy. The Philadelphia Naming Test (PNT) stands as a paragon in this domain, offering nuanced insights into the type of errors made in responses. In a groundbreaking advancement, Walker et al. (2018) introduced a model rooted in Item Response Theory and multinomial processing trees (IRT-MPT). This innovative approach sought to unravel the intricate mechanisms underlying the various errors patients make when responding to an item. By acknowledging the heterogeneity of both items and individuals, the model estimates multiple latent parameters per patient in order to determine more precisely at which step of word production a patient's ability has been affected. These latent parameters aim to represent the theoretical cognitive steps taken in responding to an item, as shown in Figure 1. Given the complexity of the model proposed by Walker et al. (2018), here we investigate the identifiability and estimability of its parameters.
The probabilities for an edge traversal to an observable node in Figure 1 can be global, local to the respondent, local to the item, or local to both. These probabilities arise from IRT models and so usually take the form of some link function evaluated at the difference between respondent ability and item difficulty. Redundancies in the probabilities mean that the model uses only eight probabilities to describe the sixteen path probabilities. The model is completed with the specification of a distribution on the respondent abilities, the item difficulties, and the global probabilities. These distributions all take the form of a fixed distribution without any hyperparameters. The specification of the model proposed by Walker et al. (2018) is therefore that of a fixed-effects model, given that there is no sharing of information across parameters within or between individuals or items. The prior imposes regularization, which renders all parameters estimable in the sense that a proper posterior distribution is achieved. However, this does not mean that the parameters are identifiable. There are eight parameters but only eight categorical observables, whose cell probabilities carry just seven degrees of freedom, which suggests an over-parameterized and unidentifiable model.
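For a single respondent-item cell, this dimension-counting heuristic can be made precise; the short observation below is a standard topological fact stated in our own notation and is not taken from Walker et al. (2018).

```latex
% One respondent-item cell: the eight edge-traversal probabilities determine the eight
% category probabilities, which sum to one and hence lie in the 7-dimensional simplex.
\[
  \pi \colon (0,1)^{8} \longrightarrow \Delta^{7},
  \qquad \dim (0,1)^{8} = 8 > 7 = \dim \Delta^{7}.
\]
% By invariance of domain, a continuous injective map from an open subset of R^8 into a
% 7-dimensional manifold cannot exist, so within a single cell the edge probabilities
% cannot be recovered from the category distribution alone; any identification must come
% from pooling cells that share parameters.
```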
Given that the observable nodes link directly to only some of the possible latent processes, and that several outcomes link to latent processes higher in the process hierarchy, it is reasonable to be concerned about potential identifiability issues. Below we reproduce in Figure 1 the original diagram of the IRT-MPT model, and in Figure 2 we provide a modified version of the diagram that better demonstrates the potential for identifiability issues. In Figure 2 we collapse the multiplicities of the observable nodes that appear in Figure 1 while conveying the same information as a directed acyclic graph (DAG). Instead of appearing as leaves of a tree as they do in Figure 1, the observable nodes are terminal nodes of paths in the DAG of Figure 2. This DAG has a unique root node (Attempt) and eight terminal nodes (the observable categories). Four observables (NA, S, M, C) have unique directed paths from the root, one observable (AN) has two paths from the root, two observables (U, N) have three paths from the root, and the remaining terminal node (F) has four paths from the root, for sixteen root-to-terminal paths in total.
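To make the path bookkeeping concrete, the sketch below computes terminal-category probabilities as sums of products of edge probabilities along root-to-terminal paths. The edge structure, the categories attached to each path, and the numerical values are hypothetical placeholders chosen only to illustrate the computation; they are not the PNT tree of Figure 1.

```python
# Sketch: response-category probabilities as sums over root-to-terminal paths in an
# MPT-style tree. The edge structure and values below are HYPOTHETICAL placeholders;
# the actual PNT tree is the one shown in Figure 1.

def category_probs(paths):
    """paths maps category -> list of paths; each path is a list of (theta, succeeded)
    pairs, and an edge contributes theta if succeeded, otherwise 1 - theta."""
    probs = {}
    for cat, cat_paths in paths.items():
        total = 0.0
        for path in cat_paths:
            p = 1.0
            for theta, succeeded in path:
                p *= theta if succeeded else 1.0 - theta
            total += p
        probs[cat] = total
    return probs

# Hypothetical process probabilities for one respondent-item pair.
attempt, lexsem, lexphon = 0.95, 0.80, 0.70

toy_paths = {
    "NA": [[(attempt, False)]],                                   # one path
    "C":  [[(attempt, True), (lexsem, True), (lexphon, True)]],   # one path
    "S":  [[(attempt, True), (lexsem, False), (lexphon, True)]],  # one path
    "N":  [[(attempt, True), (lexsem, True), (lexphon, False)],   # two paths
           [(attempt, True), (lexsem, False), (lexphon, False)]],
}

probs = category_probs(toy_paths)
print(probs, "sum =", sum(probs.values()))  # the toy paths exhaust the tree, so the sum is 1
```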
Identifiability and estimability in the context of this model need to be addressed individually. Identifiability requires that different parameter settings lead to different distributions on the sample space of observable data. It is an inherent trait of the model and is tied to whether the parameters could be perfectly reconstructed from an infinite number of independent and identical draws from an instance (a particular choice of parameter values) of the process. Estimability is tied more closely to the observed data: a parameter is estimable from the observed data if estimates derived from those data are unique. This is also strongly connected to what is being estimated and to the estimation methodology. In a purely likelihood context, estimability is usually associated with the uniqueness of the maximum likelihood estimator. In the context of Bayes estimators, prior assumptions often penalize points on a likelihood ridge differently, which leads to unique estimators due solely to prior influence. Because of this, and the general arbitrariness of prior assumptions, it is often best to assess Bayes estimability in an objective prior framework (for instance, under a Jeffreys prior). However, this often reduces to the same problem one would address when investigating estimability using likelihoods and maximum likelihood estimators.
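As a minimal numerical illustration of this distinction, consider a toy model in which only the difference between an ability b and a difficulty d enters the likelihood (the notation and the logistic link here are ours, chosen only for the example): the maximum likelihood estimate is not unique because the likelihood has a ridge, whereas a proper prior yields a unique estimate whose position along the ridge is determined entirely by the prior.

```python
# Sketch: non-identifiability vs. Bayes estimability in a toy model where only the
# difference (b - d) enters the likelihood. Any (b + c, d + c) has the same likelihood
# (a ridge), but a proper prior gives a unique posterior mode.

import numpy as np
from scipy.optimize import minimize
from scipy.special import expit

rng = np.random.default_rng(0)
b_true, d_true, n = 1.0, 0.3, 200
y = rng.binomial(1, expit(b_true - d_true), size=n)   # n Bernoulli responses

def neg_log_lik(params):
    b, d = params
    p = expit(b - d)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def neg_log_post(params):
    # standard normal priors on b and d break the ridge
    return neg_log_lik(params) + 0.5 * np.sum(np.asarray(params) ** 2)

for start in ([0.0, 0.0], [5.0, 5.0]):
    mle = minimize(neg_log_lik, start).x
    mapp = minimize(neg_log_post, start).x
    print("start", start, "MLE", mle.round(2), "MAP", mapp.round(2))
# The two MLE runs land at different (b, d) with the same b - d; the MAP estimates agree.
```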
3 Model Architecture
Figure 2 shows the structure of the MPT model; there the probability of successfully performing each of the processes is denoted by $\theta_{s,p,i}$, where $s$ indexes the cognitive process, $p$ the respondent, and $i$ the word item. The probabilities for $s = 1,\dots,5$ (Attempt, LexSem, LexPhon, LexSel, and Phon, respectively) are defined in terms of respondent skills and item difficulties through the IRT link described above and made explicit in (1) below. For $P$ respondents evaluated on a set of $I$ common items, this leads to a parameter space whose dimension grows linearly in $P$ and $I$ (five process-specific parameters and one additional parameter per respondent, five and one per item, and one global parameter). Given that the number of observed responses, $P \times I$, exceeds the number of parameters for realistic numbers of respondents and items, this suggests an identifiable model so long as there are no structural identification issues in the model.
One structural issue that is common in IRT models is that adding a constant to all respondent skills and all item difficulties for a given process $s$ leads to identical latent probabilities. This is usually ameliorated by forcing the average of either the respondent skills or the item difficulties to be 0 for each process $s$. The observed outcome depends on which processes the subject performs correctly and corresponds to one of eight possible response categories: Correct (C), Semantic (S), Formal (F), Mixed (M), Unrelated (U), Neologism (N), Abstruse Neologism (AN), and Non-naming Attempt (NA).
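A minimal numerical sketch of the location aliasing described at the start of this paragraph, and of the sum-to-zero fix, follows; the logistic link, the symbols, and the dimensions are assumptions made only for the illustration.

```python
# Sketch: the additive aliasing between respondent skills b and item difficulties d
# for one process, and the usual sum-to-zero identification constraint.
# The logistic link is an assumption made for illustration only.

import numpy as np
from scipy.special import expit

rng = np.random.default_rng(1)
b = rng.normal(size=6)          # skills for 6 respondents on one process
d = rng.normal(size=10)         # difficulties for 10 items on the same process

theta = expit(b[:, None] - d[None, :])                  # original probabilities
c = 2.7
theta_shifted = expit((b + c)[:, None] - (d + c)[None, :])
print(np.allclose(theta, theta_shifted))                # True: the shift is invisible

# Identification by centering: force the respondent skills to average to zero
# and absorb the shift into the item difficulties.
b_centered = b - b.mean()
d_centered = d - b.mean()
print(np.allclose(theta, expit(b_centered[:, None] - d_centered[None, :])))  # True
```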
In the model, the respondent-item dependent probabilities are represented in terms of the respondent ability $b_{s,p}$, the item difficulty $d_{s,i}$, and an intercept term $\mu_s$, such that $\theta_{s,p,i}$ is given by

$\theta_{s,p,i} = g\!\left(\mu_s + b_{s,p} - d_{s,i}\right)$,    (1)

where $g$ is the link function, and we specify two points $(\mu_s, b_{s,\cdot}, d_{s,\cdot})$ and $(\mu_s', b_{s,\cdot}', d_{s,\cdot}')$ in the parameter space to be equivalent if

$\left(\mu_s',\, b_{s,p}',\, d_{s,i}'\right) = \left(\mu_s - c_b + c_d,\; b_{s,p} + c_b,\; d_{s,i} + c_d\right)$ for all $p$ and $i$,    (2)

for any constants $c_b$ and $c_d$, since equivalent points yield identical probabilities in (1). Furthermore, we impose two linear restrictions on the parameters of each process. The standard choice for representing the equivalence class is to make the restrictions

$\sum_{p=1}^{P} b_{s,p} = 0$ and $\sum_{i=1}^{I} d_{s,i} = 0$    (3)

for each process $s$, so that for each $s$ we have a $(P + I - 1)$-dimensional parameter. This provides $5(P + I - 1)$ parameters for the $\theta_{s,p,i}$ that depend on both $p$ and $i$; the per-respondent, per-item, and global parameters that enter the remaining probabilities bring the model to its total number of parameters.
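The following sketch maps an arbitrary parameter point to the representative of its equivalence class satisfying the restrictions in (3) and verifies that the probabilities in (1) are unchanged; the logistic choice for the link $g$ is an assumption made only for this illustration.

```python
# Sketch: projecting (mu, b, d) onto the representative of its equivalence class that
# satisfies the sum-to-zero restrictions, leaving the probabilities in (1) unchanged.
# The logistic link is assumed only for this illustration.

import numpy as np
from scipy.special import expit

def probs(mu, b, d):
    return expit(mu + b[:, None] - d[None, :])

def to_canonical(mu, b, d):
    c_b, c_d = -b.mean(), -d.mean()
    return mu - c_b + c_d, b + c_b, d + c_d

rng = np.random.default_rng(3)
mu, b, d = 0.4, rng.normal(size=8), rng.normal(size=12)

mu_c, b_c, d_c = to_canonical(mu, b, d)
print(np.allclose(probs(mu, b, d), probs(mu_c, b_c, d_c)))   # True: same probabilities
print(b_c.mean(), d_c.mean())                                # both ~0: restrictions hold
```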
4 Basic probability equations to use
The basic probability equations are collected in three groups, (4a)–(4g), (5a)–(5g), and (6a)–(6d).
5 Identifiability analysis at a high level
The first thing to notice is that the mapping from the restricted IRT parameters to the process probabilities $\theta_{s,p,i}$ is injective. So if we get equalities of the probabilities under two parameter settings, then we get equalities of the corresponding restricted IRT parameters, and it suffices to carry out the analysis at the level of the probabilities.
From (7a), we get for all and so .
From (7b), we get for all and so .
Define so that and so .
From (7g), we get that
for all . We get
Another restriction on that is necessary is for all .
From (7f), we get that
(8)
for all . Another restriction on that is for all .
The last equality is equivalent to
We plug this into (7e) and get
(9)
This tells us that
This is always between zero and one, and so no new restriction on is needed.
Plugging these into (7d), we get
(10)
We get a further restriction on
Using (7c), we get
(11)
We get one final restriction on
We get a range of values for that could possibly admit a transformation
6 The details of parameter restrictions
From (7f), we get that
(12)
for all .
We want to see whether we can reach a contradiction of the existence of this mapping or obtain a restriction on the parameter values that can admit such a mapping. The former will provide identifiability, and the latter will provide identifiability for parameter values in a set of full Lebesgue measure.
(12) is equivalent to
and so
(13)
For two different respondents doing the same item, this means that
(14)
where the right-hand side cannot depend on the item because the left-hand side does not depend on the item.
For the same respondent on different items, we have
(15)
where the right-hand side cannot depend on the respondent because the left-hand side does not depend on the respondent.
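The step just taken, and taken again several times below, is a separation-of-variables argument; its generic form is recorded here for reference, with symbols $A_p$, $B_i$, $C_p$, $D_i$, and $\lambda$ that are placeholders of ours rather than the quantities appearing in (13)–(15).

```latex
% Generic form of the separation argument (notation ours): if positive quantities
% indexed by respondent p and item i satisfy a factorized equality on both sides,
\[
  A_p \, B_i \;=\; C_p \, D_i \quad \text{for all } p, i
  \;\;\Longrightarrow\;\;
  \frac{C_p}{A_p} \;=\; \frac{B_i}{D_i} \;=\; \lambda
  \quad \text{for a single constant } \lambda > 0,
\]
% so that C_p = \lambda A_p for every p and B_i = \lambda D_i for every i.
```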
What are the implications of (14) and (15)?
First, one way for (14) to hold for all is for and (14) reduces to . This means that in
(13) we have
(16)
We must have
and
These definitions provide (15) in this case and so we apparently have a possible mapping.
Second, if we do not assume that , then we can instead assume that . This provides (15) and
(17)
from (13). We must have
and
These definitions provide (14) in this case and so we apparently have a possible mapping.
Third, let us assume that neither of the above cases holds. Then we should be able to reach a contradiction.
Let and be such that and and be such that . Then we must have
and so
which is an equation of the form
isolating we get
and so requires and requires . Because and were arbitrary, we must be in one of the two cases already discussed.
6.1 Assuming and
We work through the implications for other parts of the parameter space that follow from the assumption of this subcase and the existence of an identifiability issue. For this analysis, we will let and because neither can depend on .
6.1.1 Implications for
From (7e), we have
For different values and , we have to have
So we have
Taking the geometric average over , we get
and so for all .
Now, we have
and taking the geometric average over provides
and so we also have
Finally, we get
and so, as expected, we get
and we have described the mapping on the parameter space that allows this to happen. Notice that the mapping is all about fixing the parameter that changes with and allowing the parameters that depend on to change in a way that is consistent with the mapping.
6.1.2 Implications for
From (7c), we have
which is
and so
depends only on . Rewriting a bit we get
and the right hand side can’t depend on and so we need . We get
and we get
and
Taking geometric average over , we get
and so we also get
Of course, we could have just noted that and and get
6.1.3 Implications for
From (7d) we get
which is
The left hand side does not depend on , and so for and we have
and this gives
and so for any and any and we must have
We get that and we could just as easily call and . The implication is that
Because the left hand side does not depend on and the right hand side does not depend on , there must be a constant such that
for all and . Now, this makes a restriction on in terms of , , and .
and so
and of course
Now, we need
which gives
We also need
and so
6.1.4 Pulling this all together
If an identifiability issue exists, then we must have room for some , which means
There must be some that provides
The way to generate such a parameter vector is to generate , , , and that depend only on ; that depends only on ; and that depend on and ; generate a in the range
where we have taken in the range upper bound calculation so that we can define in terms of , , and ; generate in the range
and finally compute
As such, this indicates that the model is not identifiable. Although there exist two additional cases, namely when and and when and , their analysis is analogous to that of the case presented above and is therefore omitted from this report for succinctness.
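A useful numerical complement to an analysis of this kind is a local identifiability check: evaluate the Jacobian of the map from parameters to category probabilities at many randomly drawn parameter points and inspect its rank, with a persistent rank deficiency flagging a non-identifiable direction. The sketch below illustrates the procedure on a placeholder map; the function model_probs, its toy three-parameter form, and all numerical values are illustrative assumptions rather than the PNT IRT-MPT category probabilities.

```python
# Sketch: numerical local-identifiability check via the rank of the Jacobian of the
# parameter-to-category-probability map. `model_probs` is a PLACEHOLDER; in practice it
# would implement the actual category probabilities for all respondent-item cells.

import numpy as np

def model_probs(params):
    # Placeholder map from a parameter vector to a vector of category probabilities.
    # Toy 3-parameter, 4-category MPT-style map in which only params[0] and the
    # product params[1] * params[2] enter, so the rank is at most 2 by construction.
    a, b, c = params
    return np.array([a * b * c, a * (1 - b * c), (1 - a) * 0.5, (1 - a) * 0.5])

def numerical_jacobian(f, x, eps=1e-6):
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(len(x)):
        step = np.zeros_like(x)
        step[j] = eps
        cols.append((f(x + step) - f(x - step)) / (2 * eps))
    return np.column_stack(cols)

rng = np.random.default_rng(2)
ranks = []
for _ in range(50):
    x = rng.uniform(0.1, 0.9, size=3)
    J = numerical_jacobian(model_probs, x)
    ranks.append(np.linalg.matrix_rank(J, tol=1e-8))
print(set(ranks))  # {2}: fewer than 3 everywhere, flagging a non-identifiable direction
```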
Acknowledgements
AW, DTR, GF, and WH were partially supported by NIH award 5R01DC018813-04. DTR was also partially supported by NSF RTG DMS award 2136228.
References
- Fergadiotis, G., Kellough, S., and Hula, W. D. (2015). Item response theory modeling of the Philadelphia Naming Test. Journal of Speech, Language, and Hearing Research, 58(3):865–877.
- Walker, G. M., Hickok, G., and Fridriksson, J. (2018). A cognitive psychometric model for assessment of picture naming abilities in aphasia. Psychological Assessment, 30(6):809.