
Latent class analysis with weighted responses

Huan Qing (qinghuan@u.nus.edu, qinghuan07131995@163.com), School of Economics and Finance, Chongqing University of Technology, Chongqing, 400054, China
Abstract

The latent class model has been proposed as a powerful tool for cluster analysis of categorical data in various fields such as the social, psychological, behavioral, and biological sciences. However, one important limitation of the latent class model is that it is only suitable for data with binary responses, so it cannot model real-world data with continuous or negative responses. In many applications, ignoring the weights discards potentially valuable information carried by the responses. To address this limitation, we propose a novel generative model, the weighted latent class model (WLCM). Our model allows the response matrix to be generated from an arbitrary distribution with a latent class structure. In comparison to the latent class model, our WLCM is more realistic and more general. To our knowledge, our WLCM is the first model for latent class analysis with weighted responses. We investigate the identifiability of the model and propose an efficient algorithm for estimating the latent classes and other model parameters. We show that the proposed algorithm enjoys consistent estimation. The performance of the proposed algorithm is investigated using both computer-generated and real-world weighted response data.

keywords:
Categorical data, latent class model, spectral method, SVD, weighted responses

1 Introduction

Latent class model (LCM) [1, 2, 3] is a powerful tool for categorical data, with many applications across various areas such as the social, psychological, behavioral, and biological sciences. These applications include movie rating [4, 5], psychiatric evaluation [6, 7, 8, 9], educational assessments [10], political surveys [11, 12, 13, 14], personal interviews in transport economics [15], and disease etiology detection [16, 17, 18]. In categorical data, subjects (individuals) typically respond to several items (questions). LCM is a theoretical model that categorizes subjects into disjoint groups, known as latent classes, according to their response patterns to a collection of categorical items. For example, in movie rating, latent classes may represent groups of users with an affinity for certain movie themes; in psychological tests, latent classes may represent different types of personalities. In educational assessments, latent classes may indicate different levels of ability. In political surveys, latent classes may represent distinct types of political ideologies. In personal interviews in transport economics, each latent class represents a segment of the population. In disease etiology detection, latent classes may represent different disease categories. To infer latent classes for categorical data generated from LCM, various approaches have been developed in recent years, including maximum likelihood estimation techniques [19, 20, 21, 22] and tensor-based methods [23, 24].

To mathematically describe categorical data, let $R$ be the $N\times J$ observed response matrix such that $R(i,j)$ represents subject $i$'s response to item $j$, where $N$ denotes the number of subjects and $J$ denotes the number of items. For LCM, researchers commonly focus on binary choice data where elements of the observed response matrix $R$ only take the values 0 or 1 [16, 10, 24, 25, 26, 27, 28, 29, 30, 31, 32]. LCM models a binary response matrix by generating its elements from a Bernoulli distribution. In categorical data, binary responses can be agree/disagree responses in psychiatric evaluation, correct/wrong responses in educational assessments, and presence/absence of symptoms in disease etiology detection. However, categorical data is more than binary responses. Categorical data with weighted responses is also commonly encountered in the real world, and ignoring weighted data may lose potentially meaningful information [33]. For example, in movie rating [4], rating scores range in $\{1,2,3,4,5\}$, and simply making $R$ binary by recording rated/not rated loses valuable information that reflects users' preference patterns; for real-world categorical data from various online personality tests at https://openpsychometrics.org/_rawdata/, most responses take values in $\{0,1,2,\ldots,m\}$, where $m$ is an integer such as 2, 5, or 10; in the buyer-seller rating e-commerce data [34], elements of the observed response matrix take values in $\{-1,0,1\}$ (for convenience, we call such an $R$ a signed response matrix in this paper) since sellers are rated by users on three levels, “Positive”, “Neutral”, and “Negative”. In the users-jokes rating categorical data Jester 100 [35], all responses (i.e., ratings) are continuous numbers in $[-10,10]$. None of the aforementioned real-world data with weighted responses can be generated from a Bernoulli distribution. Therefore, the classical latent class model is inadequate for handling such data, and it is desirable to develop a more flexible model for data with weighted responses. With this motivation, our key contributions to the literature on latent class analysis are summarized as follows.

  • 1.

Model. We propose a novel, identifiable, and generative statistical model, the weighted latent class model (WLCM), for categorical data with weighted responses, where the responses can be continuous or negative. Our WLCM allows the elements of an observed weighted response matrix $R$ to be generated from any distribution, provided that the population version of $R$ under WLCM enjoys a latent class structure. For example, our WLCM allows $R$ to be generated from Bernoulli, Normal, Poisson, Binomial, Uniform, and Exponential distributions, among others. By considering a specifically designed discrete distribution, our WLCM can also model signed response matrices. For details, please refer to Examples 1-7. For comparison, LCM requires $R$ to be generated from a Bernoulli distribution, and LCM is a sub-model of our WLCM. Under the proposed model, the elements of the observed weighted response matrix $R$ can take any real value. Therefore, WLCM is more flexible than LCM. As far as we know, our WLCM is the first statistical model for categorical data in which weighted responses can be continuous or negative.

  • 2.

Algorithm. We develop an easy-to-implement algorithm, spectral clustering with K-means (SCK), to infer latent classes for weighted response matrices generated from an arbitrary distribution under the proposed model. Our algorithm combines two popular techniques: the singular value decomposition (SVD) and the K-means algorithm.

  • 3.

Theoretical property. We build a theoretical framework to show that SCK enjoys consistent estimation under WLCM. We also provide Examples 1-7 to show that the theoretical performance of the proposed algorithm can differ when the observed weighted response matrices $R$ are generated from different distributions under the proposed model.

  • 4.

    Empirical validation. We conduct extensive simulations to validate our theoretical insights. Additionally, we apply our SCK approach to two real-world datasets with meaningful interpretations.

The remainder of this paper is organized as follows. Section 2 describes the model. Section 3 details the algorithm. Section 4 establishes the consistency results and provides examples for further analysis. Section 5 contains numerical studies that verify our theoretical findings and examine the performance of the proposed method. Section 6 demonstrates the proposed method using two real-world datasets. Section 7 concludes the paper with a brief discussion of contributions and future work.

The following notations will be used throughout the paper. For any positive integer $m$, let $[m]:=\{1,2,\ldots,m\}$ and let $I_{m\times m}$ be the $m\times m$ identity matrix. For any vector $x$ and any $q>0$, $\|x\|_{q}$ denotes $x$'s $l_{q}$-norm. For any matrix $M$, $M^{\prime}$ denotes its transpose, $\|M\|$ denotes its spectral norm, $\|M\|_{F}$ denotes its Frobenius norm, $\mathrm{rank}(M)$ denotes its rank, $\sigma_{i}(M)$ denotes its $i$-th largest singular value, $\lambda_{i}(M)$ denotes its $i$-th largest eigenvalue ordered by magnitude, $M(i,:)$ denotes its $i$-th row, and $M(:,j)$ denotes its $j$-th column. Let $\mathbb{R}$ and $\mathbb{N}$ be the sets of real numbers and nonnegative integers, respectively. For any random variable $X$, $\mathbb{E}(X)$ and $\mathbb{P}(X=a)$ are the expectation of $X$ and the probability that $X$ equals $a$, respectively. Let $\mathbb{M}_{m,K}$ be the collection of all $m\times K$ matrices where each row has exactly one 1 and all other entries 0.

2 Weighted latent class model

Unlike most prior work, which focuses only on binary responses, in our weighted response setting all elements of the observed weighted response matrix $R$ are allowed to be any real value, i.e., $R\in\mathbb{R}^{N\times J}$.

Consider categorical data with $N$ subjects and $J$ items, where the $N$ subjects belong to $K$ disjoint extreme latent profiles (also known as latent classes). Throughout this paper, the number of classes $K$ is assumed to be a known integer. To describe the membership of each subject, we let $Z$ be an $N\times K$ matrix such that $Z(i,k)$ is 1 if subject $i$ belongs to the $k$-th extreme latent profile and 0 otherwise. We call $Z$ the classification matrix in this paper. Each subject $i\in[N]$ is assumed to belong to a single extreme latent profile. For convenience, define $\ell$ as an $N\times 1$ vector whose $i$-th entry $\ell(i)$ is $k$ if the $i$-th subject belongs to the $k$-th extreme latent profile, for $i\in[N]$. Thus for subject $i\in[N]$, we have $Z(i,\ell(i))=1$, and the other $(K-1)$ entries of the $1\times K$ classification vector $Z(i,:)$ are 0.

Introduce the $J\times K$ item parameter matrix $\Theta\in\mathbb{R}^{J\times K}$. For $k\in[K]$, our weighted latent class model (WLCM) assumes that $\Theta(j,k)$ is the conditional expectation of the response of the $i$-th subject to the $j$-th item under an arbitrary distribution $\mathcal{F}$, provided that subject $i$ belongs to the $k$-th extreme latent profile. Specifically, for $i\in[N],j\in[J]$, given the classification vector $Z(i,:)$ of subject $i$ and the item parameter matrix $\Theta$, our WLCM assumes that for an arbitrary distribution $\mathcal{F}$, the conditional response expectation of the $i$-th subject to the $j$-th item is

$\mathbb{E}(R(i,j)|Z(i,:),\Theta)=\sum_{k=1}^{K}Z(i,k)\Theta(j,k)\equiv\Theta(j,\ell(i))$. (1)

Based on Equation (1), our WLCM can be formally defined as follows.

Definition 1.

Let $R\in\mathbb{R}^{N\times J}$ denote the observed weighted response matrix. Let $Z\in\mathbb{M}_{N,K}$ be the classification matrix and $\Theta\in\mathbb{R}^{J\times K}$ be the item parameter matrix. For $i\in[N],j\in[J]$, our weighted latent class model (WLCM) assumes that, for an arbitrary distribution $\mathcal{F}$, the entries $R(i,j)$ are independent random variables generated from $\mathcal{F}$ and the expectation of $R(i,j)$ under $\mathcal{F}$ satisfies the following formula:

$\mathbb{E}(R(i,j))=R_{0}(i,j),\ \mathrm{where}\ R_{0}:=Z\Theta^{\prime}$. (2)

Definition 1 says that WLCM is determined by the classification matrix $Z$, the item parameter matrix $\Theta$, and the distribution $\mathcal{F}$. For brevity, we denote WLCM by $\mathrm{WLCM}(Z,\Theta,\mathcal{F})$. Under WLCM, $\mathcal{F}$ is allowed to be any distribution as long as Equation (2) is satisfied under $\mathcal{F}$, i.e., WLCM only requires the expectation (i.e., population) response matrix $R_{0}$ of the observed weighted response matrix $R$ to be $Z\Theta^{\prime}$ under any distribution $\mathcal{F}$.

Remark 1.

When $\mathcal{F}$ is a Bernoulli distribution, all elements of $\Theta$ lie in $[0,1]$, $R$ only contains binary responses (i.e., $R(i,j)\in\{0,1\}$ for $i\in[N],j\in[J]$), and Equation (1) becomes $\mathbb{P}(R(i,j)=1|Z(i,:),\Theta)=\Theta(j,\ell(i))$. In this case, WLCM reduces to the LCM model, i.e., LCM is a special case of our WLCM.

Remark 2.

It should be noted that Equation (2) does not hold for all distributions. For instance, we cannot set $\mathcal{F}$ to be a t-distribution because the expectation of a t-distribution is always 0, which cannot capture the latent structure required by the WLCM model; $\mathcal{F}$ cannot be a Cauchy distribution, whose expectation does not even exist; and $\mathcal{F}$ cannot be a Chi-square distribution because the expectation of a Chi-square distribution is its degrees of freedom, a fixed positive integer that cannot capture the latent structure required by WLCM. We will provide some examples to demonstrate that Equation (2) can be satisfied for different distributions $\mathcal{F}$. For details, please refer to Examples 1-7.

Remark 3.

It should also be noted that the ranges of the observed weighted response matrix $R$ and the item parameter matrix $\Theta$ depend on the distribution $\mathcal{F}$. For example, when $\mathcal{F}$ is a Bernoulli distribution, $R\in\{0,1\}^{N\times J}$ and $\Theta\in[0,1]^{J\times K}$; when $\mathcal{F}$ is a Poisson distribution, $R\in\mathbb{N}^{N\times J}$ and $\Theta\in[0,+\infty)^{J\times K}$; if we let $\mathcal{F}$ be a Normal distribution, $R\in\mathbb{R}^{N\times J}$ and $\Theta\in(-\infty,+\infty)^{J\times K}$. For details, please refer to Examples 1-7.

The following proposition shows that the WLCM model is identifiable as long as there exists at least one subject for every extreme latent profile.

Proposition 1.

(Identifiability). Consider a WLCM model as in Equation (2). When each extreme latent profile has at least one subject, the model is identifiable: for any other valid parameter set $(\tilde{Z},\tilde{\Theta})$, if $\tilde{Z}\tilde{\Theta}^{\prime}=Z\Theta^{\prime}$, then $(Z,\Theta)$ and $(\tilde{Z},\tilde{\Theta})$ are identical up to a permutation of the $K$ extreme latent profiles.

All proofs of the theoretical results developed in this paper are given in the Appendix. The condition that each extreme latent profile must contain at least one subject means that no extreme latent profile is empty, so $\mathrm{rank}(Z)=K$.

Remark 4.

Note that $Z$ and $\tilde{Z}$ are the same up to a permutation of the $K$ latent classes in Proposition 1. A permutation is acceptable since the equivalence of $Z$ and $\tilde{Z}$ should not rely on how we label each of the $K$ extreme latent profiles. A similar argument holds for $\Theta$ and $\tilde{\Theta}$.

The observed weighted response matrix $R$, along with the ground-truth classification matrix $Z$ and the item parameter matrix $\Theta$, can be generated under our WLCM model as follows: let $R(i,j)$ be a random variable generated from the distribution $\mathcal{F}$ with expected value $R_{0}(i,j)$ for $i\in[N],j\in[J]$, where $R_{0}=Z\Theta^{\prime}$ satisfies the latent structure required by WLCM. In latent class analysis, given the observed weighted response matrix $R$ generated from $\mathrm{WLCM}(Z,\Theta,\mathcal{F})$, our goal is to infer the classification matrix $Z$ and the item parameter matrix $\Theta$. Proposition 1 ensures that the model parameters $Z$ and $\Theta$ can be reliably inferred from the observed weighted response matrix $R$. In the following two sections, we will develop a spectral algorithm to fit WLCM and show that this algorithm yields consistent estimation.
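To make the generation process concrete, the following is a minimal Python sketch of drawing $R$ from $\mathrm{WLCM}(Z,\Theta,\mathcal{F})$ for a few choices of $\mathcal{F}$; it assumes numpy, and the function name sample_wlcm is ours rather than part of any package.

```python
import numpy as np

def sample_wlcm(Z, Theta, family="poisson", sigma=1.0, seed=None):
    """Draw an observed response matrix R with E[R] = R0 = Z Theta'.

    Z      : (N, K) classification matrix with one 1 per row.
    Theta  : (J, K) item parameter matrix.
    family : the distribution F; entries of R are drawn independently.
    """
    rng = np.random.default_rng(seed)
    R0 = Z @ Theta.T                      # (N, J) expectation response matrix
    if family == "bernoulli":
        return rng.binomial(1, R0)        # requires R0 entries in [0, 1]
    if family == "poisson":
        return rng.poisson(R0)            # requires R0 entries >= 0
    if family == "normal":
        return rng.normal(R0, sigma)      # any real-valued R0
    raise ValueError(f"unsupported family: {family}")
```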

3 A spectral method for parameter estimation

We have presented our model, WLCM, and demonstrated its superiority over the classical latent class model. In addition to providing a more general model for latent class analysis, we are also interested in estimating the model parameters. In this section, we focus on the parameter estimation problem within the WLCM framework by developing an efficient and easy-to-implement spectral method.

To provide insight into developing an algorithm for the WLCM model, we first consider an oracle case where we observe the expectation response matrix $R_{0}$ given in Equation (2) and would like to estimate $Z$ and $\Theta$ from $R_{0}$. Recall that the item parameter matrix $\Theta$ is a $J\times K$ matrix; here we let $\mathrm{rank}(\Theta)=K_{0}$, where $K_{0}$ is a positive integer no larger than $K$. Since $R_{0}=Z\Theta^{\prime}$, $\mathrm{rank}(Z)=K$, and $\mathrm{rank}(\Theta)=K_{0}\leq K$, we see that $R_{0}$ is a rank-$K_{0}$ matrix. As the number of extreme latent profiles $K$ is usually far smaller than the number of subjects $N$ and the number of items $J$, the $N\times J$ population response matrix $R_{0}$ enjoys a low-dimensional structure. Next, we demonstrate that this low-dimensional structure of $R_{0}$ can be exploited to infer the model parameters under the WLCM model.

Let $R_{0}=U\Sigma V^{\prime}$ be the compact singular value decomposition (SVD) of $R_{0}$, where $\Sigma=\mathrm{diag}(\sigma_{1}(R_{0}),\sigma_{2}(R_{0}),\ldots,\sigma_{K_{0}}(R_{0}))$ is a $K_{0}\times K_{0}$ diagonal matrix collecting the $K_{0}$ nonzero singular values of $R_{0}$. The $N\times K_{0}$ matrix $U$ collects the corresponding left singular vectors and satisfies $U^{\prime}U=I_{K_{0}\times K_{0}}$. Similarly, the $J\times K_{0}$ matrix $V$ collects the corresponding right singular vectors and satisfies $V^{\prime}V=I_{K_{0}\times K_{0}}$. For $k\in[K]$, let $N_{k}$ be the number of subjects that belong to the $k$-th extreme latent profile, i.e., $N_{k}=\sum_{i=1}^{N}Z(i,k)$. The ensuing lemma constitutes the foundation of our estimation method.

Lemma 1.

Under $\mathrm{WLCM}(Z,\Theta,\mathcal{F})$, let $R_{0}=U\Sigma V^{\prime}$ be the compact SVD of $R_{0}$. The following statements are true.

  • 1.

    The left singular vectors matrix $U$ can be written as

    $U=ZX$, (3)

    where $X$ is a $K\times K_{0}$ matrix.

  • 2.

    $U$ has $K$ distinct rows such that for any two distinct subjects $i$ and $\bar{i}$ that belong to the same extreme latent profile (i.e., $\ell(i)=\ell(\bar{i})$), we have $U(i,:)=U(\bar{i},:)$.

  • 3.

    $\Theta$ can be written as

    $\Theta=V\Sigma U^{\prime}Z(Z^{\prime}Z)^{-1}$. (4)

  • 4.

    Furthermore, when $K_{0}=K$, for all $k\in[K],l\in[K]$ with $k\neq l$, we have

    $\|X(k,:)-X(l,:)\|_{F}=(N_{k}^{-1}+N_{l}^{-1})^{1/2}$. (5)

From now on, for the simplicity of our further analysis, we let $K_{0}\equiv K$. Hence, the last statement of Lemma 1 always holds.

The second statement of Lemma 1 indicates that the rows of $U$ corresponding to subjects assigned to the same extreme latent profile are identical. This implies that applying a clustering algorithm to the rows of $U$ can yield an exact reconstruction of the classification matrix $Z$ up to a permutation of the $K$ extreme latent profiles.
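The two key facts of Lemma 1, $U=ZX$ and the exact row coincidence within each profile, can be checked numerically with a few lines (a toy sketch, assuming numpy; the parameter values are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
K, N, J = 3, 12, 8
ell = np.arange(N) % K                 # every profile nonempty
Z = np.eye(K)[ell]                     # (N, K) classification matrix
Theta = rng.random((J, K))             # rank K almost surely, so K0 = K
R0 = Z @ Theta.T

U, s, Vt = np.linalg.svd(R0, full_matrices=False)
U = U[:, :K]                           # compact SVD: keep the K nonzero singular values
# Rows of U coincide within each profile, so U has exactly K distinct rows
# (up to floating-point rounding).
print(np.unique(np.round(U, 8), axis=0).shape[0])   # expect K = 3
```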

In this paper, we adopt the K-means clustering algorithm, an unsupervised learning technique that groups similar data points into $K$ clusters. This clustering technique is formulated as follows:

$(\bar{\bar{Z}},\bar{\bar{X}})=\arg\min_{\bar{Z}\in\mathbb{M}_{N,K},\bar{X}\in\mathbb{R}^{K\times K}}\|\bar{Z}\bar{X}-\bar{U}\|^{2}_{F}$, (6)

where $\bar{U}$ is any $N\times K$ matrix. For convenience, we refer to Equation (6) as “running the K-means algorithm on all rows of $\bar{U}$ with $K$ clusters to obtain $\bar{\bar{Z}}$” because we are interested in the classification matrix $\bar{\bar{Z}}$. Letting $\bar{U}$ in Equation (6) be $U$, the second statement of Lemma 1 guarantees that $\bar{\bar{Z}}=Z\mathcal{P}$ and $\bar{\bar{X}}=\mathcal{P}^{\prime}X$, where $\mathcal{P}$ is a $K\times K$ permutation matrix, i.e., running the K-means algorithm on all rows of $U$ exactly recovers $Z$ up to a permutation of the $K$ extreme latent profiles.

After obtaining $Z$ from $U$, $\Theta$ can be recovered by Equation (4). The above analysis suggests the following algorithm, Ideal SCK, where SCK stands for Spectral Clustering with K-means. Ideal SCK returns a permutation of $(Z,\Theta)$, which also supports the identifiability of the proposed model as stated in Proposition 1.

Algorithm 1 Ideal SCK
1: Input: the expectation response matrix $R_{0}$ and the number of extreme latent profiles $K$.
2: Output: a permutation of $Z$ and $\Theta$.
3: Obtain $U\Sigma V^{\prime}$, the top-$K$ SVD of $R_{0}$.
4: Run the K-means algorithm on all rows of $U$ with $K$ clusters to obtain $Z\mathcal{P}$, a permutation of $Z$.
5: Equation (4) gives $V\Sigma U^{\prime}Z\mathcal{P}((Z\mathcal{P})^{\prime}Z\mathcal{P})^{-1}=\Theta\mathcal{P}$, a permutation of $\Theta$.

In practice, the weighted response matrix $R$ is observed rather than the expectation response matrix $R_{0}$. We now move from the ideal scenario to the real scenario, intending to estimate $Z$ and $\Theta$ when the observed weighted response matrix $R$ is a random matrix generated from an unknown distribution $\mathcal{F}$ satisfying Equation (2) with $K$ extreme latent profiles under the WLCM model. The expectation of $R$ is $R_{0}$ according to Equation (2), so intuitively the singular values and singular vectors of $R$ will be close to those of $R_{0}$. Let $\hat{R}=\hat{U}\hat{\Sigma}\hat{V}^{\prime}$ be the top-$K$ SVD of $R$, where $\hat{\Sigma}=\mathrm{diag}(\sigma_{1}(R),\sigma_{2}(R),\ldots,\sigma_{K}(R))$ is a $K\times K$ diagonal matrix collecting the top $K$ singular values of $R$. Since $\mathbb{E}(R)=R_{0}$ and the $N\times J$ matrix $R_{0}$ has $K$ nonzero singular values while the other $(\min(N,J)-K)$ singular values are zero, $\hat{R}$ should be a good approximation of $R_{0}$. The matrices $\hat{U}\in\mathbb{R}^{N\times K}$ and $\hat{V}\in\mathbb{R}^{J\times K}$ collect the corresponding left and right singular vectors and satisfy $\hat{U}^{\prime}\hat{U}=\hat{V}^{\prime}\hat{V}=I_{K\times K}$. The above analysis implies that $\hat{U}$ should have roughly $K$ distinct rows because $\hat{U}$ is a slightly perturbed version of $U$. Therefore, to obtain a good estimate of the classification matrix $Z$, we apply the K-means algorithm to all rows of $\hat{U}$ with $K$ clusters. Let $\hat{Z}$ be the estimated classification matrix returned by this step. We then obtain a good estimate of $\Theta$ according to Equation (4) by setting $\hat{\Theta}=\hat{V}\hat{\Sigma}\hat{U}^{\prime}\hat{Z}(\hat{Z}^{\prime}\hat{Z})^{-1}$. Algorithm 2, referred to as SCK, is the natural extension of Ideal SCK from the oracle case to the real case. Note that SCK has only two inputs, the observed weighted response matrix $R$ and the number of latent classes $K$; it does not require any tuning parameters.

Algorithm 2 Spectral Clustering with K-means (SCK for short)
1: Input: the observed weighted response matrix $R\in\mathbb{R}^{N\times J}$ and the number of extreme latent profiles $K$.
2: Output: $\hat{Z}$ and $\hat{\Theta}$.
3: Obtain $\hat{R}=\hat{U}\hat{\Sigma}\hat{V}^{\prime}$, the top-$K$ SVD of $R$.
4: Run the K-means algorithm on all rows of $\hat{U}$ with $K$ clusters to obtain $\hat{Z}$.
5: Obtain an estimate of $\Theta$ by setting $\hat{\Theta}=\hat{R}^{\prime}\hat{Z}(\hat{Z}^{\prime}\hat{Z})^{-1}$.

Here, we evaluate the computational cost of our SCK algorithm. The SVD step costs $O(\max(N^{2},J^{2})K)$. The K-means step costs $O(NlK^{2})$, where $l$ is the number of K-means iterations; in all experimental studies in this paper, $l$ is set to 100. The last step of SCK costs $O(JNK)$. Since $K\ll\min(N,J)$ in this paper, the total time complexity of our SCK algorithm is $O(\max(N^{2},J^{2})K)$.
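For readers who want to reproduce SCK, the following is a minimal Python sketch of Algorithm 2, assuming numpy, scipy, and scikit-learn; it is a direct transcription of the three steps, not the authors' code.

```python
import numpy as np
from scipy.sparse.linalg import svds
from sklearn.cluster import KMeans

def sck(R, K, seed=0):
    """Spectral Clustering with K-means (Algorithm 2), a minimal sketch.

    Returns estimated labels (length-N vector in {0,...,K-1}) and Theta_hat (J x K).
    """
    # Step 1: top-K SVD of R, giving R_hat = U_hat Sigma_hat V_hat'.
    U_hat, s_hat, Vt_hat = svds(np.asarray(R, dtype=float), k=K)
    # Step 2: run K-means on the rows of U_hat.
    labels = KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(U_hat)
    Z_hat = np.eye(K)[labels]                        # (N, K) one-hot classification
    # Step 3: Theta_hat = R_hat' Z_hat (Z_hat' Z_hat)^{-1}, i.e. the column
    # means of R_hat within each estimated class.
    R_hat = U_hat @ np.diag(s_hat) @ Vt_hat
    Theta_hat = R_hat.T @ Z_hat / Z_hat.sum(axis=0)
    return labels, Theta_hat
```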

4 Theoretical properties

In this section, we present comprehensive theoretical properties of the SCK algorithm when the observed weighted response matrix $R$ is generated from the proposed model. Our objective is to demonstrate that the estimated classification matrix $\hat{Z}$ and the estimated item parameter matrix $\hat{\Theta}$ concentrate around the true classification matrix $Z$ and the true item parameter matrix $\Theta$, respectively.

Let $\mathcal{T}=\{\mathcal{T}_{1},\mathcal{T}_{2},\ldots,\mathcal{T}_{K}\}$ be the collection of true partitions of all subjects, where $\mathcal{T}_{k}=\{i:Z(i,k)=1,\ i\in[N]\}$ for $k\in[K]$, i.e., $\mathcal{T}_{k}$ is the set of subjects truly belonging to the $k$-th extreme latent profile. Similarly, let $\hat{\mathcal{T}}=\{\hat{\mathcal{T}}_{1},\hat{\mathcal{T}}_{2},\ldots,\hat{\mathcal{T}}_{K}\}$ represent the collection of estimated partitions, where $\hat{\mathcal{T}}_{k}=\{i:\hat{Z}(i,k)=1,\ i\in[N]\}$ for $k\in[K]$. We use the measure defined in [36] to quantify the closeness of the estimated partition $\hat{\mathcal{T}}$ and the ground-truth partition $\mathcal{T}$. Denote the Clustering error associated with $\mathcal{T}$ and $\hat{\mathcal{T}}$ as

$\hat{f}=\min_{\pi\in S_{K}}\max_{k\in[K]}\frac{|\mathcal{T}_{k}\cap\hat{\mathcal{T}}^{c}_{\pi(k)}|+|\mathcal{T}^{c}_{k}\cap\hat{\mathcal{T}}_{\pi(k)}|}{N_{k}}$, (7)

where $S_{K}$ represents the set of all permutations of $\{1,2,\ldots,K\}$, and $\hat{\mathcal{T}}^{c}_{\pi(k)}$ and $\mathcal{T}^{c}_{k}$ denote complementary sets. As stated in [36], $\hat{f}$ evaluates the maximum proportion of subjects in the symmetric difference of $\mathcal{T}_{k}$ and $\hat{\mathcal{T}}_{\pi(k)}$. Since the observed weighted response matrix $R$ is generated from WLCM with expectation $R_{0}$, and $\hat{f}$ measures the performance of the SCK algorithm, SCK is expected to estimate $Z$ with a small Clustering error $\hat{f}$.
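For small $K$, the Clustering error $\hat{f}$ in Equation (7) can be computed by brute force over all $K!$ permutations, as in the following sketch (assuming numpy and the per-class size $N_{k}$ in the denominator, following Equation (7); the function name is ours):

```python
import numpy as np
from itertools import permutations

def clustering_error(ell, ell_hat, K):
    r"""f_hat of Equation (7): minimize over permutations pi the maximum over k
    of (|T_k \ That_pi(k)| + |That_pi(k) \ T_k|) / N_k."""
    T = [set(np.flatnonzero(ell == k)) for k in range(K)]       # true partition
    T_hat = [set(np.flatnonzero(ell_hat == k)) for k in range(K)]
    return min(
        max((len(T[k] - T_hat[pi[k]]) + len(T_hat[pi[k]] - T[k])) / len(T[k])
            for k in range(K))
        for pi in permutations(range(K))
    )
```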

For convenience, let $\rho=\max_{j\in[J],k\in[K]}|\Theta(j,k)|$ and call it the scaling parameter. Setting $B=\frac{\Theta}{\rho}$, we have $\max_{j\in[J],k\in[K]}|B(j,k)|=1$ and $R_{0}=\rho ZB^{\prime}$. Let $\tau=\max_{i\in[N],j\in[J]}|R(i,j)-R_{0}(i,j)|$ and $\gamma=\max_{i\in[N],j\in[J]}\mathrm{Var}(R(i,j))$, where $\mathrm{Var}(R(i,j))$ denotes the variance of $R(i,j)$. We require the following assumption to establish the consistency of our SCK method.

Assumption 1.

Assume $\gamma\geq\frac{\tau^{2}\log(N+J)}{\max(N,J)}$.

The following theorem presents our main result, which provides upper bounds for the error rates of our SCK algorithm under our WLCM model.

Theorem 1.

Under $\mathrm{WLCM}(Z,\Theta,\mathcal{F})$, if Assumption 1 is satisfied, then with probability at least $1-o((N+J)^{-3})$,

$\hat{f}=O(\frac{\gamma K^{2}N_{\max}\max(N,J)\log(N+J)}{\rho^{2}N^{2}_{\min}J})$ and $\frac{\|\hat{\Theta}-\Theta\mathcal{P}\|_{F}}{\|\Theta\|_{F}}=O(\frac{\sqrt{\gamma K\max(N,J)\log(N+J)}}{\rho\sqrt{N_{\min}J}})$,

where $N_{\max}=\max_{k\in[K]}N_{k}$, $N_{\min}=\min_{k\in[K]}N_{k}$, and $\mathcal{P}$ is a permutation matrix.

Because our WLCM is distribution-free, Theorem 1 provides a general theoretical guarantee for the SCK algorithm when $R$ is generated from WLCM under any distribution $\mathcal{F}$, as long as Equation (2) is satisfied. We can simplify Theorem 1 by imposing additional conditions:

Corollary 1.

Under $\mathrm{WLCM}(Z,\Theta,\mathcal{F})$, when Assumption 1 holds, if we additionally assume that $\frac{N_{\max}}{N_{\min}}=O(1)$ and $K=O(1)$, then with probability at least $1-o((N+J)^{-3})$,

$\hat{f}=O(\frac{\gamma\max(N,J)\log(N+J)}{\rho^{2}NJ})$ and $\frac{\|\hat{\Theta}-\Theta\mathcal{P}\|_{F}}{\|\Theta\|_{F}}=O(\frac{\sqrt{\gamma\max(N,J)\log(N+J)}}{\rho\sqrt{NJ}})$.

When $J=\beta N$ for a positive constant $\beta$, Corollary 1 implies that the SCK algorithm yields consistent estimation under WLCM, since the error bounds in Corollary 1 decrease to zero as $N\rightarrow+\infty$ when $\rho$ and the distribution $\mathcal{F}$ are fixed.

Recall that $R$ is an observed weighted response matrix generated from a distribution $\mathcal{F}$ with expectation $R_{0}=Z\Theta^{\prime}=\rho ZB^{\prime}$ under the WLCM model, and that $\gamma$, the maximum variance of $R(i,j)$, is closely related to $\mathcal{F}$; the ranges of $R,\rho,B$, and $\gamma$ can therefore vary with the specific distribution $\mathcal{F}$. The following examples provide the ranges of $R,\rho,B$, the upper bound of $\gamma$, and the explicit forms of the error bounds in Theorem 1 for different distributions $\mathcal{F}$ under our WLCM model. Based on these explicit error bounds, we also investigate how the scaling parameter $\rho$ influences the performance of the SCK algorithm. For all pairs $(i,j)$ with $i\in[N],j\in[J]$, we consider the following distributions for which $\mathbb{E}(R)=R_{0}$ in Equation (2) holds.

Example 1.

Let $\mathcal{F}$ be a Bernoulli distribution such that $R(i,j)\sim\mathrm{Bernoulli}(R_{0}(i,j))$, where $R_{0}(i,j)$ is the Bernoulli probability, i.e., $\mathbb{E}(R(i,j))=R_{0}(i,j)$. In this case, our WLCM degenerates to the LCM model. According to the properties of the Bernoulli distribution, we have the following conclusions.

  • 1.

    $R(i,j)\in\{0,1\}$, i.e., $R(i,j)$ only takes the two values 0 and 1.

  • 2.

    $B(j,k)\in[0,1]$ and $\rho\in(0,1]$ because $R_{0}(i,j)$ is a probability in $[0,1]$ and $\max_{j\in[J],k\in[K]}|B(j,k)|$ is assumed to be 1.

  • 3.

    $\tau\leq 1$ because $\tau=\max_{i\in[N],j\in[J]}|R(i,j)-R_{0}(i,j)|\leq 1$.

  • 4.

    $\gamma\leq\rho$ because $\gamma=\max_{i\in[N],j\in[J]}\mathrm{Var}(R(i,j))=\max_{i\in[N],j\in[J]}R_{0}(i,j)(1-R_{0}(i,j))\leq\max_{i\in[N],j\in[J]}R_{0}(i,j)=\max_{i\in[N],j\in[J]}\rho(ZB^{\prime})(i,j)\leq\rho$.

  • 5.

    Setting $\tau$ to its upper bound 1 and $\gamma$ to its upper bound $\rho$, Assumption 1 becomes $\rho\geq\frac{\log(N+J)}{\max(N,J)}$, which is a sparsity requirement on $R$ because $\rho$ controls the number of ones in $R$ in this case.

  • 6.

    Setting $\gamma$ to its upper bound $\rho$ in Theorem 1, we have

    $\hat{f}=O(\frac{K^{2}N_{\max}\max(N,J)\log(N+J)}{\rho N^{2}_{\min}J})$ and $\frac{\|\hat{\Theta}-\Theta\mathcal{P}\|_{F}}{\|\Theta\|_{F}}=O(\frac{\sqrt{K\max(N,J)\log(N+J)}}{\sqrt{\rho N_{\min}J}})$.

    We observe that increasing $\rho$ decreases SCK's error rates when $\mathcal{F}$ is a Bernoulli distribution.

Example 2.

Let $\mathcal{F}$ be a Binomial distribution such that $R(i,j)\sim\mathrm{Binomial}(m,\frac{R_{0}(i,j)}{m})$ for a positive integer $m$, where $R(i,j)$ records the number of successes in $m$ independent trials, each with success probability $\frac{R_{0}(i,j)}{m}$, so that $\mathbb{E}(R(i,j))=R_{0}(i,j)$. For the Binomial distribution, $\mathbb{P}(R(i,j)=r)=\binom{m}{r}(\frac{R_{0}(i,j)}{m})^{r}(1-\frac{R_{0}(i,j)}{m})^{m-r}$ for $r=0,1,2,\ldots,m$, where $\binom{\bullet}{\bullet}$ is a binomial coefficient. By the properties of the Binomial distribution, we have the following conclusions.

  • 1.

    $R(i,j)\in\{0,1,2,\ldots,m\}$.

  • 2.

    $B(j,k)\in[0,1]$ and $\rho\in(0,m]$ because $\frac{R_{0}(i,j)}{m}$ is a probability in $[0,1]$.

  • 3.

    $\tau\leq m$ because $\tau=\max_{i\in[N],j\in[J]}|R(i,j)-R_{0}(i,j)|\leq m$.

  • 4.

    $\gamma\leq\rho$ because $\gamma=\max_{i\in[N],j\in[J]}\mathrm{Var}(R(i,j))=\max_{i\in[N],j\in[J]}m\frac{R_{0}(i,j)}{m}(1-\frac{R_{0}(i,j)}{m})=\max_{i\in[N],j\in[J]}R_{0}(i,j)(1-\frac{R_{0}(i,j)}{m})\leq\rho$.

  • 5.

    Setting $\tau$ to its upper bound $m$ and $\gamma$ to its upper bound $\rho$, Assumption 1 becomes $\rho\geq\frac{m^{2}\log(N+J)}{\max(N,J)}$, which provides a lower bound requirement on the scaling parameter $\rho$.

  • 6.

    Setting $\gamma$ to its upper bound $\rho$ in Theorem 1 yields the explicit error bounds for SCK when $\mathcal{F}$ is a Binomial distribution, and we observe that increasing $\rho$ reduces SCK's error rates.

Example 3.

Let $\mathcal{F}$ be a Poisson distribution such that $R(i,j)\sim\mathrm{Poisson}(R_{0}(i,j))$, where $R_{0}(i,j)$ is the Poisson parameter, i.e., $\mathbb{E}(R(i,j))=R_{0}(i,j)$. By the properties of the Poisson distribution, the following conclusions can be obtained.

  • 1.

    $R(i,j)\in\mathbb{N}$, i.e., $R(i,j)$ is a nonnegative integer.

  • 2.

    $B(j,k)\in[0,1]$ and $\rho\in(0,+\infty)$ because the Poisson distribution can take any positive value for its mean.

  • 3.

    $\tau$ is an unknown positive value because we do not know the exact upper bound of $R(i,j)$ when $R$ is generated from the Poisson distribution under the WLCM model.

  • 4.

    $\gamma\leq\rho$ because $\gamma=\max_{i\in[N],j\in[J]}\mathrm{Var}(R(i,j))=\max_{i\in[N],j\in[J]}R_{0}(i,j)\leq\rho$.

  • 5.

    Setting $\gamma$ to its upper bound $\rho$, Assumption 1 becomes $\rho\geq\frac{\tau^{2}\log(N+J)}{\max(N,J)}$, which is a lower bound requirement on $\rho$.

  • 6.

    Setting $\gamma$ to its upper bound $\rho$ in Theorem 1 yields the explicit error bounds for the SCK algorithm when $\mathcal{F}$ is a Poisson distribution. It is easy to see that increasing $\rho$ decreases SCK's error rates.

Example 4.

Let $\mathcal{F}$ be a Normal distribution such that $R(i,j)\sim\mathrm{Normal}(R_{0}(i,j),\sigma^{2})$, where $R_{0}(i,j)$ is the mean (i.e., $\mathbb{E}(R(i,j))=R_{0}(i,j)$) and $\sigma^{2}$ is the variance. In this case, we have

  • 1.

    $R(i,j)\in\mathbb{R}$, i.e., $R(i,j)$ is a real value.

  • 2.

    $B(j,k)\in[-1,1]$ and $\rho\in(0,+\infty)$ because the mean of a Normal distribution can take any value. Note that, unlike the cases where $\mathcal{F}$ is Bernoulli or Poisson, $B$ can have negative elements in the Normal case.

  • 3.

    Similar to Example 3, $\tau$ is an unknown positive value.

  • 4.

    $\gamma=\sigma^{2}$ because $\gamma=\max_{i\in[N],j\in[J]}\mathrm{Var}(R(i,j))=\sigma^{2}$ for the Normal distribution.

  • 5.

    Setting $\gamma$ to its exact value $\sigma^{2}$, Assumption 1 becomes $\sigma^{2}\max(N,J)\geq\tau^{2}\log(N+J)$, which means that $\max(N,J)$ should be larger than $\frac{\tau^{2}\log(N+J)}{\sigma^{2}}$ for our theoretical analysis.

  • 6.

    Setting $\gamma$ to its exact value $\sigma^{2}$ in Theorem 1 yields the explicit error bounds for SCK. We observe that increasing the scaling parameter $\rho$ (or decreasing the variance $\sigma^{2}$) reduces SCK's error rates.

Example 5.

Let $\mathcal{F}$ be an Exponential distribution such that $R(i,j)\sim\mathrm{Exponential}(\frac{1}{R_{0}(i,j)})$, where $\frac{1}{R_{0}(i,j)}$ is the rate parameter, i.e., $\mathbb{E}(R(i,j))=R_{0}(i,j)$. In this case, we have

  • 1.

    $R(i,j)\in\mathbb{R}_{+}$, i.e., $R(i,j)$ is a positive value.

  • 2.

    $B(j,k)\in(0,1]$ and $\rho\in(0,+\infty)$ because the mean of an Exponential distribution can be any positive value.

  • 3.

    Similar to Example 3, $\tau$ is an unknown positive value.

  • 4.

    $\gamma\leq\rho^{2}$ because $\gamma=\max_{i\in[N],j\in[J]}\mathrm{Var}(R(i,j))=\max_{i\in[N],j\in[J]}R^{2}_{0}(i,j)\leq\rho^{2}$ for the Exponential distribution.

  • 5.

    Setting $\gamma$ to its upper bound $\rho^{2}$, Assumption 1 becomes $\rho^{2}\geq\tau^{2}\log(N+J)/\max(N,J)$, a lower bound requirement on $\rho$.

  • 6.

    Setting $\gamma$ to its upper bound $\rho^{2}$ in Theorem 1, $\rho$ cancels out of the theoretical bounds, which indicates that increasing $\rho$ has no significant impact on the error rates of SCK.

Example 6.

Let $\mathcal{F}$ be a Uniform distribution such that $R(i,j)\sim\mathrm{Uniform}(0,2R_{0}(i,j))$, so that $\mathbb{E}(R(i,j))=\frac{0+2R_{0}(i,j)}{2}=R_{0}(i,j)$ holds immediately. In this case, we have

  • 1.

    $R(i,j)\in(0,2\rho)$ because $2R_{0}(i,j)\leq 2\rho$.

  • 2.

    $B(j,k)\in(0,1]$ and $\rho\in(0,+\infty)$ because $\mathrm{Uniform}(0,2R_{0}(i,j))$ allows $2R_{0}(i,j)$ to be any positive value.

  • 3.

    $\tau$ is an unknown positive value with upper bound $2\rho$.

  • 4.

    $\gamma\leq\frac{\rho^{2}}{3}$ because $\gamma=\max_{i\in[N],j\in[J]}\mathrm{Var}(R(i,j))=\max_{i\in[N],j\in[J]}\frac{(2R_{0}(i,j)-0)^{2}}{12}=\max_{i\in[N],j\in[J]}\frac{R^{2}_{0}(i,j)}{3}\leq\frac{\rho^{2}}{3}$ for the Uniform distribution.

  • 5.

    Setting $\gamma$ to its upper bound $\frac{\rho^{2}}{3}$, Assumption 1 becomes $\rho^{2}\geq 3\tau^{2}\log(N+J)/\max(N,J)$, a lower bound requirement on $\rho$.

  • 6.

    Since $\rho$ disappears from the error bounds when we let $\gamma=\frac{\rho^{2}}{3}$ in Theorem 1, increasing $\rho$ does not significantly influence SCK's error rates, a conclusion similar to Example 5.

Example 7.

Our WLCM can also model a signed response matrix by setting $\mathbb{P}(R(i,j)=1)=\frac{1+R_{0}(i,j)}{2}$ and $\mathbb{P}(R(i,j)=-1)=\frac{1-R_{0}(i,j)}{2}$, so that $\mathbb{E}(R(i,j))=\frac{1+R_{0}(i,j)}{2}-\frac{1-R_{0}(i,j)}{2}=R_{0}(i,j)$ and Equation (2) holds (a small sampling sketch is given after this example). For the signed response matrix, we have

  • 1.

    $R(i,j)\in\{-1,1\}$, i.e., $R(i,j)$ only takes the two values -1 and 1.

  • 2.

    $B(j,k)\in[-1,1]$ and $\rho\in(0,1]$ because $\frac{1+R_{0}(i,j)}{2}$ and $\frac{1-R_{0}(i,j)}{2}$ are probabilities and must lie in $[0,1]$. Note that, similar to Example 4, $B(j,k)$ can be negative for the signed response matrix.

  • 3.

    $\tau\leq 2$ because $R(i,j)\in\{-1,1\}$ and $R_{0}(i,j)\in[-1,1]$.

  • 4.

    $\gamma\leq 1$ because $\gamma=\max_{i\in[N],j\in[J]}\mathrm{Var}(R(i,j))=\max_{i\in[N],j\in[J]}(1-R^{2}_{0}(i,j))\leq 1$.

  • 5.

    Setting $\tau=2$ and $\gamma=1$, Assumption 1 becomes $\max(N,J)\geq 4\log(N+J)$.

  • 6.

    Setting $\gamma$ to its upper bound 1 in Theorem 1 shows that increasing $\rho$ reduces SCK's error rates.
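As a quick illustration of this construction, a signed response matrix with $\mathbb{E}(R)=R_{0}$ can be sampled in one line (a toy sketch assuming numpy; the $R_{0}$ here is a random placeholder with entries in $[-1,1]$):

```python
import numpy as np

rng = np.random.default_rng(0)
R0 = rng.uniform(-1, 1, size=(5, 4))     # toy expectation matrix in [-1, 1]
# R(i,j) = 1 with probability (1 + R0[i,j]) / 2 and -1 otherwise,
# so E[R(i,j)] = R0[i,j], exactly as Equation (2) requires.
R = np.where(rng.random(R0.shape) < (1 + R0) / 2, 1, -1)
```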

5 Simulation studies

In this section, we conduct extensive simulation experiments to evaluate the effectiveness of the proposed method and validate our theoretical results in Examples 1-7.

5.1 Baseline method

Beyond the SCK algorithm, we briefly provide an alternative spectral method that can also be applied to fit our WLCM model. Recall that $R_{0}=Z\Theta^{\prime}$ under WLCM, so $R_{0}(i,:)=R_{0}(\bar{i},:)$ whenever two distinct subjects $i$ and $\bar{i}$ belong to the same extreme latent profile, for $i,\bar{i}\in[N]$. Therefore, the population response matrix $R_{0}$ has $K$ distinct rows, and running the K-means algorithm on all rows of $R_{0}$ with $K$ clusters faithfully recovers the classification matrix $Z$ up to a permutation of the $K$ extreme latent profiles. $R_{0}=Z\Theta^{\prime}$ also gives $\Theta=R^{\prime}_{0}Z(Z^{\prime}Z)^{-1}$, which suggests the following ideal algorithm, called Ideal RMK.

Algorithm 3 Ideal RMK
1: Input: $R_{0}$, $K$.
2: Output: a permutation of $Z$ and $\Theta$.
3: Run the K-means algorithm on all rows of $R_{0}$ with $K$ clusters to obtain $Z\mathcal{P}$, a permutation of $Z$.
4: Compute $R^{\prime}_{0}Z\mathcal{P}((Z\mathcal{P})^{\prime}Z\mathcal{P})^{-1}=\Theta\mathcal{P}$, a permutation of $\Theta$.

Algorithm 4, called RMK, is the natural generalization of Ideal RMK from the oracle case to the real case because $\mathbb{E}(R)=R_{0}$ under the WLCM model. Unlike the SCK method, the RMK method does not need the SVD of the observed weighted response matrix $R$.

Algorithm 4 Response Matrix with K-means (RMK for short)
1: Input: $R$, $K$.
2: Output: $\hat{Z}$, $\hat{\Theta}$.
3: Run the K-means algorithm on all rows of $R$ with $K$ clusters to obtain $\hat{Z}$.
4: Obtain an estimate of $\Theta$ by setting $\hat{\Theta}=R^{\prime}\hat{Z}(\hat{Z}^{\prime}\hat{Z})^{-1}$.

The computational cost of the first step of RMK is $O(lNJK)$, where $l$ denotes the number of K-means iterations. The second step costs $O(JNK)$. Therefore, the overall computational cost of RMK is $O(lNJK)$. When $J=\beta N$ for a constant $\beta\in(0,1]$, the complexity of RMK is $O(\beta lKN^{2})$, which exceeds SCK's complexity $O(KN^{2})$ when $\beta l>1$. Therefore, SCK runs faster than RMK when $\beta l>1$, as confirmed by our numerical results in this section.
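A matching Python sketch of RMK (Algorithm 4), under the same assumptions as the SCK sketch in Section 3:

```python
import numpy as np
from sklearn.cluster import KMeans

def rmk(R, K, seed=0):
    """Response Matrix with K-means (Algorithm 4), a minimal sketch."""
    # Step 1: run K-means directly on the rows of R.
    labels = KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(R)
    Z_hat = np.eye(K)[labels]
    # Step 2: Theta_hat = R' Z_hat (Z_hat' Z_hat)^{-1}.
    Theta_hat = np.asarray(R, dtype=float).T @ Z_hat / Z_hat.sum(axis=0)
    return labels, Theta_hat
```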

5.2 Evaluation metric

For the classification of subjects, when the true classification matrix $Z$ is known, we consider four metrics to evaluate the quality of the partition of subjects into extreme latent profiles: the Clustering error $\hat{f}$ computed by Equation (7), and three other popular evaluation criteria, the Hamming error [37], normalized mutual information (NMI) [38, 39, 40, 41], and the adjusted Rand index (ARI) [41, 42, 43].

  • 1.

    The Hamming error is defined as

    $\mathrm{Hamming\ error}=N^{-1}\min_{\mathcal{P}\in\mathcal{P}_{K}}\|\hat{Z}-Z\mathcal{P}\|_{0}$,

    where $\mathcal{P}_{K}$ denotes the collection of all $K\times K$ permutation matrices. The Hamming error falls within the range $[0,1]$, and a smaller Hamming error indicates better classification performance.

  • 2.

    Let $C$ be a $K\times K$ confusion matrix such that $C(k,l)$ is the number of common subjects between $\mathcal{T}_{k}$ and $\hat{\mathcal{T}}_{l}$ for $k,l\in[K]$. NMI is defined as

    $\mathrm{NMI}=\frac{-2\sum_{k,l}C(k,l)\log(\frac{C(k,l)N}{C_{k.}C_{.l}})}{\sum_{k}C_{k.}\log(\frac{C_{k.}}{N})+\sum_{l}C_{.l}\log(\frac{C_{.l}}{N})}$,

    where $C_{k.}=\sum_{m=1}^{K}C(k,m)$ and $C_{.l}=\sum_{m=1}^{K}C(m,l)$. NMI ranges in $[0,1]$, and larger values are better.

  • 3.

    ARI is defined as

    $\mathrm{ARI}=\frac{\sum_{k,l}\binom{C(k,l)}{2}-\frac{\sum_{k}\binom{C_{k.}}{2}\sum_{l}\binom{C_{.l}}{2}}{\binom{N}{2}}}{\frac{1}{2}[\sum_{k}\binom{C_{k.}}{2}+\sum_{l}\binom{C_{.l}}{2}]-\frac{\sum_{k}\binom{C_{k.}}{2}\sum_{l}\binom{C_{.l}}{2}}{\binom{N}{2}}}$,

    where $\binom{.}{.}$ is a binomial coefficient. ARI falls within the range $[-1,1]$, and larger values are better.

For the estimation of $\Theta$, we use the Relative $l_{1}$ error and the Relative $l_{2}$ error to evaluate performance. The two criteria are defined as

$\mathrm{Relative\ }l_{1}\mathrm{\ error}=\min_{\mathcal{P}\in\mathcal{P}_{K}}\frac{\|\hat{\Theta}-\Theta\mathcal{P}\|_{1}}{\|\Theta\|_{1}}$ and $\mathrm{Relative\ }l_{2}\mathrm{\ error}=\min_{\mathcal{P}\in\mathcal{P}_{K}}\frac{\|\hat{\Theta}-\Theta\mathcal{P}\|_{F}}{\|\Theta\|_{F}}$.

For both measures, smaller is better.
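In practice, NMI and ARI are available in scikit-learn, while the Hamming error and the relative errors require a small permutation search, as in the following sketch (assuming numpy and scikit-learn; feasible for small $K$, and the function names are ours):

```python
import numpy as np
from itertools import permutations
from sklearn.metrics import normalized_mutual_info_score, adjusted_rand_score

def hamming_error(ell, ell_hat, K):
    """N^{-1} min over permutations P of ||Z_hat - Z P||_0."""
    Z, Z_hat = np.eye(K)[ell], np.eye(K)[ell_hat]
    return min(np.count_nonzero(Z_hat - Z[:, list(pi)])
               for pi in permutations(range(K))) / len(ell)

def relative_errors(Theta_hat, Theta):
    """Relative l1 (entrywise) and l2 (Frobenius) errors over column permutations."""
    K = Theta.shape[1]
    l1 = min(np.abs(Theta_hat - Theta[:, list(pi)]).sum()
             for pi in permutations(range(K))) / np.abs(Theta).sum()
    l2 = min(np.linalg.norm(Theta_hat - Theta[:, list(pi)])
             for pi in permutations(range(K))) / np.linalg.norm(Theta)
    return l1, l2

# NMI and ARI act on label vectors directly and need no permutation search:
# normalized_mutual_info_score(ell, ell_hat); adjusted_rand_score(ell, ell_hat)
```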

5.3 Synthetic weighted response matrices

We conduct numerical studies to examine the accuracy and efficiency of our SCK and RMK approaches by varying the scaling parameter $\rho$ and the number of subjects $N$. Unless specified otherwise, in all computer-generated weighted response matrices we set $K=3$ and $J=\frac{N}{5}$, and the $N\times K$ classification matrix $Z$ is generated such that each subject belongs to one of the $K$ extreme latent profiles with equal probability. For distributions that require $B$'s entries to be nonnegative, we let $B(j,k)=\mathrm{rand}(1)$ for $j\in[J],k\in[K]$, where $\mathrm{rand}(1)$ is a random value drawn from the uniform distribution on $[0,1]$. For the Normal distribution and the signed response matrix, which allow $B$ to have negative entries, we let $B(j,k)=2\mathrm{rand}(1)-1$ for $j\in[J],k\in[K]$, i.e., $B(j,k)$ ranges in $[-1,1]$. Set $B_{\max}=\max_{j\in[J],k\in[K]}|B(j,k)|$. Because this generation process guarantees $|B(j,k)|\in[0,1]$ but not $B_{\max}=1$, which the definition of $B$ requires, we update $B$ by $\frac{B}{B_{\max}}$. The scaling parameter $\rho$ and the number of subjects $N$ are set independently for each distribution. After setting all model parameters $(K,N,J,Z,B,\rho)$, we generate the observed weighted response matrix $R$ from the distribution $\mathcal{F}$ with expectation $R_{0}=Z\Theta^{\prime}=\rho ZB^{\prime}$ under our WLCM model. By applying the SCK method (and the RMK method) to $R$ with $K$ extreme latent profiles, we compute the evaluation metrics of SCK (and RMK). In every simulation scenario, we generate 50 independent replicates and report the mean of the Clustering error (as well as the Hamming error, NMI, ARI, Relative $l_{1}$ error, Relative $l_{2}$ error, and running time) over the 50 repetitions for each method.
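The generation scheme just described translates directly into code (a minimal numpy sketch; variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
K, N = 3, 500
J = N // 5

# Each subject joins one of the K extreme latent profiles with equal probability.
ell = rng.integers(0, K, size=N)
Z = np.eye(K)[ell]

# Nonnegative B (use 2*rng.random(...) - 1 for the Normal / signed settings),
# then rescale so that max |B(j,k)| = 1, as the definition of B requires.
B = rng.random((J, K))
B = B / np.abs(B).max()

rho = 0.5                       # scaling parameter, set per distribution
R0 = rho * Z @ B.T              # expectation matrix Z Theta' with Theta = rho * B
```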

5.3.1 Bernoulli distribution

When $R(i,j)\sim\mathrm{Bernoulli}(R_{0}(i,j))$ for $i\in[N],j\in[J]$, we consider the following two simulations.

Simulation 1(a): changing $\rho$. Set $N=500$. For the Bernoulli distribution, the scaling parameter $\rho$ should be set within the range $(0,1]$ according to Example 1. Here, we let $\rho$ range in $\{0.1,0.2,0.3,\ldots,1\}$.

Simulation 1(b): changing $N$. Let $\rho=0.1$ and let $N$ range in $\{1000,2000,\ldots,5000\}$.

The results are presented in Figure 1. We observe that SCK outperforms RMK: SCK returns more accurate estimates of $(Z,\Theta)$ and runs faster than RMK across all settings. Both methods achieve better performance as $\rho$ increases, which conforms to our analysis in Example 1. Additionally, both algorithms perform better as the number of subjects $N$ increases, as predicted by our analysis following Corollary 1.

Figure 1: Numerical results of Simulation 1 (panels (a)-(g): Simulation 1(a); panels (h)-(n): Simulation 1(b)).

5.3.2 Binomial distribution

When $R(i,j)\sim\mathrm{Binomial}(m,\frac{R_{0}(i,j)}{m})$ for $i\in[N],j\in[J]$, we consider the following two simulations.

Simulation 2(a): changing $\rho$. Set $N=500$ and $m=5$. Recall that $\rho$'s range is $(0,m]$ when $\mathcal{F}$ is a Binomial distribution according to Example 2. Here, we let $\rho$ range in $\{0.2,0.4,0.6,\ldots,2\}$.

Simulation 2(b): changing $N$. Let $\rho=0.1$ and $m=5$, and let $N$ range in $\{1000,2000,\ldots,5000\}$.

Figure 2 presents the corresponding results. We note that SCK and RMK have similar error rates, while SCK runs faster than RMK for this simulation. Meanwhile, increasing ρ\rho (and NN) decreases error rates for both methods, which confirms our findings in Example 2 and Corollary 1.

Figure 2: Numerical results of Simulation 2 (panels (a)-(g): Simulation 2(a); panels (h)-(n): Simulation 2(b)).

5.3.3 Poisson distribution

When $R(i,j)\sim\mathrm{Poisson}(R_{0}(i,j))$ for $i\in[N],j\in[J]$, we consider the following two simulations.

Simulation 3(a): changing $\rho$. Set $N=500$. Example 3 says that the theoretical range of $\rho$ is $(0,+\infty)$ when $\mathcal{F}$ is a Poisson distribution. Here, we let $\rho$ range in $\{0.2,0.4,0.6,\ldots,2\}$.

Simulation 3(b): changing $N$. Let $\rho=0.1$ and let $N$ range in $\{1000,2000,\ldots,5000\}$.

Figure 3 displays the numerical results of Simulations 3(a) and 3(b). The results are similar to those of the Bernoulli case: SCK outperforms RMK in both estimating $(Z,\Theta)$ and running time, and both methods perform better as $\rho$ and $N$ increase, which supports our analysis in Example 3 and Corollary 1.

Figure 3: Numerical results of Simulation 3 (panels (a)-(g): Simulation 3(a); panels (h)-(n): Simulation 3(b)).

5.3.4 Normal distribution

When $R(i,j)\sim\mathrm{Normal}(R_{0}(i,j),\sigma^{2})$ for $i\in[N],j\in[J]$, we consider the following two simulations.

Simulation 4(a): changing $\rho$. Set $N=500$ and $\sigma^{2}=2$. According to Example 4, the scaling parameter $\rho$ can be any positive value when $\mathcal{F}$ is a Normal distribution. Here, we let $\rho$ range in $\{0.2,0.4,0.6,\ldots,2\}$.

Simulation 4(b): changing $N$. Let $\rho=0.5$ and $\sigma^{2}=2$, and let $N$ range in $\{1000,2000,\ldots,5000\}$.

Figure 4 shows the results. We see that SCK and RMK have similar performance in estimating the model parameters $(Z,\Theta)$, while SCK runs faster than RMK. Additionally, the error rates of both approaches decrease as the scaling parameter $\rho$ and the number of subjects $N$ increase, supporting our findings in Example 4 and Corollary 1.

Figure 4: Numerical results of Simulation 4 (panels (a)-(g): Simulation 4(a); panels (h)-(n): Simulation 4(b)).

5.3.5 Exponential distribution

When $R(i,j)\sim\mathrm{Exponential}(\frac{1}{R_{0}(i,j)})$ for $i\in[N],j\in[J]$, we consider the following two simulations.

Simulation 5(a): changing $\rho$. Set $N=300$. According to Example 5, the range of the scaling parameter $\rho$ is $(0,+\infty)$ when $\mathcal{F}$ is an Exponential distribution. Here, we let $\rho$ range in $\{1,2,\ldots,20\}$ for our numerical studies.

Simulation 5(b): changing $N$. Let $\rho=1$ and let $N$ range in $\{300,600,\ldots,3000\}$.

Figure 5 displays the results. Both methods provide satisfactory estimates of $Z$ and $\Theta$, as indicated by their small error rates and large NMI and ARI values. SCK provides more accurate estimates than RMK and requires less estimation time. Meanwhile, we find that increasing $\rho$ does not significantly influence the performance of SCK and RMK, which verifies our theoretical analysis in Example 5 that $\rho$ disappears from the theoretical upper bounds on the error rates when we set $\gamma=\rho^{2}$ in Theorem 1 for the Exponential distribution. Furthermore, both methods perform better as $N$ increases, which supports our analysis after Corollary 1.

Figure 5: Numerical results of Simulation 5 (panels (a)-(g): Simulation 5(a); panels (h)-(n): Simulation 5(b)).

5.3.6 Uniform distribution

When $R(i,j)\sim\mathrm{Uniform}(0,2R_{0}(i,j))$ for $i\in[N],j\in[J]$, we consider the following two simulations.

Simulation 6(a): changing $\rho$. Set $N=120$. According to Example 6, the scaling parameter $\rho$ can be any positive value when $\mathcal{F}$ is a Uniform distribution. Here, we let $\rho$ range in $\{1,2,\ldots,20\}$.

Simulation 6(b): changing $N$. Let $\rho=1$ and let $N$ range in $\{300,600,\ldots,3000\}$.

Figure 6 displays the numerical results. We see that increasing $\rho$ does not significantly change the estimation accuracies of SCK and RMK, which verifies our theoretical analysis in Example 6. For all settings, SCK runs faster than RMK. As $N$ increases, the Clustering error and Hamming error of both approaches are 0 (and NMI and ARI are 1), which indicates that SCK and RMK exactly recover the classification matrix $Z$. This occurs because $N$ is set quite large for the Uniform distribution in Simulation 6(b). For the estimation of $\Theta$, the error rates of both methods decrease as $N$ increases, consistent with our findings following Corollary 1.

Figure 6: Numerical results of Simulation 6. Panels (a)-(g): Simulation 6(a); panels (h)-(n): Simulation 6(b).

5.3.7 Signed response matrix

For signed response matrices, when $\mathbb{P}(R(i,j)=1)=\frac{1+R_{0}(i,j)}{2}$ and $\mathbb{P}(R(i,j)=-1)=\frac{1-R_{0}(i,j)}{2}$ for $i\in[N],j\in[J]$, we consider the following two simulations.

Simulation 7(a): changing $\rho$. Set $N=500$. Recall that the theoretical range of the scaling parameter $\rho$ is $(0,1]$ for signed response matrices according to our analysis in Example 7. Here, we let $\rho$ range in $\{0.1,0.2,\ldots,1\}$.

Simulation 7(b): changing $N$. Let $\rho=0.2$ and $N$ range in $\{1000,2000,\ldots,5000\}$.
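A minimal sketch of this signed-response mechanism is given below (assuming the entries of $R_{0}$ lie in $[-1,1]$, which holds when $\Theta=\rho B$ with $\rho\in(0,1]$ and $|B(j,k)|\leq 1$); one can check that $\mathbb{E}[R(i,j)]=R_{0}(i,j)$ under this two-point distribution:

```python
import numpy as np

def generate_signed_R(R0, seed=0):
    # Sketch: P(R(i,j)=1) = (1+R0(i,j))/2 and P(R(i,j)=-1) = (1-R0(i,j))/2,
    # so E[R(i,j)] = R0(i,j) entrywise (entries of R0 must lie in [-1, 1]).
    rng = np.random.default_rng(seed)
    U = rng.random(R0.shape)                  # i.i.d. Uniform(0,1) thresholds
    return np.where(U < (1.0 + R0) / 2.0, 1.0, -1.0)
```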

Figure 7 shows the results. We see that increasing $\rho$ and $N$ improves the estimation accuracies of SCK and RMK, which confirms our analysis in Example 7 and Corollary 1. Additionally, both algorithms enjoy similar performances in estimating $Z$ and $\Theta$, while SCK requires less computation time than RMK.

Figure 7: Numerical results of Simulation 7. Panels (a)-(g): Simulation 7(a); panels (h)-(n): Simulation 7(b).

5.3.8 Simulated weighted response matrices

For visualization, we plot two weighted response matrices $R$ generated from the Normal distribution and the Poisson distribution under WLCM. Let $K=2,N=16,J=10,\sigma^{2}=1$, $\ell(i)=1$ and $\ell(i+8)=2$ for $i\in[8]$, and $\Theta(j,1)=100$, $\Theta(j,2)=110-10j$ for $j\in[10]$. Because $R_{0}=Z\Theta'$ is now fixed, we can generate $R$ from different distributions with expectation $R_{0}$ under the proposed WLCM model. Here, we consider the following two settings.

Simulation 8(a): When $R(i,j)\sim\mathrm{Normal}(R_{0}(i,j),\sigma^{2})$ for $i\in[N],j\in[J]$, the left panel of Figure 8 displays a weighted response matrix $R$ generated from Simulation 8(a).

Simulation 8(b): When $R(i,j)\sim\mathrm{Poisson}(R_{0}(i,j))$ for $i\in[N],j\in[J]$, the right panel of Figure 8 provides an $R$ generated from Simulation 8(b).
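The following Python sketch reproduces this setting (the random seed is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)                        # arbitrary seed
N, J, K, sigma2 = 16, 10, 2, 1.0
Z = np.zeros((N, K))
Z[:8, 0] = 1                                          # ell(i) = 1 for i in [8]
Z[8:, 1] = 1                                          # ell(i+8) = 2 for i in [8]
j = np.arange(1, J + 1)
Theta = np.column_stack([np.full(J, 100.0),           # Theta(j,1) = 100
                         110.0 - 10.0 * j])           # Theta(j,2) = 110 - 10j
R0 = Z @ Theta.T                                      # E[R] = R0 = Z Theta'
R_normal = rng.normal(loc=R0, scale=np.sqrt(sigma2))  # Simulation 8(a)
R_poisson = rng.poisson(lam=R0)                       # Simulation 8(b)
```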

Figure 8: Illustration of weighted response matrices $R$ generated from WLCM. Panel (a): Normal distribution; panel (b): Poisson distribution. In both panels, S$i$ denotes subject $i$ and I$j$ denotes item $j$ for $i\in[16],j\in[10]$.
Table 1: Error rates of SCK and RMK for $R$ in Figure 8. Values outside (inside) brackets are results for $R$ in panel (a) (panel (b)) of Figure 8.

Method   Clustering error   Hamming error   NMI     ARI     Relative $l_{1}$ error   Relative $l_{2}$ error
SCK      0 (0)              0 (0)           1 (1)   1 (1)   0.0024 (0.0254)          0.0032 (0.0295)
RMK      0 (0)              0 (0)           1 (1)   1 (1)   0.0024 (0.0245)          0.0032 (0.0295)
Figure 9: Heatmap of the estimated item parameter matrix $\hat{\Theta}$ of SCK and RMK for $R$ in Figure 8.

Error rates of the proposed methods for the observed weighted response matrices provided in Figure 8 are displayed in Table 1. We also plot the estimated item matrix $\hat{\Theta}$ of both methods in Figure 9. We see that both approaches exactly recover $Z$ from $R$, while they estimate $\Theta$ with slight perturbations. Meanwhile, since $Z,\Theta$, and $K$ are known for this simulation, the matrices $R$ provided in Figure 8 can be regarded as benchmark weighted response matrices, and readers can apply SCK and RMK (and other methods) to $R$ to check their effectiveness in estimating $Z$ and $\Theta$.
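For readers who wish to benchmark against these matrices, the following sketch computes the reported metrics; it assumes that the estimated labels and $\hat{\Theta}$ have already been aligned with the truth up to the best column permutation, and it uses scikit-learn for NMI and ARI:

```python
import numpy as np
from sklearn.metrics import adjusted_rand_score, normalized_mutual_info_score

def report_errors(labels_true, labels_hat, Theta, Theta_hat):
    # Sketch of the evaluation metrics in Table 1 (permutation alignment assumed).
    nmi = normalized_mutual_info_score(labels_true, labels_hat)
    ari = adjusted_rand_score(labels_true, labels_hat)
    rel_l1 = np.abs(Theta_hat - Theta).sum() / np.abs(Theta).sum()      # relative l1 error
    rel_l2 = np.linalg.norm(Theta_hat - Theta) / np.linalg.norm(Theta)  # relative l2 (Frobenius) error
    return nmi, ari, rel_l1, rel_l2
```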

6 Real data applications

As the main goal of this paper is to introduce the proposed WLCM model and the SCK algorithm for weighted response matrices, this section reports empirical results on two datasets with weighted response matrices. Because the true classification matrix and the true item parameter matrix are unknown for real data, and SCK runs much faster than RMK, we only report the outcomes of the SCK approach. For real-world datasets, the number of extreme latent profiles $K$ is often unknown. Here, we infer $K$ for real-world weighted response matrices using the following strategy:

K=\mathrm{arg~min}_{k\in[\mathrm{rank}(R)]}\|R-\hat{Z}\hat{\Theta}'\|,   (8)

where $\hat{Z}$ and $\hat{\Theta}$ are the outputs of Algorithm 2 with inputs $R$ and $k$. The method specified in Equation (8) selects the $K$ that minimizes the spectral norm of the difference between $R$ and $\hat{Z}\hat{\Theta}'$. Determining the number of extreme latent profiles $K$ under our WLCM model in a rigorous manner, with theoretical guarantees, remains a future direction.
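A minimal sketch of this selection rule is given below; `fit_sck` is a hypothetical stand-in for Algorithm 2 that returns $(\hat{Z},\hat{\Theta})$ for a candidate $k$, and in practice the scan over $k\in[\mathrm{rank}(R)]$ can be truncated at a moderate upper bound:

```python
import numpy as np

def choose_K(R, fit_sck, K_max=None):
    # Sketch of Equation (8): pick k minimizing the spectral norm ||R - Zhat Thetahat'||.
    if K_max is None:
        K_max = np.linalg.matrix_rank(R)
    errors = []
    for k in range(1, K_max + 1):
        Z_hat, Theta_hat = fit_sck(R, k)      # hypothetical call to Algorithm 2
        errors.append(np.linalg.norm(R - Z_hat @ Theta_hat.T, ord=2))
    return int(np.argmin(errors)) + 1
```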

6.1 International Personality Item Pool (IPIP) personality test data

Background. We apply SCK to a personality test dataset, the International Personality Item Pool (IPIP) personality test, which is available for download at https://openpsychometrics.org/_rawdata/. This dataset consists of 1005 subjects and 40 items. The IPIP data also records the age and gender of each subject. After dropping subjects with missing entries in their responses, age, or gender, and dropping two subjects that are neither male nor female, 896 subjects remain, i.e., $N=896,J=40$. All items are rated on a 5-point scale, where 1 = Strongly disagree, 2 = Disagree, 3 = Neither agree nor disagree, 4 = Agree, and 5 = Strongly agree, i.e., $R\in\{1,2,3,4,5\}^{896\times 40}$, a weighted response matrix. Items 1-10 measure the personality factor Assertiveness ("AS" for short); items 11-20 measure the personality factor Social confidence ("SC"); items 21-30 measure the personality factor Adventurousness ("AD"); items 31-40 measure the personality factor Dominance ("DO"). The details of each item are depicted in Figure 10.

Analysis. We apply Equation (8) to infer $K$ for the IPIP dataset and find that the estimated value of $K$ is 3. We then apply the SCK algorithm to the response matrix $R$ with $K=3$ to obtain the $896\times 3$ matrix $\hat{Z}$ and the $40\times 3$ matrix $\hat{\Theta}$. The running time of SCK on this dataset is around 0.2 seconds.
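For intuition, a stripped-down SVD-plus-K-means estimator in the spirit of SCK can be sketched as follows; this is an illustrative reimplementation built from the quantities used in our proofs (the top-$K$ SVD of $R$, K-means on $\hat{U}$, and $\hat{\Theta}=\hat{R}'\hat{Z}(\hat{Z}'\hat{Z})^{-1}$), not a verbatim transcription of Algorithm 2:

```python
import numpy as np
from sklearn.cluster import KMeans

def sck_sketch(R, K, seed=0):
    # Top-K SVD of the weighted response matrix R.
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    U, s, Vt = U[:, :K], s[:K], Vt[:K]
    # K-means on the rows of U estimates the classification matrix Z.
    labels = KMeans(n_clusters=K, n_init=10, random_state=seed).fit_predict(U)
    Z_hat = np.eye(K)[labels]                                     # N x K
    R_hat = U @ np.diag(s) @ Vt                                   # rank-K approximation of R
    Theta_hat = R_hat.T @ Z_hat @ np.linalg.inv(Z_hat.T @ Z_hat)  # J x K item parameters
    return Z_hat, Theta_hat, labels
```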

Table 2: Basic information for each estimated extreme latent profile obtained from $\hat{Z}$ for the IPIP data.

                         profile 1   profile 2   profile 3
Size                     276         226         394
#Male                    123         129         241
#Female                  153         97          153
Average age of males     35.9837     32.8240     35.9004
Average age of females   35.5425     31.3814     38.7059

Results. For convenience, we denote the three estimated extreme latent profiles as profile 1, profile 2, and profile 3. Based on $\hat{Z}$ and the information on age and gender, we can obtain some basic information (shown in Table 2), such as the size of each profile, the number of males (females) in each profile, and the average age of males (and females) in each profile. From Table 2, we see that the number of females is larger than that of males for profile 1, while profiles 2 and 3 have more males. The average age of males (and females) in profile 2 is smaller than in profiles 1 and 3, while the average age of females in profile 3 is the largest. We can also obtain the average point on each item for males (and females) in each estimated extreme latent profile; the results are shown in panel (a) (and panel (b)) of Figure 10. We observe that males in profile 3 tend to be more confident, more creative, more social, and more open to changes than males in profiles 1 and 2; males in profile 3 are more (less) dominant than males in profile 1 (profile 2). Males in profile 2 are more confident, creative, social, open to changes, and dominant than males in profile 1. Meanwhile, in the three estimated extreme latent profiles, females show personalities similar to those of males. We also find that males in profile 3 (profile 2) are more (less) confident, creative, social, open to changes, and dominant than females in profile 3 (profile 2). Furthermore, it is interesting to see that, though males in profile 1 are less confident, creative, social, and open to changes than females in profile 1, they are more dominant than females in profile 1. We also plot the average point on each item in each estimated extreme latent profile regardless of gender in panel (c) of Figure 10, from which we can draw conclusions similar to those for males. In panel (d) of Figure 10, we plot the heatmap of the estimated item parameter matrix $\hat{\Theta}$. By comparing panel (c) with panel (d), we see that the $(j,k)$-th element of the matrix shown in panel (c) is close to $\hat{\Theta}(j,k)$ for $j\in[40],k\in[3]$. This result implies that the behavior differences on each item across the extreme latent profiles are governed by the item parameter matrix $\Theta$.

Remark 5.

Recall that $\mathbb{E}(R)=R_{0}=Z\Theta'$ under the WLCM model, so $R_{0}(i,j)=\Theta(j,\ell(i))$ for $i\in[N],j\in[J]$. Then $\sum_{i:\ell(i)=k}R_{0}(i,j)=\sum_{i:\ell(i)=k}\Theta(j,\ell(i))=\sum_{i:\ell(i)=k}\Theta(j,k)=N_{k}\Theta(j,k)$, which gives $\Theta(j,k)=\frac{\sum_{i:\ell(i)=k}R_{0}(i,j)}{N_{k}}$ for $k\in[K]$. This explains why the average value on the $j$-th item in the $k$-th estimated extreme latent profile approximates $\hat{\Theta}(j,k)$ for $j\in[J],k\in[K]$.
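As an illustrative numerical check of Remark 5 (a sketch; the function name is ours), one can compare the class-wise item averages computed from $R$ and the estimated labels with the columns of $\hat{\Theta}$:

```python
import numpy as np

def classwise_item_averages(R, labels, K):
    # For each estimated profile k, average the responses to every item;
    # by Remark 5, column k of the result should approximate Theta_hat[:, k].
    J = R.shape[1]
    avg = np.zeros((J, K))
    for k in range(K):
        avg[:, k] = R[labels == k].mean(axis=0)
    return avg
```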

Figure 10: Numerical results for the IPIP data. Panel (a): average point on each item for males in each estimated extreme latent profile; panel (b): average point on each item for females in each estimated extreme latent profile; panel (c): average point on each item in each estimated extreme latent profile; panel (d): heatmap of $\hat{\Theta}$.

6.2 Big Five Personality Test with Random Number (BFPTRN) data

Background. Our SCK method is also applied to another personality test dataset: the Big Five Personality Test with Random Number (BFPTRN) data. This dataset can be downloaded from the same URL as the IPIP data. It asks respondents to generate random numbers in certain ranges attached to 50 personality items. The Big Five personality traits are extraversion (items E1-E10), neuroticism (items N1-N10), agreeableness (items A1-A10), conscientiousness (items C1-C10), and openness (items O1-O10). The original BFPTRN data contains 1369 subjects. After excluding subjects with missing responses or missing random numbers and removing those with random numbers exceeding the specified range, 1155 subjects remain, i.e., $N=1155,J=50$. All items are rated on the same 5-point scale as the IPIP data, which makes $R\in\{1,2,3,4,5\}^{1155\times 50}$ a weighted response matrix. The details of each item and each range for the random numbers can be found in Figure 11.

Analysis. The estimated number of extreme latent profiles for the BFPTRN dataset is 3. Applying the SCK approach to $R$ with $K=3$ produces the $1155\times 3$ matrix $\hat{Z}$ and the $50\times 3$ matrix $\hat{\Theta}$. SCK takes around 1.6 seconds to process this dataset.

Results. Without confusion, we again let profile 1, profile 2, and profile 3 represent the three estimated extreme latent profiles. Profiles 1, 2, and 3 have 409, 320, and 426 subjects, respectively. Similar to the IPIP data, based on $\hat{Z}$ and $\hat{\Theta}$, we can obtain the heatmap of the average point on each item for every profile, the heatmap of the average random number on each range for every profile, and the heatmap of $\hat{\Theta}$, as shown in Figure 11. We observe no significant connection between the average point and the average random number on each item in each estimated extreme latent profile. From panel (a) of Figure 11, we find that: for extraversion, subjects in profile 1 are the most extroverted while subjects in profile 2 are the most introverted; for neuroticism, subjects in profile 3 are emotionally stable while subjects in profiles 1 and 2 are emotionally unstable; for agreeableness, subjects in profiles 1 and 3 are easier to get along with than subjects in profile 2; for conscientiousness, subjects in profile 3 are more responsible than those in profiles 1 and 2; for openness, subjects in profiles 1 and 3 are more open than those in profile 2. Meanwhile, the matrix shown in panel (a) approximates $\hat{\Theta}$ well, as explained in Remark 5.

Figure 11: Numerical results for the BFPTRN data. Panel (a): average point on each item in each estimated extreme latent profile; panel (b): average random number on each range in each estimated extreme latent profile; panel (c): heatmap of $\hat{\Theta}$.

7 Conclusion and future work

In this paper, we introduced the weighted latent class model (WLCM), a novel model for latent class analysis of categorical data with weighted responses. We studied its identifiability, developed an efficient spectral method, SCK, to fit WLCM, and established a theoretical guarantee of estimation consistency for the proposed method under WLCM. On the methodological side, the new model WLCM provides an exploratory and useful tool for latent class analysis in applications where the categorical data may have weighted responses. WLCM allows the observed weighted response matrix to be generated from any distribution as long as its expectation follows the latent class structure modeled by WLCM. In particular, the popular latent class model is a sub-model of our WLCM, and categorical data with signed responses can also be modeled by WLCM. Ground-truth latent classes of categorical data with weighted responses generated from WLCM serve as benchmarks for evaluating latent class analysis approaches. On the algorithmic side, the SVD-based spectral method SCK is efficient and easy to implement. SCK requires no tuning parameters and is applicable to any categorical data with weighted responses. This means that researchers in fields such as the social, psychological, behavioral, and biological sciences, and beyond, can design their tests, evaluations, surveys, and interviews without worrying about whether responses must be binary or positive, as our method SCK is applicable to any weighted response matrix in latent class analysis. On the theoretical side, we established the rate of convergence of our method SCK under the proposed model WLCM. We found that SCK exhibits different behaviors when the weighted response matrices are generated from different distributions, and we conducted extensive experiments to verify our theoretical findings. Empirically, we applied our method to two real categorical datasets with weighted responses. We expect that our WLCM model and SCK method will have broad applications for latent class analysis of data with weighted responses in diverse fields, similar to the widespread use of latent class models in recent years.

There are several future directions worth exploring. First, methods with theoretical guarantees should be designed to determine the number of extreme latent profiles $K$ for observed weighted response matrices generated from any distribution $\mathcal{F}$ under WLCM. Second, the grade of membership (GoM) model [44, 45] provides richer modeling capacity than the latent class model since GoM allows a subject to belong to multiple extreme latent profiles. Therefore, following the distribution-free idea developed in this work, it is meaningful to extend the GoM model to categorical data with weighted responses. Third, just as the LCM can be equipped with individual covariates [46, 47, 48, 49, 50, 51], it is worth incorporating additional individual covariates into the WLCM analysis. Fourth, our WLCM only considers static latent class analysis, and it is meaningful to extend WLCM to the dynamic case [52]. Fifth, our SCK is a spectral clustering method, and it is possible to speed it up by applying random-projection techniques [53] or the distributed spectral clustering idea [54] to deal with large-scale categorical data in latent class analysis.

CRediT authorship contribution statement

Huan Qing is the sole author of this paper.

Declaration of competing interest

The author declares no competing interests.

Data availability

Data and code will be made available on request.

Appendix A Proofs under WLCM

A.1 Proof of Proposition 1

Proof.

According to Lemma 1, we know that $U=ZX$, where $X=\Theta'V\Sigma^{-1}$. Similarly, $U$ can be rewritten as $U=\tilde{Z}\tilde{X}$, where $\tilde{X}=\tilde{\Theta}'V\Sigma^{-1}$. Then, for $i\in[N]$, we have

U(i,:)=Z(i,:)X=X(\ell(i),:)=\tilde{Z}(i,:)\tilde{X}=\tilde{X}(\tilde{\ell}(i),:),   (9)

where $\tilde{\ell}(i)$ denotes the extreme latent profile to which the $i$-th subject belongs under the alternative classification matrix $\tilde{Z}$. For $\bar{i}\in[N]$ with $\bar{i}\neq i$, we have

U(\bar{i},:)=Z(\bar{i},:)X=X(\ell(\bar{i}),:)=\tilde{Z}(\bar{i},:)\tilde{X}=\tilde{X}(\tilde{\ell}(\bar{i}),:).   (10)

When $\ell(i)=\ell(\bar{i})$, the second statement of Lemma 1 gives $U(i,:)=U(\bar{i},:)$. Combining this fact with Equations (9) and (10) leads to

X(\ell(i),:)=X(\ell(\bar{i}),:)=\tilde{X}(\tilde{\ell}(i),:)=\tilde{X}(\tilde{\ell}(\bar{i}),:)\mathrm{~when~}\ell(i)=\ell(\bar{i}).   (11)

Equation (11) implies that $\tilde{\ell}(i)=\tilde{\ell}(\bar{i})$ when $\ell(i)=\ell(\bar{i})$; that is, any two distinct subjects $i$ and $\bar{i}$ that are in the same extreme latent profile under $Z$ are also in the same extreme latent profile under $\tilde{Z}$. Therefore, we have $\tilde{Z}=Z\mathcal{P}$, where $\mathcal{P}$ is a permutation matrix. Combining $\tilde{Z}=Z\mathcal{P}$ with $Z\Theta'=\tilde{Z}\tilde{\Theta}'$ leads to $Z\Theta'=\tilde{Z}\tilde{\Theta}'=Z\mathcal{P}\tilde{\Theta}'$, which gives

Z(\Theta'-\mathcal{P}\tilde{\Theta}')=0.   (12)

Taking the transpose of Equation (12) gives

(\Theta-\tilde{\Theta}\mathcal{P}')Z'=0.   (13)

Right multiplying both sides of Equation (13) by $Z$ gives

(\Theta-\tilde{\Theta}\mathcal{P}')Z'Z=0.   (14)

Since no extreme latent profile is empty, the $N\times K$ classification matrix $Z$ has rank $K$, so the $K\times K$ matrix $Z'Z$ is nonsingular. Therefore, right multiplying both sides of Equation (14) by $(Z'Z)^{-1}$ gives $\Theta=\tilde{\Theta}\mathcal{P}'$, i.e., $\tilde{\Theta}=\Theta\mathcal{P}$ since $\mathcal{P}$ is a permutation matrix. ∎

A.2 Proof of Lemma 1

Proof.

For the first statement: Since $R_{0}=Z\Theta'=U\Sigma V'$, $V'V=I_{K_{0}\times K_{0}}$, and the $K_{0}\times K_{0}$ diagonal matrix $\Sigma$ is nonsingular, we have $U=Z\Theta'V\Sigma^{-1}\equiv ZX$, where $X=\Theta'V\Sigma^{-1}$. Hence, the first statement holds.

For the second statement: For $i\in[N]$, $U=ZX$ gives $U(i,:)=Z(i,:)X=X(\ell(i),:)$. Then, if $\ell(\bar{i})=\ell(i)$, we have $U(\bar{i},:)=X(\ell(\bar{i}),:)=X(\ell(i),:)=U(i,:)$, i.e., $U$ has $K$ distinct rows. Thus, the second statement holds.

For the third statement: Since $R_{0}=Z\Theta'=U\Sigma V'$, we have $\Theta Z'=V\Sigma U'\Rightarrow\Theta Z'Z=V\Sigma U'Z\Rightarrow\Theta=V\Sigma U'Z(Z'Z)^{-1}$, where the $K\times K$ matrix $Z'Z$ is nonsingular because each extreme latent profile has at least one subject, i.e., $\mathrm{rank}(Z'Z)=\mathrm{rank}(Z)=K$. Thus, the third statement holds.

For the fourth statement: Recall that when $K_{0}=K$, we have $U\in\mathbb{R}^{N\times K}$, $V\in\mathbb{R}^{J\times K}$, $\Sigma$ is a $K\times K$ full-rank diagonal matrix, and $X$ is a $K\times K$ matrix, where $U'U=I_{K\times K}$ and $V'V=I_{K\times K}$. Let $\Delta=\mathrm{diag}(\sqrt{N_{1}},\sqrt{N_{2}},\ldots,\sqrt{N_{K}})$; then

R_{0}=Z\Theta'=Z\Delta^{-1}\Delta\Theta'.   (15)

It is straightforward to verify that $Z\Delta^{-1}$ is a column orthogonal matrix, i.e., $(Z\Delta^{-1})'Z\Delta^{-1}=I_{K\times K}$.

Since $K_{0}=K$, we have $\mathrm{rank}(\Delta\Theta')=K$. Let $\tilde{U}\tilde{\Sigma}\tilde{V}'=\Delta\Theta'$ be the compact SVD of $\Delta\Theta'$, where $\tilde{\Sigma}$ is a $K\times K$ diagonal matrix, $\tilde{U}\in\mathbb{R}^{K\times K}$, $\tilde{V}\in\mathbb{R}^{J\times K}$, $\tilde{U}'\tilde{U}=I_{K\times K}$, and $\tilde{V}'\tilde{V}=I_{K\times K}$. Note that $\tilde{U}\in\mathbb{R}^{K\times K}$ and $\tilde{U}'\tilde{U}=I_{K\times K}$ imply $\mathrm{rank}(\tilde{U})=K$. Equation (15) implies

R_{0}=Z\Theta'=U\Sigma V'=Z\Delta^{-1}\Delta\Theta'=Z\Delta^{-1}\tilde{U}\tilde{\Sigma}\tilde{V}'.   (16)

Note that $U,V,Z\Delta^{-1}\tilde{U}$, and $\tilde{V}$ are all orthonormal matrices, and that $\Sigma$ and $\tilde{\Sigma}$ are $K\times K$ diagonal matrices. Then we have

U=Z\Delta^{-1}\tilde{U},\quad\Sigma=\tilde{\Sigma},\quad\mathrm{and}\quad V=\tilde{V}.   (17)

Recalling that $U=ZX$, Equation (17) gives $X=\Delta^{-1}\tilde{U}\in\mathbb{R}^{K\times K}$ and $\mathrm{rank}(X)=K$ because $\mathrm{rank}(\Delta)=K$ and $\mathrm{rank}(\tilde{U})=K$. We can easily verify that the rows of $\Delta^{-1}\tilde{U}$ are perpendicular to each other and that the $k$-th row has length $\sqrt{1/N_{k}}$ for $k\in[K]$, i.e., $\sqrt{XX'}=\sqrt{\Delta^{-1}\tilde{U}\tilde{U}'\Delta^{-1}}=\sqrt{\Delta^{-2}}=\Delta^{-1}$. Thus, the fourth statement holds. ∎
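As a quick numerical sanity check of the first statement (a sketch with illustrative dimensions; not part of the proof), one can verify $U=ZX$ with $X=\Theta'V\Sigma^{-1}$ on a random instance:

```python
import numpy as np

rng = np.random.default_rng(0)
N, J, K = 12, 6, 3
Z = np.eye(K)[np.repeat(np.arange(K), N // K)]   # each extreme latent profile is nonempty
Theta = rng.normal(size=(J, K))                  # generic Theta, so rank(R0) = K
R0 = Z @ Theta.T
U, s, Vt = np.linalg.svd(R0, full_matrices=False)
U, s, V = U[:, :K], s[:K], Vt[:K].T              # compact SVD of R0 (the K0 = K case)
X = Theta.T @ V @ np.diag(1.0 / s)               # X = Theta' V Sigma^{-1}
assert np.allclose(U, Z @ X)                     # Lemma 1: U = Z X, so U has K distinct rows
```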

Remark 6.

In this remark, we provide the reason why the fourth statement does not hold when $K_{0}<K$. In this case, the rank of $\Delta\Theta'$ is $K_{0}$, thus $\tilde{U}\in\mathbb{R}^{K\times K_{0}}$ and $\mathrm{rank}(\tilde{U})=K_{0}$. Then we have $X=\Delta^{-1}\tilde{U}\in\mathbb{R}^{K\times K_{0}}$ and $\mathrm{rank}(X)=K_{0}$. Thus, $\mathrm{rank}(XX')=K_{0}<K=\mathrm{rank}(\Delta^{-2})$, which implies $\sqrt{XX'}\neq\Delta^{-1}$ when $K_{0}<K$, and the fourth statement does not hold.

A.3 Proof of Theorem 1

We first provide the following two lemmas, which will be used in the proof.

Lemma 2.

Under $WLCM(Z,\Theta,\mathcal{F})$, we have

\mathrm{max}(\|\hat{U}\hat{O}-U\|_{F},\|\hat{V}\hat{O}-V\|_{F})\leq\frac{2\sqrt{2K}\|R-R_{0}\|}{\rho\sigma_{K}(B)\sqrt{N_{\mathrm{min}}}},

where $\hat{O}$ is a $K$-by-$K$ orthogonal matrix.

Proof.

According to the proof of Lemma 3 in [55], there is a $K\times K$ orthogonal matrix $\hat{O}$ such that

\mathrm{max}(\|\hat{U}\hat{O}-U\|_{F},\|\hat{V}\hat{O}-V\|_{F})\leq\frac{\sqrt{2K}\|\hat{R}-R_{0}\|}{\sqrt{\lambda_{K}(R_{0}R'_{0})}}.

Because $\hat{R}$ is the rank-$K$ truncated SVD of $R$ and $\mathrm{rank}(R_{0})=K$, we have $\|R-\hat{R}\|\leq\|R-R_{0}\|$. Then $\|\hat{R}-R_{0}\|=\|\hat{R}-R+R-R_{0}\|\leq 2\|R-R_{0}\|$, which gives

\mathrm{max}(\|\hat{U}\hat{O}-U\|_{F},\|\hat{V}\hat{O}-V\|_{F})\leq\frac{2\sqrt{2K}\|R-R_{0}\|}{\sqrt{\lambda_{K}(R_{0}R'_{0})}}.   (18)

For $\lambda_{K}(R_{0}R'_{0})$, because $R_{0}=Z\Theta'=\rho ZB'$ and $\lambda_{K}(Z'Z)=N_{\mathrm{min}}$, we have

\lambda_{K}(R_{0}R'_{0})=\lambda_{K}(Z\Theta'\Theta Z')=\lambda_{K}(\rho^{2}ZB'BZ')=\rho^{2}\lambda_{K}(B'BZ'Z)\geq\rho^{2}\lambda_{K}(Z'Z)\lambda_{K}(B'B)=\rho^{2}N_{\mathrm{min}}\lambda_{K}(B'B).

Combining Equation (18) with $\lambda_{K}(R_{0}R'_{0})\geq\rho^{2}N_{\mathrm{min}}\lambda_{K}(B'B)$ gives

\mathrm{max}(\|\hat{U}\hat{O}-U\|_{F},\|\hat{V}\hat{O}-V\|_{F})\leq\frac{2\sqrt{2K}\|R-R_{0}\|}{\rho\sigma_{K}(B)\sqrt{N_{\mathrm{min}}}}. ∎

Lemma 3.

Under $WLCM(Z,\Theta,\mathcal{F})$, if Assumption 1 is satisfied, then with probability at least $1-o((N+J)^{-3})$,

\|R-R_{0}\|\leq C\sqrt{\gamma\,\mathrm{max}(N,J)\mathrm{log}(N+J)},

where $C$ is a positive constant.

Proof.

This lemma holds by setting $\alpha$ in Lemma 2 of [56] to 3, where Lemma 2 of [56] is obtained from the rectangular version of the Bernstein inequality in [57]. ∎

Proof.

Now, we prove the first statement of Theorem 1. Let $\varsigma>0$ be a small value. By Lemma 2 of [36] and the fourth statement of Lemma 1, if

\frac{\sqrt{K}}{\varsigma}\|U-\hat{U}\hat{O}\|_{F}(\frac{1}{\sqrt{N_{k}}}+\frac{1}{\sqrt{N_{l}}})\leq\sqrt{\frac{1}{N_{k}}+\frac{1}{N_{l}}},\mathrm{~for~each~}1\leq k\neq l\leq K,   (19)

then the Clustering error satisfies $\hat{f}=O(\varsigma^{2})$ when the K-means algorithm is used. By setting $\varsigma=\sqrt{\frac{2KN_{\mathrm{max}}}{N_{\mathrm{min}}}}\|U-\hat{U}\hat{O}\|_{F}$, we see that Equation (19) always holds for all $1\leq k\neq l\leq K$. So we get $\hat{f}=O(\varsigma^{2})=O(\frac{KN_{\mathrm{max}}\|U-\hat{U}\hat{O}\|^{2}_{F}}{N_{\mathrm{min}}})$. According to Lemma 2, we have

\hat{f}=O(\frac{K^{2}N_{\mathrm{max}}\|R-R_{0}\|^{2}}{\rho^{2}\sigma^{2}_{K}(B)N^{2}_{\mathrm{min}}}).

By Lemma 3, we have

\hat{f}=O(\frac{\gamma K^{2}N_{\mathrm{max}}\mathrm{max}(N,J)\mathrm{log}(N+J)}{\rho^{2}\sigma^{2}_{K}(B)N^{2}_{\mathrm{min}}}).

Next, we prove the second statement of Theorem 1. Since $U=ZX$ by Equation (3) in Lemma 1 and $U'U=I_{K\times K}$, we have $X'Z'ZX=I_{K\times K}$, which gives $(Z'Z)^{-1}=XX'$ and $\lambda_{1}(XX')=\sigma^{2}_{1}(X)=\frac{1}{\lambda_{K}(Z'Z)}=\frac{1}{N_{\mathrm{min}}}$. We also have $Z(Z'Z)^{-1}=ZXX'=UX'$. Similarly, we have $\hat{Z}(\hat{Z}'\hat{Z})^{-1}\approx\hat{U}\hat{X}'$, where $\hat{X}$ is the $K\times K$ centroid matrix returned by the K-means method applied to $\hat{U}$. Recalling that $\hat{R}=\hat{U}\hat{\Sigma}\hat{V}'$ and combining this with Equation (4) and Lemma 3, we have

\|\hat{\Theta}-\Theta\mathcal{P}\|
=\|\hat{V}\hat{\Sigma}\hat{U}'\hat{Z}(\hat{Z}'\hat{Z})^{-1}-V\Sigma U'Z(Z'Z)^{-1}\mathcal{P}\|=\|\hat{R}'\hat{Z}(\hat{Z}'\hat{Z})^{-1}-R'_{0}Z(Z'Z)^{-1}\mathcal{P}\|
=\|(\hat{R}'-R'_{0})\hat{Z}(\hat{Z}'\hat{Z})^{-1}+R'_{0}(\hat{Z}(\hat{Z}'\hat{Z})^{-1}-Z(Z'Z)^{-1}\mathcal{P})\|
\leq\|(\hat{R}'-R'_{0})\hat{Z}(\hat{Z}'\hat{Z})^{-1}\|+\|R'_{0}(\hat{Z}(\hat{Z}'\hat{Z})^{-1}-Z(Z'Z)^{-1}\mathcal{P})\|
\leq\|\hat{R}-R_{0}\|\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}\|+\|R_{0}\|\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}-Z(Z'Z)^{-1}\mathcal{P}\|
\leq 2\|R-R_{0}\|\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}\|+\|R_{0}\|\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}-Z(Z'Z)^{-1}\mathcal{P}\|
=2\|R-R_{0}\|\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}\|+\|\rho ZB'\|\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}-Z(Z'Z)^{-1}\mathcal{P}\|
\leq 2\|R-R_{0}\|\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}\|+\rho\|Z\|\|B\|\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}-Z(Z'Z)^{-1}\mathcal{P}\|
=2\|R-R_{0}\|\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}\|+\rho\sigma_{1}(B)\sqrt{N_{\mathrm{max}}}\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}-Z(Z'Z)^{-1}\mathcal{P}\|
=2\|R-R_{0}\|\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}-Z(Z'Z)^{-1}\mathcal{P}+Z(Z'Z)^{-1}\mathcal{P}\|+\rho\sigma_{1}(B)\sqrt{N_{\mathrm{max}}}\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}-Z(Z'Z)^{-1}\mathcal{P}\|
\leq 2\|R-R_{0}\|(\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}-Z(Z'Z)^{-1}\mathcal{P}\|+\|Z(Z'Z)^{-1}\mathcal{P}\|)+\rho\sigma_{1}(B)\sqrt{N_{\mathrm{max}}}\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}-Z(Z'Z)^{-1}\mathcal{P}\|
\leq 2\|R-R_{0}\|(\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}-Z(Z'Z)^{-1}\mathcal{P}\|+\|Z(Z'Z)^{-1}\|\|\mathcal{P}\|)+\rho\sigma_{1}(B)\sqrt{N_{\mathrm{max}}}\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}-Z(Z'Z)^{-1}\mathcal{P}\|
=2\|R-R_{0}\|(\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}-Z(Z'Z)^{-1}\mathcal{P}\|+\frac{1}{\sqrt{N_{\mathrm{min}}}})+\rho\sigma_{1}(B)\sqrt{N_{\mathrm{max}}}\|\hat{Z}(\hat{Z}'\hat{Z})^{-1}-Z(Z'Z)^{-1}\mathcal{P}\|
=O(\|R-R_{0}\|(\|\hat{U}\hat{X}'-UX'\mathcal{P}\|+\frac{1}{\sqrt{N_{\mathrm{min}}}}))\leq O(\|R-R_{0}\|(\|\hat{U}\hat{X}'\|+\|UX'\mathcal{P}\|+\frac{1}{\sqrt{N_{\mathrm{min}}}}))
\leq O(\|R-R_{0}\|(\|\hat{U}\|\|\hat{X}\|+\|U\|\|X\|\|\mathcal{P}\|+\frac{1}{\sqrt{N_{\mathrm{min}}}}))=O(\|R-R_{0}\|(\|\hat{X}\|+\|X\|+\frac{1}{\sqrt{N_{\mathrm{min}}}}))
=O(\frac{\|R-R_{0}\|}{\sqrt{N_{\mathrm{min}}}})=O(\sqrt{\frac{\gamma\,\mathrm{max}(N,J)\mathrm{log}(N+J)}{N_{\mathrm{min}}}}).

Since $\hat{\Theta}-\Theta\mathcal{P}$ is a $J\times K$ matrix and $K\ll J$ in this paper, we have $\mathrm{rank}(\hat{\Theta}-\Theta\mathcal{P})\leq K$. Since $\|M\|_{F}\leq\sqrt{\mathrm{rank}(M)}\|M\|$ holds for any matrix $M$, we have $\|\hat{\Theta}-\Theta\mathcal{P}\|_{F}\leq\sqrt{K}\|\hat{\Theta}-\Theta\mathcal{P}\|$. Thus, we have

\|\hat{\Theta}-\Theta\mathcal{P}\|_{F}=O(\sqrt{\frac{\gamma K\,\mathrm{max}(N,J)\mathrm{log}(N+J)}{N_{\mathrm{min}}}}).   (20)

Combining Equation (20) with the fact that $\|\Theta\|_{F}\geq\|\Theta\|=\|\rho B\|=\rho\|B\|=\rho\sigma_{1}(B)\geq\rho\sigma_{K}(B)$ gives

\frac{\|\hat{\Theta}-\Theta\mathcal{P}\|_{F}}{\|\Theta\|_{F}}\leq\frac{\|\hat{\Theta}-\Theta\mathcal{P}\|_{F}}{\rho\sigma_{K}(B)}=O(\frac{\sqrt{\gamma K\,\mathrm{max}(N,J)\mathrm{log}(N+J)}}{\rho\sigma_{K}(B)\sqrt{N_{\mathrm{min}}}}).

Recall that the $J$-by-$K$ matrix $B$ satisfies $\mathrm{max}_{j\in[J],k\in[K]}|B(j,k)|=1$. By applying the lower bound on the smallest singular value of a random rectangular matrix in [58], $\sigma_{K}(B)$ is at least of the order $\sqrt{J}-\sqrt{K-1}$ with high probability. Since $K\ll J$ in this paper, we have

\hat{f}=O(\frac{\gamma K^{2}N_{\mathrm{max}}\mathrm{max}(N,J)\mathrm{log}(N+J)}{\rho^{2}N^{2}_{\mathrm{min}}J})\quad\mathrm{and}\quad\frac{\|\hat{\Theta}-\Theta\mathcal{P}\|_{F}}{\|\Theta\|_{F}}=O(\frac{\sqrt{\gamma K\,\mathrm{max}(N,J)\mathrm{log}(N+J)}}{\rho\sqrt{N_{\mathrm{min}}J}}). ∎

References

  • [1] C. M. Dayton, G. B. Macready, Concomitant-variable latent-class models, Journal of the American Statistical Association 83 (401) (1988) 173–178.
  • [2] J. A. Hagenaars, A. L. McCutcheon, Applied latent class analysis, Cambridge University Press, 2002.
  • [3] J. Magidson, J. K. Vermunt, Latent class models, The Sage handbook of quantitative methodology for the social sciences (2004) 175–198.
  • [4] G. Guo, J. Zhang, D. Thalmann, N. Yorke-Smith, Etaf: An extended trust antecedents framework for trust prediction, in: 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014), IEEE, 2014, pp. 540–547.
  • [5] F. M. Harper, J. A. Konstan, The movielens datasets: History and context, ACM Transactions on Interactive Intelligent Systems (TiiS) 5 (4) (2015) 1–19.
  • [6] G. J. Meyer, S. E. Finn, L. D. Eyde, G. G. Kay, K. L. Moreland, R. R. Dies, E. J. Eisman, T. W. Kubiszyn, G. M. Reed, Psychological testing and psychological assessment: A review of evidence and issues., American Psychologist 56 (2) (2001) 128.
  • [7] J. J. Silverman, M. Galanter, M. Jackson-Triche, D. G. Jacobs, J. W. Lomax, M. B. Riba, L. D. Tong, K. E. Watkins, L. J. Fochtmann, R. S. Rhoads, et al., The American Psychiatric Association practice guidelines for the psychiatric evaluation of adults, American Journal of Psychiatry 172 (8) (2015) 798–802.
  • [8] J. De La Torre, L. A. van der Ark, G. Rossi, Analysis of clinical data from a cognitive diagnosis modeling framework, Measurement and Evaluation in Counseling and Development 51 (4) (2018) 281–296.
  • [9] Y. Chen, X. Li, S. Zhang, Joint maximum likelihood estimation for high-dimensional exploratory item factor analysis, Psychometrika 84 (2019) 124–146.
  • [10] Z. Shang, E. A. Erosheva, G. Xu, Partial-mastery cognitive diagnosis models, The Annals of Applied Statistics 15 (3) (2021) 1529–1555.
  • [11] K. T. Poole, Nonparametric unfolding of binary choice data, Political Analysis 8 (3) (2000) 211–237.
  • [12] J. Clinton, S. Jackman, D. Rivers, The statistical analysis of roll call data, American Political Science Review 98 (2) (2004) 355–370.
  • [13] R. Bakker, K. T. Poole, Bayesian metric multidimensional scaling, Political Analysis 21 (1) (2013) 125–140.
  • [14] Y. Chen, Z. Ying, H. Zhang, Unfolding-model-based visualization: theory, method and applications, The Journal of Machine Learning Research 22 (1) (2021) 548–598.
  • [15] J. Martinez-Moya, M. Feo-Valero, Do shippers’ characteristics influence port choice criteria? Capturing heterogeneity by using latent class models, Transport Policy 116 (2022) 96–105.
  • [16] A. K. Formann, T. Kohlmann, Latent class analysis in medical research, Statistical Methods in Medical Research 5 (2) (1996) 179–211.
  • [17] A. Kongsted, A. M. Nielsen, Latent class analysis in health research, Journal of Physiotherapy 63 (1) (2017) 55–58.
  • [18] Z. Wu, M. Deloria-Knoll, S. L. Zeger, Nested partially latent class models for dependent binary data; estimating disease etiology, Biostatistics 18 (2) (2017) 200–213.
  • [19] P. G. Van der Heijden, J. Dessens, U. Bockenholt, Estimating the concomitant-variable latent-class model with the EM algorithm, Journal of Educational and Behavioral Statistics 21 (3) (1996) 215–229.
  • [20] Z. Bakk, J. K. Vermunt, Robustness of stepwise latent class modeling with continuous distal outcomes, Structural equation modeling: a multidisciplinary journal 23 (1) (2016) 20–31.
  • [21] H. Chen, L. Han, A. Lim, Beyond the EM algorithm: constrained optimization methods for latent class model, Communications in Statistics-Simulation and Computation 51 (9) (2022) 5222–5244.
  • [22] Y. Gu, G. Xu, A joint MLE approach to large-scale structured latent attribute analysis, Journal of the American Statistical Association 118 (541) (2023) 746–760.
  • [23] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, M. Telgarsky, Tensor decompositions for learning latent variable models, Journal of Machine Learning Research 15 (2014) 2773–2832.
  • [24] Z. Zeng, Y. Gu, G. Xu, A tensor-EM method for large-scale latent class analysis with binary responses, Psychometrika 88 (2) (2023) 580–612.
  • [25] A. K. Formann, Constrained latent class models: Theory and applications, British Journal of Mathematical and Statistical Psychology 38 (1) (1985) 87–111.
  • [26] B. Lindsay, C. C. Clogg, J. Grego, Semiparametric estimation in the Rasch model and related exponential response models, including a simple latent class model for item analysis, Journal of the American Statistical Association 86 (413) (1991) 96–107.
  • [27] N. L. Zhang, Hierarchical latent class models for cluster analysis, The Journal of Machine Learning Research 5 (2004) 697–723.
  • [28] C.-C. Yang, Evaluating latent class analysis models in qualitative phenotype identification, Computational Statistics & Data Analysis 50 (4) (2006) 1090–1104.
  • [29] G. Xu, Identifiability of restricted latent class models with binary responses, The Annals of Statistics 45 (2) (2017) 675 – 707.
  • [30] G. Xu, Z. Shang, Identifying latent structures in restricted latent class models, Journal of the American Statistical Association 113 (523) (2018) 1284–1295.
  • [31] W. Ma, W. Guo, Cognitive diagnosis models for multiple strategies, British Journal of Mathematical and Statistical Psychology 72 (2) (2019) 370–392.
  • [32] Y. Gu, G. Xu, Partial identifiability of restricted latent class models, The Annals of Statistics 48 (4) (2020) 2082 – 2107.
  • [33] M. E. Newman, Analysis of weighted networks, Physical Review E 70 (5) (2004) 056131.
  • [34] T. Derr, C. Johnson, Y. Chang, J. Tang, Balance in signed bipartite networks, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 1221–1230.
  • [35] K. Goldberg, T. Roeder, D. Gupta, C. Perkins, Eigentaste: A constant time collaborative filtering algorithm, Information Retrieval 4 (2001) 133–151.
  • [36] A. Joseph, B. Yu, Impact of regularization on spectral clustering, Annals of Statistics 44 (4) (2016) 1765–1791.
  • [37] J. Jin, Fast community detection by SCORE, Annals of Statistics 43 (1) (2015) 57–89.
  • [38] A. Strehl, J. Ghosh, Cluster ensembles—a knowledge reuse framework for combining multiple partitions, Journal of Machine Learning Research 3 (Dec) (2002) 583–617.
  • [39] L. Danon, A. Diaz-Guilera, J. Duch, A. Arenas, Comparing community structure identification, Journal of Statistical Mechanics: Theory and Experiment 2005 (09) (2005) P09008.
  • [40] J. P. Bagrow, Evaluating local community methods in networks, Journal of Statistical Mechanics: Theory and Experiment 2008 (05) (2008) P05001.
  • [41] W. Luo, Z. Yan, C. Bu, D. Zhang, Community detection by fuzzy relations, IEEE Transactions on Emerging Topics in Computing 8 (2) (2017) 478–492.
  • [42] L. Hubert, P. Arabie, Comparing partitions, Journal of Classification 2 (1985) 193–218.
  • [43] N. X. Vinh, J. Epps, J. Bailey, Information theoretic measures for clusterings comparison: is a correction for chance necessary?, in: Proceedings of the 26th annual international conference on machine learning, 2009, pp. 1073–1080.
  • [44] M. A. Woodbury, J. Clive, A. Garson Jr, Mathematical typology: a grade of membership technique for obtaining disease definition, Computers and biomedical research 11 (3) (1978) 277–298.
  • [45] E. A. Erosheva, Comparing latent structures of the grade of membership, Rasch, and latent class models, Psychometrika 70 (4) (2005) 619–628.
  • [46] G.-H. Huang, K. Bandeen-Roche, Building an identifiable latent class model with covariate effects on underlying and measured variables, Psychometrika 69 (1) (2004) 5–32.
  • [47] A. Forcina, Identifiability of extended latent class models with individual covariates, Computational Statistics & Data Analysis 52 (12) (2008) 5263–5268.
  • [48] B. A. Reboussin, E. H. Ip, M. Wolfson, Locally dependent latent class models with covariates: an application to under-age drinking in the usa, Journal of the Royal Statistical Society Series A: Statistics in Society 171 (4) (2008) 877–897.
  • [49] J. K. Vermunt, Latent class modeling with covariates: Two improved three-step approaches, Political Analysis 18 (4) (2010) 450–469.
  • [50] R. Di Mari, Z. Bakk, A. Punzo, A random-covariate approach for distal outcome prediction with latent class analysis, Structural Equation Modeling: A Multidisciplinary Journal 27 (3) (2020) 351–368.
  • [51] Z. Bakk, R. Di Mari, J. Oser, J. Kuha, Two-stage multilevel latent class analysis with covariates in the presence of direct effects, Structural Equation Modeling: A Multidisciplinary Journal 29 (2) (2022) 267–277.
  • [52] T. Asparouhov, E. L. Hamaker, B. Muthén, Dynamic latent class analysis, Structural Equation Modeling: A Multidisciplinary Journal 24 (2) (2017) 257–269.
  • [53] H. Zhang, X. Guo, X. Chang, Randomized spectral clustering in large-scale stochastic block models, Journal of Computational and Graphical Statistics 31 (3) (2022) 887–906.
  • [54] S. Wu, Z. Li, X. Zhu, A distributed community detection algorithm for large scale networks under stochastic block models, Computational Statistics & Data Analysis (2023) 107794.
  • [55] Z. Zhou, A. A. Amini, Analysis of spectral clustering algorithms for community detection: the general bipartite setting, Journal of Machine Learning Research 20 (47) (2019) 1–47.
  • [56] H. Qing, J. Wang, Community detection for weighted bipartite networks, Knowledge-Based Systems 274 (2023) 110643.
  • [57] J. A. Tropp, User-friendly tail bounds for sums of random matrices, Foundations of Computational Mathematics 12 (4) (2012) 389–434.
  • [58] M. Rudelson, R. Vershynin, Smallest singular value of a random rectangular matrix, Communications on Pure and Applied Mathematics: A Journal Issued by the Courant Institute of Mathematical Sciences 62 (12) (2009) 1707–1739.