
UKTF: Unified Knowledge Tracing Framework for Subjective and Objective Assessments
Corresponding author: Chunyan Zeng, Email: [email protected].

1st Zhifeng Wang CCNU Wollongong Joint Institute
Central China Normal University
Wuhan 430079, China
[email protected]
   2nd Jiaqin Wan Faculty of Artificial Intelligence in Education
Central China Normal University
Wuhan, China
[email protected]
   3rd Yang Yang CCNU Wollongong Joint Institute
Central China Normal University
Wuhan 430079, China
[email protected]
   4th Chunyan Zeng School of Electrical and Electronic Engineering
Hubei University of Technology
Wuhan 430068, China
[email protected]
   5th Jialiang Shen Faculty of Artificial Intelligence in Education
Central China Normal University
Wuhan 430079, China
[email protected]
Abstract

With the continuous deepening and development of the concept of smart education, learners’ comprehensive development and individual needs have received increasing attention. However, traditional educational evaluation systems tend to assess learners’ cognitive abilities solely through overall test scores, failing to comprehensively account for their actual knowledge states. Knowledge tracing technology can build knowledge state models from learners’ historical answer data, thereby enabling personalized assessment of learners. Nevertheless, classical knowledge tracing models are primarily suited to objective test questions, while subjective test questions still face challenges such as complex data representation, incomplete modeling, and intricate, dynamic knowledge states. Drawing on the application of knowledge tracing technology in education, this study aims to make full use of examination data and proposes a unified knowledge tracing model that covers both objective and subjective test questions. Although objective and subjective test questions differ in question structure, assessment method, and data characteristics, the model trains both types with the same backbone network. Knowledge tracing for subjective test questions is achieved by uniformly modifying the training approach of the baseline models, adding branch networks, and optimizing the question encoding. Multiple experiments on real datasets consistently demonstrate that the model effectively addresses knowledge tracing in both objective and subjective test question scenarios.

Index Terms:
knowledge tracing, objective assessment, subjective assessment, deep learning

I Introduction

With the thriving development of online adaptive learning platforms [1, 2], educational information technology is gradually infiltrating every aspect of education [3]. In this context, knowledge tracing technology [4, 5, 6], as the cornerstone of online intelligent education, bears the crucial task of providing precise support for personalized learning, thereby attracting continuous attention from researchers in the field of educational technology [7].

In 1972, Atkinson first introduced the knowledge tracing model [8], a seminal model simulating learners’ mastery of knowledge. Subsequently, in 1994, Corbett and Anderson brought knowledge tracing into the realm of intelligent education and successfully applied it to intelligent educational systems [9]. Early knowledge tracing models relied primarily on machine learning methods, which, despite their simplicity and interpretability, often yielded unsatisfactory results. Recently, with the rise of deep learning [10], an increasing number of deep learning approaches have been applied to knowledge tracing. Leveraging its powerful feature representation and learning capabilities, deep learning has provided more precise and effective modeling tools for knowledge tracing.

This study employs three classic models as baselines: the Deep Knowledge Tracing model (DKT), the Dynamic Key-Value Memory Networks model (DKVMN), and the Graph-based Knowledge Tracing model (GKT). Drawing on learners’ performance on both objective and subjective test questions, we have realized a unified knowledge tracing model for both types of questions. The primary contributions of this research are as follows:

1) Given the binary nature of answers to objective test questions, the model adopts a classification approach for training, using the backbone network to trace knowledge on objective test questions. Because subjective test scores follow a multi-valued discrete distribution, we convert the binary classification problem into a regression problem to meet the assessment and prediction requirements of subjective test questions.

2) While maintaining a unified backbone network, we achieve knowledge tracing for subjective test questions by uniformly altering the model’s training approach, adding branch networks, and optimizing question encodings.

3) This study conducted extensive experiments using real-world datasets, demonstrating that the model effectively addresses knowledge tracing issues in both objective and subjective test question scenarios.

The rest of the paper is organized as follows. Section II reviews related work. Section III introduces the main research methods. Section IV describes the experimental details and results. Finally, Section V summarizes this work.

II Related Work

Current knowledge tracing models are primarily divided into two categories: traditional knowledge tracing models based on machine learning and knowledge tracing models based on deep learning.

II-A Traditional Knowledge Tracing Models Based on Machine Learning

Although traditional knowledge tracing models based on machine learning offer good interpretability, they rely on theoretical assumptions and require manually constructed input features, which are often one-sided and limited. Consequently, the predictive performance of these models is generally mediocre. Bayesian Knowledge Tracing (BKT) is a representative model from this period. Its core principle is based on the Hidden Markov Model (HMM) for time-series analysis [5]. However, the early BKT model simply categorized students’ knowledge states into two categories: mastered and not mastered. To address this issue, Zhang et al. [11] extended the original two states to three, adding an intermediate possible-mastery state. Wang et al. [12] refined the representation of students’ learning status by replacing binary nodes with continuous values between 0 and 1. Given the potential dependencies between knowledge concepts, Käser et al. proposed Dynamic Bayesian Knowledge Tracing (DBKT) [13], in which a learner’s mastery of a particular knowledge concept is also constrained by their mastery of other related concepts.

II-B Knowledge Tracing Models Based on Deep Learning

Deep neural networks have achieved great success in speech [14, 15, 16] and image processing [17, 18], as well as in knowledge tracing. Based on the neural networks used, knowledge tracing models based on deep learning can be broadly classified into four categories: those based on Recurrent Neural Networks (RNN) [19], those based on Dynamic Key-Value Memory Networks [20], those incorporating Attention Mechanisms [21], and those utilizing Graph Neural Networks (GNN) [22]. Among these, the DKT, DKVMN, and GKT models are the focus of this study.

The Deep Knowledge Tracing model (DKT), pioneered by Piech et al. in 2015 [23], introduced a novel modeling approach for knowledge tracing using Recurrent Neural Networks (RNN) [19]. Addressing the issues of input irreconstructibility and persistent fluctuations in learner knowledge levels in the DKT model, the DKT+ model [24] optimized the loss function by introducing regularization terms to stabilize estimated student knowledge levels. In 2017, Zhang et al. combined the advantages of Memory-Augmented Neural Networks (MANN) to propose the Dynamic Key-Value Memory Networks (DKVMN) [20], which for the first time employed a dynamic key-value memory network to model learners’ mastery levels of multiple knowledge concepts. While DKVMN improved interpretability compared with DKT, it struggled to effectively capture sequential dependencies in learners’ answer sequences. To address this, Abdelrahman et al. [25] proposed the SKVMN model, which introduced an improved Hop-LSTM network to tackle DKVMN’s inability to capture long-term dependencies in learner interactions. In 2019, Nakagawa et al. [26] first applied Graph Neural Networks to knowledge tracing, introducing the GKT model, which uses a graph structure to represent the relationships between students’ answer histories and knowledge concepts. The GKT model reformulates knowledge tracing as a temporal node-level classification problem in a GNN, decomposing course knowledge into several knowledge concepts and using the GNN for node-state updates and information aggregation to predict students’ future mastery of various knowledge points. Subsequently, Yang et al. introduced the GIKT model [27], which fully integrates the correlation between problems and skills through embedding propagation.

II-B1 Deep Knowledge Tracing Model

  • Answer Data Modeling: The learner’s answer data is organized into a time series, where each time point corresponds to an answer event comprising question information and the answer result. The learner’s answer data at time $t$, denoted as $x_t$, is a one-hot encoding of the learner’s interaction tuple $\{e_t, r_t\}$, where $x_t \in \{0,1\}^{2N}$ (a minimal encoding sketch follows this list).

  • Knowledge State Modeling: The Long Short-Term Memory model (LSTM) is employed to model the temporal dependencies in learners’ answer sequences. A knowledge state variable $h_t$ is introduced to represent the learner’s mastery level of the knowledge concepts.

  • Parameter Learning and Optimization: Through supervised learning, the parameters of the model are learned. Using known answer data, the parameters are adjusted through the backpropagation algorithm and a gradient descent optimizer to minimize the difference between the predicted answer outcomes and the actual answer outcomes. The loss computation for the entire answer sequence of a single learner is as follows:

    L = \sum_{t}\ell\left(y^{T}\delta(e_{t+1}),\, r_{t+1}\right)    (1)
  • Answer Prediction: The knowledge state variable $h_t$ is passed through an output layer that maps it to the probability of the student answering each question correctly, thereby enabling answer prediction.
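To make the interaction encoding above concrete, the following is a minimal sketch (not the authors' code) of how the tuple $\{e_t, r_t\}$ can be turned into the one-hot vector $x_t \in \{0,1\}^{2N}$; the function name and the half-split convention are illustrative assumptions.

```python
import numpy as np

def encode_interaction(question_id: int, correct: int, num_questions: int) -> np.ndarray:
    """One-hot encode the interaction tuple {e_t, r_t} as x_t in {0,1}^{2N}.

    The first N positions mark an incorrect response to question e_t, the second N
    positions a correct one (the split convention varies across implementations)."""
    x = np.zeros(2 * num_questions, dtype=np.float32)
    x[question_id + correct * num_questions] = 1.0
    return x

# Example: question 3 answered correctly out of N = 5 questions.
print(encode_interaction(3, 1, 5))  # -> 1.0 at index 8, zeros elsewhere
```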

II-B2 The Dynamic Key-Value Memory Network Model

  • Knowledge Encoding: The learner’s answer data is first encoded into a form suitable for model processing: one-hot encoding yields the network input $q_t$ and the combined value-component input $x_t = (q_t, r_t)$, where $q_t$ represents the question label at time $t$ and $r_t$ the learner’s response at time $t$. An embedding layer then encodes the data into continuous embedding vectors: $m_t$ for $q_t$ and $s_t$ for the combination $(q_t, r_t)$.

  • Knowledge State Modeling: The DKVMN network is employed to model the changes in students’ knowledge states. Two memory matrices are introduced: a static key matrix and a dynamic value matrix. The key matrix represents knowledge concepts, while the value matrix represents the learners’ mastery levels of these concepts.

  • Model Training and Optimization: During training, the standard cross-entropy loss between the predicted answer probability $p_t$ and the true label $l_t$ is minimized using the gradient descent algorithm to jointly learn the embedding matrices and the other parameters (a brief sketch of this loss follows the list). The formula for the loss function is as follows:

    L = -\sum_{t}\left(l_{t}\log p_{t} + (1 - l_{t})\log(1 - p_{t})\right)    (2)
  • Answer Prediction: By incorporating the vector $f_t$, which captures the learner’s comprehensive knowledge level and the question characteristics, the probability $p_t$ of the learner correctly answering the question is obtained through a fully connected layer with a sigmoid activation function, thereby enabling prediction of the learner’s response. Specifically, it is defined as:

    p_{t} = \sigma\left(W_{f} \cdot f_{t} + b_{f}\right)    (3)
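As an illustration of the training objective in Eq. (2), the following is a minimal sketch (assumed, not the authors' implementation) of the binary cross-entropy over a sequence of predicted probabilities $p_t$ and labels $l_t$; in practice an implementation would typically use a numerically stable built-in such as BCEWithLogitsLoss.

```python
import torch

def dkvmn_loss(p: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """Eq. (2): -sum_t [l_t log p_t + (1 - l_t) log(1 - p_t)]."""
    eps = 1e-7  # numerical guard for log(0)
    return -(labels * torch.log(p + eps) + (1 - labels) * torch.log(1 - p + eps)).sum()

p = torch.tensor([0.9, 0.2, 0.7])   # predicted correctness probabilities p_t
l = torch.tensor([1.0, 0.0, 1.0])   # observed responses l_t
print(dkvmn_loss(p, l))
```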

II-B3 The Graph-based Knowledge Tracing Model

  • Knowledge Graph Construction: The GKT model structures the course exercises into a graph $G=(V,E,A)$, decomposing the requirements for mastering the course exercises into $N$ knowledge concepts that form the node set $V=\{v_{1},\cdots,v_{N}\}$. These knowledge concepts share dependencies, denoted as edges $E\subseteq V\times V$, and the degree of dependency is represented by the adjacency matrix $A\in\mathbb{R}^{N\times N}$.

  • Knowledge State Modeling: At time step $t$, the learner possesses an independent knowledge state $h^{t}=\{h_{i}^{t}\mid i\in V\}$ for each knowledge concept, and this knowledge state evolves over time. When the learner attempts an exercise related to knowledge concept $i$, not only is the learner’s knowledge state for concept $i$ updated, but so are the knowledge states $h_{j}^{t}$ of adjacent concepts $j\in N_{i}$, where $N_{i}$ denotes the set of nodes adjacent to node $v_{i}$. The update in the GKT model is accomplished in two steps, Aggregation and Update, performed by a Graph Convolutional Network (GCN).

  • Model Training and Optimization: During the training process, the model is trained by minimizing the Negative Log-Likelihood (NLL) loss in order to achieve the best model performance.

  • Answer Prediction: Based on the updated knowledge states, the prediction probability $p_t$ of the learner correctly answering each knowledge concept is obtained through a fully connected layer with a sigmoid activation function, enabling the prediction of the learner’s responses to questions.

III Proposed Methods

The specific framework of the unified knowledge tracing model for both objective and subjective test questions is shown in Fig. 1. To construct the proposed model, this study selects three classic knowledge tracing models as backbones: the Deep Knowledge Tracing Model (DKT), the Dynamic Key-Value Memory Network Model (DKVMN), and the Graph-based Knowledge Tracing Model (GKT). The models are trained on learners’ answer data for both objective and subjective test questions.


Figure 1: A Unified Knowledge Tracing Model for Both Objective and Subjective Test Questions.

III-A Knowledge Tracing Models in Objective Test Question Scenarios

III-A1 Deep Knowledge Tracing Model

DKT model predicts learners’ future performance in learning by analyzing their historical answer data, employing Long Short-Term Memory (LSTM) to capture the temporal changes in learners’ knowledge states [28]. The specific definitions are as follows:

h_{t} = \tanh(W_{hx} x_{t} + W_{hh} h_{t-1} + b_{h})    (4)
y_{t} = \sigma\left(W_{yh} h_{t} + b_{y}\right)    (5)

where $h_{t-1}$ represents the hidden state at the previous time step, $h_t$ denotes the learner’s current knowledge state, and $y_t$ represents the probability of the learner answering correctly at time $t$.
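The following is a minimal PyTorch sketch of the recurrence in Eqs. (4)-(5), written with a vanilla RNN cell for clarity; the model described in this paper actually uses an LSTM, and all layer sizes and names here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SimpleDKTCell(nn.Module):
    """Vanilla-RNN form of Eqs. (4)-(5); the paper's model replaces this cell with an LSTM."""
    def __init__(self, num_questions: int, hidden_dim: int):
        super().__init__()
        self.W_hx = nn.Linear(2 * num_questions, hidden_dim, bias=False)
        self.W_hh = nn.Linear(hidden_dim, hidden_dim, bias=True)     # b_h lives here
        self.W_yh = nn.Linear(hidden_dim, num_questions, bias=True)  # b_y lives here

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor):
        h_t = torch.tanh(self.W_hx(x_t) + self.W_hh(h_prev))  # Eq. (4): knowledge state update
        y_t = torch.sigmoid(self.W_yh(h_t))                    # Eq. (5): per-question correctness probability
        return h_t, y_t

cell = SimpleDKTCell(num_questions=5, hidden_dim=16)
x_t = torch.zeros(1, 10); x_t[0, 8] = 1.0      # interaction encoding from Section II-B1
h_t, y_t = cell(x_t, torch.zeros(1, 16))
print(y_t.shape)  # torch.Size([1, 5])
```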

III-A2 Dynamic Key-Value Memory Networks Model

DKVMN model takes learners’ answer data as input and utilizes a dynamic key-value memory matrix to store learners’ knowledge states. The model employs two memory matrices: a static key matrix and a dynamic value matrix, which are used to store fixed knowledge point information and learners’ dynamically changing knowledge states, respectively. The DKVMN model utilizes an attention mechanism to control read and write operations, thereby enabling dynamic updates to the knowledge state. It is primarily composed of three components:

  • Acquire Relevant Weights: The query is performed by using the embedding vector $m_t$ to interrogate the key storage matrix $M^{k}$: the inner product between $m_t$ and each key slot $M^{k}(i)$ of the matrix is computed, and a softmax activation is then applied to turn these inner products into a probability distribution, i.e., the weights $w_{t}\in\mathbb{R}^{N}$ quantifying the involvement of each knowledge point in the question at time $t$ (a code sketch of the full read-write cycle follows this list). The computation formula is as follows:

    w_{t}(i) = \mathrm{Softmax}\left(m_{t}^{T} M^{k}(i)\right)    (6)
  • Reading Process: The reading vector $r_t$ is computed from the value matrix $M_{t}^{v}$ and the knowledge point relevance weights $w_t$; $r_t$ represents the learner’s mastery level of the given exercise. The computation formula is as follows:

    r_{t} = \sum_{i=1}^{N} w_{t}(i)\, M_{t}^{v}(i)    (7)

    Given that each exercise has its own inherent difficulty, the reading vector $r_t$ and the embedding $m_t$ of the input exercise are concatenated vertically. The concatenated vector is passed through a fully connected layer with a tanh activation, yielding an aggregated feature vector $f_t$ that encapsulates both the learner’s mastery level and the prior difficulty of the exercise. Subsequently, $f_t$ is fed into another fully connected layer with a sigmoid activation to predict the probability $p_t$ that the learner will answer the question correctly. The specific implementation process is as follows:

    f_{t} = \tanh\left(W_{1}^{T}[r_{t}, m_{t}] + b_{1}\right)    (8)
    p_{t} = \mathrm{sigmoid}\left(W_{2}^{T} f_{t} + b_{2}\right)    (9)
  • Writing Process: After the learner answers the question, the value memory matrix $M_{t}^{v}$ is updated based on the learner’s response. When incorporating the learner’s knowledge growth $s_t$ into the value component, the existing memory is first erased before the new information is added.

    Given a write weight $w_t$ (the same as the knowledge point relevance weight $w_t$ used in the reading process), the erase vector $e_t$ is computed as follows:

    e_{t} = \mathrm{sigmoid}\left(E^{T} s_{t} + b_{e}\right)    (10)

    where the transformation matrix $E\in\mathbb{R}^{d_{v}\times d_{v}}$ has shape $d_{v}\times d_{v}$, and $e_{t}\in\mathbb{R}^{d_{v}}$ is a column vector with $d_{v}$ elements, each taking a value in the interval $(0,1)$. The memory vector $M_{t-1}^{v}(i)$ of the value component from the previous time step is modified as follows:

    \tilde{M}_{t}^{v}(i) = M_{t-1}^{v}(i)\left[\mathbf{1} - w_{t}(i)\, e_{t}\right]    (11)

    where $\mathbf{1}$ represents a row vector of all ones; thus an element of the memory is reset to 0 only when both the weight at the corresponding position and the erase element equal 1. After erasing the memory, an add vector $a_t$ is used to update the value matrix $M_{t}^{v}$, and the update of the value matrix at each time step is given by:

    M_{t}^{v}(i) = \tilde{M}_{t}^{v}(i) + w_{t}(i)\, a_{t}    (12)

    Given the transformed memory information and the current question features, a feature vector $f_t$ is constructed and fed into a fully connected layer with a sigmoid activation to predict the probability $p_t$ that the learner will answer the question correctly. The specific implementation is as follows:

    p_{t} = \mathrm{sigmoid}\left(W_{2}^{T} f_{t} + b_{2}\right)    (13)
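The sketch below strings Eqs. (6)-(13) together as a single read-write step in PyTorch. It is an illustrative reconstruction rather than the authors' code: the form of the add vector $a_t$ (a tanh transform of $s_t$) follows the standard DKVMN formulation, and all dimensions and parameter names are assumptions.

```python
import torch
import torch.nn.functional as F

def dkvmn_step(m_t, s_t, M_k, M_v, W1, b1, W2, b2, E, b_e, D, b_a):
    """One DKVMN read/write step following Eqs. (6)-(13).

    m_t: (d_k,) question embedding      s_t: (d_v,) interaction embedding
    M_k: (N, d_k) static key matrix     M_v: (N, d_v) dynamic value matrix
    """
    w_t = F.softmax(M_k @ m_t, dim=0)                   # Eq. (6): concept relevance weights
    r_t = w_t @ M_v                                     # Eq. (7): read vector
    f_t = torch.tanh(W1 @ torch.cat([r_t, m_t]) + b1)   # Eq. (8): aggregated feature
    p_t = torch.sigmoid(W2 @ f_t + b2)                  # Eq. (9)/(13): correctness probability

    e_t = torch.sigmoid(E @ s_t + b_e)                  # Eq. (10): erase vector
    a_t = torch.tanh(D @ s_t + b_a)                     # add vector (standard DKVMN form, assumed)
    M_v = M_v * (1 - w_t[:, None] * e_t[None, :])       # Eq. (11): erase step
    M_v = M_v + w_t[:, None] * a_t[None, :]             # Eq. (12): add step
    return p_t, M_v

# Toy dimensions: N concepts; key/value/feature sizes are arbitrary.
N, d_k, d_v, d_f = 4, 8, 8, 16
p_t, M_v = dkvmn_step(
    torch.randn(d_k), torch.randn(d_v),
    torch.randn(N, d_k), torch.randn(N, d_v),
    torch.randn(d_f, d_v + d_k), torch.zeros(d_f),
    torch.randn(1, d_f), torch.zeros(1),
    torch.randn(d_v, d_v), torch.zeros(d_v),
    torch.randn(d_v, d_v), torch.zeros(d_v),
)
print(p_t.shape, M_v.shape)  # torch.Size([1]) torch.Size([4, 8])
```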

III-A3 The Graph-based Knowledge Tracing Model

The GKT model [26] represents learners’ historical answer data as a graph structure, where each node signifies a knowledge concept, and the edges between nodes indicate the relationships between these knowledge concepts. By observing learners’ answer patterns, the model updates the states of the nodes within the graph, thereby dynamically tracking learners’ mastery levels of various knowledge concepts. This model primarily comprises two components:

  • Information Aggregation: We define an answer embedding matrix $E_{x}\in\mathbb{R}^{2N\times e}$ and a skill embedding matrix $E_{c}\in\mathbb{R}^{N\times e}$, where $E_{c}(k)$ denotes the $k$-th row vector of $E_{c}$ and $e$ is the embedding dimension. At time step $t$, for the answered knowledge concept $i$ and its neighboring knowledge concepts $j$, the hidden states and embeddings are aggregated according to the following formula:

    h_{k}^{\prime t} = \begin{cases} [h_{k}^{t},\, x^{t} E_{x}] & (k = i) \\ [h_{k}^{t},\, E_{c}(k)] & (k \neq i) \end{cases}    (14)
  • Feature Update: Update the hidden state of each knowledge point through aggregated features and the structure of the knowledge graph. The specific steps are as follows:

    m_{k}^{t+1} = \begin{cases} f_{\text{self}}(h_{k}^{\prime t}) & (k = i) \\ f_{\text{neighbor}}(h_{i}^{\prime t}, h_{k}^{\prime t}) & (k \neq i) \end{cases}    (15)
    \widetilde{m}_{k}^{t+1} = G_{ea}(m_{k}^{t+1})    (16)
    h_{k}^{t+1} = G_{gru}(\widetilde{m}_{k}^{t+1}, h_{k}^{t})    (17)

    Here, $f_{\text{self}}$ and $f_{\text{neighbor}}$ denote the self-function and the neighbor-function, respectively, where $f_{\text{self}}$ is implemented as a Multilayer Perceptron (MLP). $G_{ea}$ denotes an Erase-Add gate, which performs feature erasing and enhancement on the aggregated feature vector $m_{k}^{t+1}$ to better simulate the learner’s knowledge state. $G_{gru}$ is a Gated Recurrent Unit (GRU) used for feature extraction, memory retention, and updating of $\widetilde{m}_{k}^{t+1}$.

    $f_{\text{neighbor}}$ is an arbitrary function, defined on the structure of the knowledge graph, that propagates information to adjacent nodes. Nakagawa et al. proposed two optional implementations: a statistics-based approach and a learning-based approach. In this study, the statistics-based dense graph approach is adopted (a small sketch of the adjacency estimation follows this list), with the formulas outlined as follows:

    f_{\mathrm{outgo}}(h_{i}^{\prime t}, h_{j}^{\prime t}) = \mathbf{A}_{i,j}\, f_{\mathrm{outgo}}([h_{i}^{\prime t}, h_{j}^{\prime t}])    (18)
    f_{\mathrm{income}}(h_{i}^{\prime t}, h_{j}^{\prime t}) = \mathbf{A}_{j,i}\, f_{\mathrm{income}}([h_{i}^{\prime t}, h_{j}^{\prime t}])    (19)
    f_{\mathrm{neighbor}} = f_{\mathrm{outgo}}(h_{i}^{\prime t}, h_{j}^{\prime t}) + f_{\mathrm{income}}(h_{i}^{\prime t}, h_{j}^{\prime t})    (20)

    The dense graph is computed as follows:

    \mathbf{A}_{i,j} = \begin{cases} \dfrac{n_{i,j}}{\sum_{k} n_{i,k}} & (i \neq j) \\ 0 & (i = j) \end{cases}    (21)

    Based on the updated knowledge state, the prediction probability $p_{k}^{t}$ of the learner correctly answering each knowledge concept $k$ at the next time step is obtained through a fully connected layer with a sigmoid activation function, thereby enabling prediction of the learner’s responses. The specific formula is as follows:

    p_{k}^{t} = \sigma\left(W_{\text{out}} h_{k}^{t+1} + b_{k}\right)    (22)
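As an illustration of the statistics-based dense graph of Eq. (21), the sketch below estimates $\mathbf{A}$ from learners' concept-ID answer sequences; the counting convention ($n_{i,j}$ counts how often concept $j$ is answered immediately after concept $i$) and the function name are assumptions made for this example.

```python
import numpy as np

def dense_adjacency(concept_sequences, num_concepts: int) -> np.ndarray:
    """Estimate A_{i,j} = n_{i,j} / sum_k n_{i,k} (Eq. 21), where n_{i,j} counts how often
    concept j is answered immediately after concept i in the learners' sequences."""
    counts = np.zeros((num_concepts, num_concepts), dtype=np.float64)
    for seq in concept_sequences:
        for i, j in zip(seq[:-1], seq[1:]):
            if i != j:                                   # the diagonal is fixed to 0 in Eq. (21)
                counts[i, j] += 1.0
    row_sums = counts.sum(axis=1, keepdims=True)
    # Normalize each row; rows with no outgoing transitions stay all-zero.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

# Two learners' concept-ID answer sequences over N = 3 concepts.
print(dense_adjacency([[0, 1, 2, 1], [0, 2, 2, 1]], num_concepts=3))
```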

III-B Knowledge Tracing Models in Subjective Test Question Scenarios

III-B1 Deep Knowledge Tracing Model

Since the scores of subjective test questions follow a multi-valued discrete distribution rather than being simply right or wrong, part of the DKT model’s structure needs to be adjusted to fit the new prediction task. The specific adjustments are as follows:

  • Question Encoding: For the response tuple $x_t=(e_t, a_t)$, the encoded vector $x_t$ has length $2E$, twice the total number of exercise items. For subjective test questions, this experiment directly encodes the question ID and the corresponding score into the feature vector: if an exercise item is attempted, the corresponding position in the first half of $x_t$ is set to 1, otherwise 0; and the corresponding position in the second half of $x_t$ holds the specific score, otherwise 0. The detailed encoding formula is as follows:

    x_{t}(i) = \begin{cases} 1 & i = e_{t} \\ a_{t} & i = e_{t} + n_{\text{question}} \end{cases}    (23)
  • Loss Function: For subjective test questions, the knowledge tracing task can be defined as a regression problem, so the loss function is changed from BCELoss to MSELoss.

  • Branch Network: The final output layer of the original DKT model uses a sigmoid activation to predict whether the learner will answer each question correctly. For subjective test questions, a fully connected layer with a sigmoid activation is added as a branch to output the learner’s normalized scores on the subjective test questions, as sketched below.
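A minimal sketch of such a regression branch and the MSE objective is given below; the class name, dimensions, and target values are illustrative assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class SubjectiveScoreHead(nn.Module):
    """Branch network for subjective questions: maps the hidden knowledge state h_t
    to a normalized score in [0, 1] for every question."""
    def __init__(self, hidden_dim: int, num_questions: int):
        super().__init__()
        self.fc = nn.Linear(hidden_dim, num_questions)

    def forward(self, h_t: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.fc(h_t))

head = SubjectiveScoreHead(hidden_dim=16, num_questions=6)
h_t = torch.randn(1, 16)              # knowledge state from the shared backbone
pred = head(h_t)                      # predicted normalized scores
target = torch.full((1, 6), 0.75)     # observed normalized scores
loss = nn.MSELoss()(pred, target)     # regression objective replacing BCELoss
print(loss.item())
```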

III-B2 Dynamic Key-Value Memory Networks Model

In response to the scoring characteristics of subjective questions, the structure of the DKVMN model is adjusted as follows:

  • Question Encoding: For subjective test items, the question IDs are one-hot encoded, and the response tuple $x_t=(e_t, a_t)$ is combined with the total number of questions, with the specific encoding formula given as follows:

    x_{t} = e_{t} + a_{t} \cdot n_{\text{question}}    (24)
  • Loss Function: The original DKVMN model picks binary cross-entropy with logits for model optimization. For the subjective test questions, the loss function was adjusted to MSELoss.

  • Branch Network: The final output of the original DKVMN model is the probability of answering the test question correctly, and for the subjective test questions, a fully connected layer with a sigmoid function is added to change the output to a normalized score for the test question.

III-B3 The Graph-based Knowledge Tracing Model

The experiment adapted the GKT model by modifying the classification task to a regression task and applying the GKT model to the prediction of scores on subjective test questions.

  • Question Encoding: The encoding process fuses the knowledge point ID $e_t$ and the answer result $a_t$ into a single vector, with the specific encoding formula as follows:

    x_{t} = e_{t} + a_{t}    (25)
  • Loss Function: The negative log-likelihood (NLL) loss is replaced with MSELoss, so that the GKT model can predict subjective test scores.

IV Experimental Results and Analysis

IV-A Dataset

In the objective test question scenario, each model was applied to three real datasets. The experiments use the publicly available ASSISTment-2009-2010-skill dataset and the monthly test answer dataset of the sophomore class of Changshui Middle School, which contains objective test question data for eight academic disciplines; the two are abbreviated as Assist09 and Object, respectively.

Assist09 is a collection of learners’ practice records for the 2009-2010 school year provided by the ASSISTments online tutoring platform and is a widely used dataset in knowledge tracing. The dataset comes in two formats, “Skill-builder” and “Non-Skill-builder”; the “Skill-builder” format is used in this experiment. Object comes from the monthly examination records of the sophomore class of Changshui Middle School and contains the objective test data of eight academic disciplines.

TABLE I: Statistical information about the objective test question datasets
Dataset Student Skill Question Exercise
Assist09 4217 124 26688 401756
Object-English 4773 23 65 310245
Object-Math 5224 16 16 83584
TABLE II: Statistical information about the subjective test question datasets
Dataset Student Skill Question Exercise
Subject-Chinese 5236 21 24 125664
Subject-Math 5224 6 6 31344
Subject-Biology 2901 5 5 14505

In the objective test scenario, taking the DKT model experiment as an example, we introduce the three datasets used in the experiment, and the raw statistical information is shown in Table I.

In the subjective test scenario, the experiment uses Subject, a monthly test answer dataset of the sophomore class of Changshui Middle School, which contains subjective test data of eight subjects. Taking the DKT model experiment as an example, the three datasets used for the experiment are introduced, and the raw statistical information is shown in Table II.

TABLE III: Description of Assist09
Name Meaning
order_id Non-time-series aligned exercise record ID
user_id Student ID
skill_id Concept ID
correct Whether the student answered the question correctly
TABLE IV: Description of Object
Name Meaning
exer_id Non-time-series aligned exercise record ID
user_id Student ID
knowledge_code Concept ID
score Students’ score

Four model-relevant fields are selected from the raw data of the three datasets. Taking ASSISTment-2009-2010-skill as an example, their specific meanings are shown in Table III; taking Object as an example, they are shown in Table IV.

IV-B Data Pre-processing

In the objective test question scenario, taking ASSISTment-2009-2010-skill as an example, the following preprocessing is applied to the extracted data (a minimal sketch follows the list):

  • For all records containing missing values (NaN) in the skill_id column, the deletion operation is performed.

  • Learner answer records containing the same interaction data are de-duplicated, retaining only one copy.

  • Convert the question IDs to consecutive integers starting from 0.

  • Convert the data samples into per-learner sequential data grouped by learner ID.

  • Make one-hot encoding of question IDs and answers.

  • Unify the sequence lengths.

  • For subjective test questions, the process of normalizing test scores also needs to be added. Normalized scores are on similar scales and are easier to compare and interpret.
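The sketch below illustrates these steps with pandas for an Assist09-style log (columns as in Table III); the padding length, the padding value, and the normalization by the full mark for subjective data are assumptions, not details taken from the paper.

```python
import pandas as pd

def preprocess(df: pd.DataFrame, max_len: int = 200):
    """Illustrative preprocessing for an Assist09-style log with the columns of
    Table III: order_id, user_id, skill_id, correct."""
    df = df.dropna(subset=["skill_id"])                            # drop records with missing skill_id
    df = df.drop_duplicates()                                      # keep one copy of duplicate interactions
    df["skill_id"] = df["skill_id"].astype("category").cat.codes   # re-index concept IDs from 0
    df = df.sort_values(["user_id", "order_id"])

    sequences = []
    for _, g in df.groupby("user_id"):                             # one (skill, correct) sequence per learner
        seq = list(zip(g["skill_id"], g["correct"]))[:max_len]
        seq += [(-1, -1)] * (max_len - len(seq))                   # pad to a uniform sequence length
        sequences.append(seq)
    # One-hot encoding of (skill, correct) then follows the scheme of Section II-B1.
    # For subjective logs (Table IV), scores would additionally be normalized to [0, 1],
    # e.g. df["score"] / full_mark, with full_mark assumed to be known for each question.
    return sequences
```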

IV-C Experiment Result

  • Model Performance in Objective Test Question Scenarios

The DKT model uses an LSTM structure and the Adam optimizer to predict learners’ performance on objective test questions at the next time step, and computes BCELoss, RMSE, MAE, and AUC in each epoch by comparing the true responses with the predicted scores. The DKT model is evaluated on three datasets and performs well on all of them.
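For reference, these metrics could be computed from predicted probabilities and observed responses roughly as follows (a hedged sketch using scikit-learn; the toy arrays are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, mean_absolute_error, mean_squared_error, log_loss

y_true = np.array([1, 0, 1, 1, 0])            # observed correctness
y_prob = np.array([0.9, 0.3, 0.6, 0.8, 0.4])  # predicted probabilities

bce = log_loss(y_true, y_prob)                        # BCELoss
rmse = np.sqrt(mean_squared_error(y_true, y_prob))    # RMSE
mae = mean_absolute_error(y_true, y_prob)             # MAE
auc = roc_auc_score(y_true, y_prob)                   # AUC
print(f"BCE={bce:.3f} RMSE={rmse:.3f} MAE={mae:.3f} AUC={auc:.3f}")
```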

TABLE V: DKT Performance on objective test questions
Dataset BCELoss RMSE MAE AUC
Assist09 0.503 0.407 0.329 0.798
Object-English 0.574 0.442 0.392 0.763
Object-Math 0.491 0.404 0.329 0.816

As shown in Table V, the AUC on all three datasets reached over 0.75, demonstrating that the DKT model is effective on the objective test knowledge tracing task.

Among them, the DKT model performs best on the dataset Object-Math. Considering dataset size and complexity, Object-Math has a simpler data structure, so the DKT model fits it better. When the data structure is complex but the amount of data is insufficient, as with Object-English, prediction performance degrades to some extent.

To evaluate the prediction performance of the DKVMN model on objective test questions, experiments were conducted on three real datasets, with four commonly used metrics, Loss, RMSE, ACC, and AUC, chosen to evaluate the model.

TABLE VI: DKVMN Performance on objective test questions
Dataset Loss RMSE ACC AUC
Assist09 0.501 0.494 0.755 0.800
Object-English 0.468 0.486 0.763 0.834
Object-Math 0.430 0.458 0.790 0.862

As shown in Table VI, the AUC scores on the three datasets are all 0.80 or higher and the ACC scores are all above 0.75, showing that the DKVMN model generalizes well on the objective test knowledge tracing task.

Among them, the DKVMN model fits best on the dataset Object-Math, with a Loss of 0.430, an RMSE score of 0.458, an ACC score of 0.790, and an AUC score of 0.862.

The GKT model chooses the negative log-likelihood (NLL) as a measure of Loss, while three commonly used metrics, RMSE, ACC, and AUC, are chosen to evaluate the model performance.

TABLE VII: GKT Performance on objective test questions
Dataset Loss RMSE ACC AUC
Assist09 0.562 0.435 0.718 0.723
Object-Math 0.466 0.393 0.768 0.845
Object-Biology 0.531 0.423 0.723 0.770

As shown in Table VII, the AUC and ACC scores on the three datasets are all above 0.7, indicating that the GKT model is able to fit a wide range of different learner data for the knowledge tracing task.

Among them, the GKT model fits best on the dataset Object-Math, with a Loss of 0.466, an RMSE of 0.393, an ACC of 0.768, and an AUC of 0.845.

  • Model Performance in Subjective Test Question Scenarios

Experiments were carried out with the DKT model on the three subjective test question datasets, computing MSELoss, RMSE, MAE, and ACC in each epoch; the best model performance on the three datasets is shown in Table VIII:

TABLE VIII: DKT Performance on subjective test questions
Dataset MSELoss RMSE MAE ACC
Subject-Chinese 0.105 0.325 0.259 0.844
Subject-Math 0.086 0.294 0.226 0.896
Subject-Biology 0.040 0.201 0.165 0.905

As shown in Table VIII, the ACC on all three datasets reached above 0.8, demonstrating that the improved DKT model is effective on the subjective test knowledge tracing task. Among them, the model performs best on the dataset Subject-Biology, with an ACC of 0.905.

The final output of the improved DKVMN model is the normalized score of the subjective question, and four commonly used metrics, MSELoss, RMSE, MAE, and ACC, are chosen to evaluate the model in this experiment. The best model performance on the three datasets is shown in Table IX:

TABLE IX: DKVMN Performance on subjective test questions
Dataset MSELoss RMSE MAE ACC
Subject-Math 0.057 0.316 0.263 0.686
Subject-Biology 0.039 0.309 0.260 0.718
Subject-Physics 0.093 0.362 0.295 0.704

The data in the table show that the improved DKVMN model is effective on the subjective test knowledge tracing task.

Among them, the model performs best on the dataset Subject-Biology and fits less well on the dataset Subject-Math.

The experimental datasets for the GKT model are Subject-Chinese, Subject-Math, and Subject-Physics. The performance of the adjusted GKT model on the three datasets is shown in Table X:

TABLE X: GKT Performance on subjective test questions
Dataset MSELoss RMSE MAE ACC
Subject-Chinese 0.106 0.326 0.195 0.799
Subject-Math 0.041 0.202 0.041 0.960
Subject-Physics 0.133 0.364 0.250 0.806

As the data in Table X show, the GKT model is effective on the dataset Subject-Math, with an MSELoss of 0.041, an RMSE of 0.202, an MAE of 0.041, and an ACC of 0.960.

These results indicate that the GKT model has a clear advantage when predicting on datasets with well-defined and logically structured knowledge point relationships.

V Conclusions

Based on three classic knowledge tracing models, DKT, DKVMN, and GKT, this paper leverages the publicly available ASSISTment-2009-2010-skill dataset and the private Object and Subject datasets from Changshui Middle School’s monthly exams for the sophomore class, which include answer data for both objective and subjective test questions across eight academic disciplines, to implement a unified knowledge tracing model for both question types. In the objective test question setting, a binary classification approach is adopted for model training; experiments conducted on the three benchmark models with real data demonstrate good knowledge tracing and prediction capabilities for objective test questions. Given the characteristics of subjective test questions, the classification task is transformed into a regression task: by uniformly modifying the model training approach, adding branch networks, and optimizing the question encoding, knowledge tracing for subjective test questions is achieved, and the uniformly adjusted benchmark models deliver strong prediction results on subjective test questions.

The unified knowledge tracing model for both objective and subjective test questions studied in this paper can fully utilize exam data, combine objective and subjective test questions, and provide a qualitative and quantitative comprehensive assessment of learners’ knowledge states, thereby offering reliable support for personalized teaching. Finally, this paper designs experiments and presents and analyzes the experimental results.

The main limitation of this study lies in the model adjustment and prediction methods for subjective test questions, which still need improvement. The current adjustment mainly builds on the encoding scheme of the original models, simply encoding the knowledge points and answer scores of subjective test questions and replacing the loss function with the MSELoss commonly used in regression tasks; as a result, the prediction performance remains lower than that for objective test questions.

References

  • [1] L. Li and Z. Wang, “Knowledge Graph-Enhanced Intelligent Tutoring System Based on Exercise Representativeness and Informativeness,” International Journal of Intelligent Systems, vol. 2023, p. e2578286, Oct. 2023.
  • [2] Z. Wang, W. Yan, C. Zeng, Y. Tian, and S. Dong, “A Unified Interpretable Intelligent Learning Diagnosis Framework for Learning Performance Prediction in Intelligent Tutoring Systems,” International Journal of Intelligent Systems, vol. 2023, p. e4468025, Feb. 2023.
  • [3] X. Liao, X. Zhang, Z. Wang, and H. Luo, “Design and implementation of an AI-enabled visual report tool as formative assessment to promote learning achievement and self-regulated learning: An experimental study,” British Journal of Educational Technology, vol. 55, no. 3, pp. 1253–1276, 2024.
  • [4] Z. Wang, W. Wu, C. Zeng, H. Luo, and J. Sun, “Psychological factors enhanced heterogeneous learning interactive graph knowledge tracing for understanding the learning process,” Frontiers in Psychology, vol. 15, May 2024.
  • [5] L. Li and Z. Wang, “Knowledge relation rank enhanced heterogeneous learning interaction modeling for neural graph forgetting knowledge tracing,” PLOS ONE, vol. 18, no. 12, p. e0295808, Dec. 2023.
  • [6] Z. Wang, Y. Hou, C. Zeng, S. Zhang, and R. Ye, “Multiple Learning Features–Enhanced Knowledge Tracing Based on Learner–Resource Response Channels,” Sustainability, vol. 15, no. 12, p. 9427, Jan. 2023.
  • [7] L. Lyu, Z. Wang, H. Yun, Z. Yang, and Y. Li, “Deep Knowledge Tracing Based on Spatial and Temporal Representation Learning for Learning Performance Prediction,” Applied Sciences, vol. 12, no. 14, pp. 1–21, Jan. 2022.
  • [8] R. C. Atkinson, “Optimizing the learning of a second-language vocabulary.” Journal of experimental psychology, vol. 96, no. 1, p. 124, 1972.
  • [9] A. T. Corbett and J. R. Anderson, “Knowledge tracing: Modeling the acquisition of procedural knowledge,” User modeling and user-adapted interaction, vol. 4, pp. 253–278, 1994.
  • [10] Z. Wang, J. Yao, C. Zeng, L. Li, and C. Tan, “Students’ Classroom Behavior Detection System Incorporating Deformable DETR with Swin Transformer and Light-Weight Feature Pyramid Network,” Systems, vol. 11, no. 7, p. 372, Jul. 2023.
  • [11] K. Zhang and Y. Yao, “A three learning states bayesian knowledge tracing model,” Knowledge-Based Systems, vol. 148, pp. 189–201, 2018.
  • [12] Y. Wang and N. Heffernan, “Extending knowledge tracing to allow partial credit: Using continuous versus binary nodes,” in Artificial Intelligence in Education: 16th International Conference, AIED 2013, Memphis, TN, USA, July 9-13, 2013. Proceedings 16.   Springer, 2013, pp. 181–188.
  • [13] T. Käser, S. Klingler, A. G. Schwing, and M. Gross, “Dynamic bayesian networks for student modeling,” IEEE Transactions on Learning Technologies, vol. 10, no. 4, pp. 450–462, 2017.
  • [14] C. Zeng, D. Zhu, Z. Wang, and Y. Yang, “Deep and Shallow Feature Fusion and Recognition of Recording Devices Based on Attention Mechanism,” in Advances in Intelligent Networking and Collaborative Systems, L. Barolli, K. F. Li, and H. Miwa, Eds.   Cham: Springer International Publishing, 2021, vol. 1263, pp. 372–381.
  • [15] Z.-Y. Zhu, Q.-H. He, X.-H. Feng, Y.-X. Li, and Z.-F. Wang, “Liveness detection using time drift between lip movement and voice,” in 2013 International Conference on Machine Learning and Cybernetics, vol. 02, Jul. 2013, pp. 973–978, ISSN: 2160-1348.
  • [16] Z.-F. Wang, Q.-H. He, X.-Y. Zhang, H.-Y. Luo, and Z.-S. Su, “Playback attack detection based on channel pattern noise,” Journal of South China University of Technology, vol. 39, no. 10, pp. 7–12, 2011.
  • [17] Z. Wang, C. Zuo, and C. Zeng, “SAE based unified double JPEG compression detection system for Web image forensics,” International Journal of Web Information Systems, vol. 17, no. 2, pp. 84–98, Apr. 2021.
  • [18] Y. Tian, X. Wang, H. Yao, J. Chen, Z. Wang, and L. Yi, “Occlusion handling using moving volume and ray casting techniques for augmented reality systems,” Multimedia Tools and Applications, vol. 77, no. 13, pp. 16 561–16 578, Jul. 2018.
  • [19] R. J. Williams, “A learning algorithm for continually running fully recurrent neural networks,” Neural Computation, vol. 1, pp. 256–263, 1989.
  • [20] J. Zhang, X. Shi, I. King, and D.-Y. Yeung, “Dynamic key-value memory networks for knowledge tracing,” in Proceedings of the 26th international conference on World Wide Web, 2017, pp. 765–774.
  • [21] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
  • [22] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and M. Sun, “Graph neural networks: A review of methods and applications,” AI open, vol. 1, pp. 57–81, 2020.
  • [23] C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L. J. Guibas, and J. Sohl-Dickstein, “Deep knowledge tracing,” Advances in neural information processing systems, vol. 28, 2015.
  • [24] C.-K. Yeung and D.-Y. Yeung, “Addressing two problems in deep knowledge tracing via prediction-consistent regularization,” in Proceedings of the fifth annual ACM conference on learning at scale, 2018, pp. 1–10.
  • [25] G. Abdelrahman and Q. Wang, “Knowledge tracing with sequential key-value memory networks,” in Proceedings of the 42nd international ACM SIGIR conference on research and development in information retrieval, 2019, pp. 175–184.
  • [26] H. Nakagawa, Y. Iwasawa, and Y. Matsuo, “Graph-based knowledge tracing: Modeling student proficiency using graph neural networks,” in Web Intelligence, vol. 19, no. 1-2.   IOS Press, 2021, pp. 87–102.
  • [27] Y. Yang, J. Shen, Y. Qu, Y. Liu, K. Wang, Y. Zhu, W. Zhang, and Y. Yu, “Gikt: a graph-based interaction model for knowledge tracing,” in Machine learning and knowledge discovery in databases: European conference, ECML PKDD 2020, Ghent, Belgium, September 14–18, 2020, proceedings, part I.   Springer, 2021, pp. 299–315.
  • [28] L. Li and Z. Wang, “Calibrated Q-Matrix-Enhanced Deep Knowledge Tracing with Relational Attention Mechanism,” Applied Sciences, vol. 13, no. 4, pp. 1–24, Jan. 2023.