
DialogID: A Dialogic Instruction Dataset for Improving Teaching Effectiveness in Online Environments

Jiahao Chen, TAL Education Group, Beijing, China ([email protected])
Shuyan Huang, TAL Education Group, Beijing, China ([email protected])
Zitao Liu, Guangdong Institute of Smart Education, Jinan University, Guangzhou, China; TAL Education Group, Beijing, China ([email protected])
Weiqi Luo, Guangdong Institute of Smart Education, Jinan University, Guangzhou, China ([email protected])
(2022)
Abstract.

Online dialogic instructions are a set of pedagogical instructions used in real-world online educational contexts to motivate students, help them understand learning materials, and build effective study habits. In spite of the popularity and advantages of online learning, the education technology and educational data mining communities still lack large-scale, high-quality, and well-annotated teaching instruction datasets for studying computational approaches that automatically detect online dialogic instructions and further improve online teaching effectiveness. Therefore, in this paper, we present DialogID, a dataset for online dialogic instruction detection, which contains 30,431 effective dialogic instructions annotated into 8 categories. Furthermore, we utilize prevalent pre-trained language models (PLMs) and propose a simple yet effective adversarial training learning paradigm to improve the quality and generalization of dialogic instruction detection. Extensive experiments demonstrate that our approach outperforms a wide range of baseline methods. The data and our code are available for research purposes at: https://github.com/ai4ed/DialogID.

dialogic instruction; teaching effectiveness; instruction detection
Copyright: ACM, 2022. Conference: Proceedings of the 31st ACM International Conference on Information and Knowledge Management (CIKM ’22), October 17–21, 2022, Atlanta, GA, USA. Price: 15.00. DOI: 10.1145/3511808.3557580. ISBN: 978-1-4503-9236-5/22/10.
CCS Concepts: Applied computing → Computer-managed instruction; E-learning; Interactive learning environments; Computer-assisted instruction.

1. Introduction

The Covid-19 pandemic has brought tremendous changes to educational institutions around the world. With the recent development of technologies such as digital video processing and live streaming, various forms of online learning tools have emerged and a large number of offline institutions have switched to the online mode (Dhawan, 2020; Li et al., 2020; Liu et al., 2020). In spite of the advantages of online classes and a variety of support from online teaching software, teaching online classes remains a very challenging task, even for well-trained offline classroom instructors. When sitting in front of a camera or a laptop, traditional classroom instructors lack effective pedagogical instructions to ensure the overall quality of their online classes.

Dialogic instructions for online classes promote interactions between teachers and students instead of teacher presentation only. They also improve students' learning interest and confidence and help build effective learning habits. Hence, a computational approach that automatically detects dialogic instructions during an online class could provide real-time feedback to teachers and improve their online teaching skills.

However, building an automatic detection approach for dialogic instructions poses several challenges. Online teaching is not a standardized procedure. Even for the same learning content, how instructors teach varies according to their own pedagogical styles. Furthermore, instructors' different levels of teaching experience also lead to differences in the quality of their dialogic instructions. An illustrative example of note-taking instructions, i.e., instructions that ask students to take notes of key points, is as follows:

  • S1: Make sure you write down this key point.

  • S2: Make sure you remember this key point.

Instruction S1 gives students a concrete action, i.e., taking notes, which helps students build their learning habits. S2 is an ineffective and confusing instruction that does not meet the quality standard of a note-taking instruction. An intelligent dialogic instruction detection model should distinguish such subtle differences and provide instant feedback to online instructors.

Table 1. Definitions and examples of dialogic instructions. others contains instructions that are either ineffective or irrelevant.
Instruction Definition Example(s)
commending Instructions that praise and encourage students. Good job!
guidance Guiding students to solve a problem step by step. What would happen then?
summarization Wrapping up the lesson or summarizing the content just learned. Let’s conclude what we have learned today.
greeting Greetings at the beginning of a class; instructions that help manage the teaching procedures. How is it going? / Can you see the slides?
note-taking Instructions that ask students to take notes of key points. Make sure you write down this key point.
repeating Requiring students to rehearse the content. Could you repeat it?
reviewing Reminding the students what they learned in a previous class. Could you remember the words you learned last week?
example-giving Demonstrating the content by concrete facts. Here is an example.
others Ineffective instructions, or instructions unrelated to the class. It’s good weather today.

Existing educational research has revealed the significance of dialogic instructions for students' social-emotional well-being (Tennant et al., 2015), motivation (Henderlong and Lepper, 2002), and academic achievement (Moely et al., 1992; Dweck, 2007). Class observation frameworks such as CLASS (Pianta et al., 2008) and COPUS (Smith et al., 2013) have been established. However, these methods rely heavily on human effort such as manual video coding (Praetorius and Charalambous, 2018; Rosenshine, 2012), and hence fail to provide automatic, timely feedback to instructors. Machine learning models can learn from human-coded data and then make predictions automatically. For example, Donnelly et al. utilized Naive Bayes models to capture the occurrences of five key instructional segments, e.g., small group work and lecture (Donnelly et al., 2016).

However, even though the aforementioned research focuses on detecting and studying teachers' dialogic instructions, none of it open-sources its research datasets. Furthermore, the majority of these works are undertaken in traditional offline classrooms, and their methodologies and paradigms are not directly applicable to online learning environments. Therefore, in this work, to help and promote research and development on online dialogic instruction detection, we present DialogID, a high-quality dialogic instruction dataset for improving online teaching effectiveness. DialogID contains 30,431 effective dialogic instructions extracted from real-world K-12 online classes. To the best of our knowledge, DialogID is one of the first publicly available dialogic instruction datasets collected from online classrooms. Furthermore, we propose a simple yet effective adversarial training (AT) paradigm with pre-trained language models (PLMs) learned from DialogID to solve the dialogic instruction detection problem automatically. Experimental results demonstrate the usage and effectiveness of the DialogID dataset and the proposed instruction detection approach.

2. Dataset

2.1. Dialogic Instructions

In this work, following many existing pedagogical studies (Goodenow, 1993; Osterman, 2010; Henderlong and Lepper, 2002; Dweck, 2007; Yelland and Masters, 2007; Shafto et al., 2014; Anthony et al., 2015; An, 2004; Haghverdi et al., 2010; Lee et al., 2008; Rinehart et al., 1986), we focus on online dialogic instructions covering the following aspects: (1) motivating students and making them feel at ease in class: greeting (Goodenow, 1993; Osterman, 2010) and commending (Henderlong and Lepper, 2002; Dweck, 2007); (2) helping students understand and retain learning materials: guidance (Yelland and Masters, 2007), example-giving (Shafto et al., 2014), repeating (Anthony et al., 2015), and reviewing (An, 2004); and (3) building effective learning habits: note-taking (Haghverdi et al., 2010; Lee et al., 2008) and summarization (Rinehart et al., 1986).

Therefore, we aim to capture these 8 kinds of effective instructions. The definitions and examples of the instructions are shown in Table 1. Note that the scope of dialogic instructions in our work is a superset of that in the previous study (Xu et al., 2020).

2.2. Data Annotation

To ensure annotation quality and to allow the trained AI-driven detection models to be deployed into real production systems without any human intervention, we design a 3-step online dialogic instruction annotation process that automatically identifies teaching instructions from entire online classroom recordings. The 3-step process is described as follows.

Step 1: Extract teacher utterances. Similar to (Xu et al., 2020; Huang et al., 2020), we extract teacher utterances from the online classroom video recordings and filter out background noise and silent fragments via an in-house voice activity detection (VAD) model. Similar to (Tashev and Mirsamadi, 2016), the in-house VAD model is a four-layer deep neural network trained on online classroom audio data. Note that there are no voice overlaps, as the audio of each teacher and student is recorded separately.
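The paper's VAD model is an in-house DNN and is not released; as a stand-in sketch of the same filtering step, the snippet below uses the open-source webrtcvad package (an assumption, not the authors' model) to drop non-speech frames from 16 kHz, 16-bit mono PCM audio.

```python
# Illustrative VAD filtering with webrtcvad; the paper's actual model is a
# four-layer DNN trained on classroom audio, which this does NOT reproduce.
import webrtcvad

def speech_frames(pcm: bytes, sample_rate=16000, frame_ms=30):
    """Yield only the 30 ms frames that contain speech."""
    vad = webrtcvad.Vad(3)  # mode 3 = most aggressive non-speech filtering
    frame_bytes = sample_rate * frame_ms // 1000 * 2  # 2 bytes per 16-bit sample
    for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes):
        frame = pcm[i:i + frame_bytes]
        if vad.is_speech(frame, sample_rate):
            yield frame
```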

Step 2: Generate dialogic instruction candidates. Dialogic instructions constitute only a small portion of the teacher utterances within an online course. To make the annotation efficient and economical, we identify utterance candidates that may contain dialogic instructions. Specifically, we first transcribe each teacher utterance (obtained from Step 1) via a self-trained automatic speech recognition (ASR) model, a deep feed-forward sequential memory network that converts voice utterances into text (Zhang et al., 2018). The ASR model is trained on classroom-specific datasets and has a character error rate of 11.36% in classroom scenarios. Then, for each type of dialogic instruction listed in Table 1, we pre-define a list of keywords and use keyword matching to find candidate utterances of dialogic instructions. Only utterances whose transcriptions match at least one keyword are kept. The pre-defined keywords are constructed by analyzing thousands of online class videos and surveying hundreds of instructors, students, parents, and educators. For example, words or phrases like “Hello/Good Morning/Goodbye” and “as seen in/as shown in” are keywords for the greeting and summarization dialogic instructions, respectively.
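A minimal sketch of this keyword-matching step follows; the keyword lists are illustrative placeholders, since the paper's actual lexicon (built from video analysis and surveys) is not published.

```python
# Hypothetical keyword lexicon: one list per instruction type in Table 1.
KEYWORDS = {
    "greeting": ["hello", "good morning", "goodbye"],
    "summarization": ["as seen in", "as shown in", "let's conclude"],
    "note-taking": ["write down", "take notes"],
    # ... remaining types omitted for brevity
}

def candidate_types(transcript):
    """Return the instruction types whose keywords appear in an ASR transcript."""
    text = transcript.lower()
    return [t for t, words in KEYWORDS.items() if any(w in text for w in words)]

def generate_candidates(transcripts):
    """Keep only utterances matching at least one keyword, as in Step 2."""
    return [(u, types) for u in transcripts if (types := candidate_types(u))]
```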

Step 3: Extract segment-level audios for utterance-level annotation. Individual utterance candidates from Step 2 may contain only one or two sentences, which are difficult to annotate due to the lack of classroom context. Therefore, to make sure our crowdsourced labels are reliable, we assemble the target utterance candidate, its n preceding utterances, and its n following utterances into an audio segment. The crowd workers assign labels after listening to each audio segment. A teacher's utterance is labeled as “others” if it does not belong to any of the 8 categories in Table 1.
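The windowing in Step 3 can be sketched as follows; the window size n is left as a parameter, since the paper does not fix a value.

```python
def context_segment(utterances, idx, n):
    """Bundle the candidate at position `idx` with its n preceding and n
    following utterances (time-ordered) so annotators hear enough context."""
    start = max(0, idx - n)
    end = min(len(utterances), idx + n + 1)
    return utterances[start:end]  # corresponding audio clips are concatenated
```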

2.3. Data Analysis

Dialogic instructions in DialogID are collected and constructed from K-12 online classes at TAL Education Group, an educational technology company (NYSE:TAL) dedicated to supporting public and private education across the world. Through the 3-step annotation procedure, we end up with 51,908 annotated samples, 30,431 of which are effective online dialogic instructions. The detailed per-type instruction distribution and the corresponding sizes of the training, validation, and test sets are shown in Table 2. Furthermore, the length distribution (in words) of each type of dialogic instruction is depicted in Figure 1. As we can see, the amounts of different types of dialogic instructions are relatively balanced in DialogID, and most dialogic instructions are short sentences with fewer than 20 words.

Table 2. Data statistics of the DialogID dataset (lengths in words).
Instruction Train Validation Test Total Avg. Len Std. Len
commending 2,437 320 692 3,449 17.2 16.3
guidance 2,987 425 881 4,293 23.4 18.4
summarization 2,206 307 588 3,101 27.1 23.0
greeting 1,798 243 529 2,570 15.5 13.8
note-taking 2,667 394 782 3,843 19.0 15.3
repeating 2,488 368 705 3,561 19.9 14.2
reviewing 2,793 402 786 3,981 27.3 19.6
example-giving 3,977 550 1,106 5,633 24.8 20.5
others 14,982 2,182 4,313 21,477 21.6 17.1
Total 36,335 5,191 10,382 51,908 22.0 18.0
Figure 1. Length distribution per type (in words) in DialogID.

3. An Adversarial Training Enhanced Detection Framework

In this section, we describe our dialogic instruction detection framework, which has two key components: (1) a PLM, which serves as the base model for the classification task; and (2) an adversarial training module, which improves model generalization on the limited and noisy teacher instruction transcriptions.

3.1. Pre-trained Language Models

Traditional machine learning models use static word vectors as inputs, which cannot capture contextual information. By contrast, more recent PLMs learn contextual embeddings with their Transformer-based architectures. Therefore, in this study, we utilize PLMs as the base model of our detection framework.

To perform the instruction detection task on a sentence $\mathbf{x}=(x_{1},\cdots,x_{n})$ of $n$ tokens, similar to (Devlin et al., 2019; Liu et al., 2019), we first prepend a special token $[CLS]$ to the sentence. The token embeddings $(E_{[CLS]},E_{1},\cdots,E_{n})$ are then fed through a stack of Transformer encoders, where each token gradually captures contextual information from the sentence. Finally, at the last Transformer encoder layer, the hidden state of each token is extracted, and the hidden state of the special token $[CLS]$ is treated as the representation of the sentence.

In our study, we utilize the pre-trained RoBERTa model (Liu et al., 2019), a Transformer-based model sharing the same architecture as BERT (Devlin et al., 2019) but with several improvements at the pre-training stage, including removing BERT's next-sentence prediction objective and using dynamic masking. We also experiment with other recently proposed PLMs; details are discussed in Section 4.
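As an illustration of the base detector, the sketch below wires a Chinese RoBERTa checkpoint from the repository cited in Section 4 to a 9-way classification head over the $[CLS]$ representation, using the HuggingFace transformers library; the checkpoint name and the untrained head are assumptions, not the authors' released model.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# 8 instruction types + "others"; the classification head is randomly
# initialized here and would be fine-tuned on DialogID.
tokenizer = AutoTokenizer.from_pretrained("hfl/chinese-roberta-wwm-ext")
model = AutoModelForSequenceClassification.from_pretrained(
    "hfl/chinese-roberta-wwm-ext", num_labels=9)

# "Make sure you write down this key point." (a note-taking instruction)
inputs = tokenizer("请大家把这个重点记下来", truncation=True,
                   max_length=128, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # classifier built on the [CLS] hidden state
pred = logits.argmax(dim=-1).item()
```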

3.2. Adversarial Training Module

Adversarial training is a regularization technique that not only improves the robustness of DNNs against perturbations but also enhances their generalization on the original inputs, by training the DNNs to correctly classify both the original inputs and adversarial examples (AEs) (Miyato et al., 2017; Goodfellow et al., 2014; Guo et al., 2021). Similar to the pioneering work of Miyato et al., who extended AT to text classification (Miyato et al., 2017), we create AEs by adding adversarial perturbations to the intermediate representations in the embedding layer and use the AEs to optimize the model parameters for better generalization. Specifically, the adversarial perturbation $\mathbf{e}$ is computed by the efficient fast-gradient approximation method of Goodfellow et al. (Goodfellow et al., 2014) as follows:

$\mathbf{x}^{\prime}=\mathbf{x}+\mathbf{e};\quad \mathbf{e}=\epsilon\,\mathbf{g}/\|\mathbf{g}\|_{2};\quad \mathbf{g}=\nabla_{\mathbf{x}}\mathcal{L}(\mathbf{x},\boldsymbol{\theta})$

where $\mathbf{x}^{\prime}$ and $\mathbf{x}$ denote the perturbed and original representations in the neural network's embedding layer, respectively; $\epsilon$ is a hyperparameter that controls the norm of the perturbation, and $\boldsymbol{\theta}$ denotes the model parameters.
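In practice, a common way to realize this perturbation for PLMs is the fast-gradient method applied to the shared word-embedding matrix; the PyTorch sketch below follows that standard pattern under the assumption that the authors' implementation is similar (variable names and the default epsilon are illustrative).

```python
import torch

class FGM:
    """Fast-gradient perturbation of the embedding layer (Miyato et al., 2017)."""
    def __init__(self, model, epsilon=1.0, emb_name="word_embeddings"):
        self.model, self.epsilon, self.emb_name = model, epsilon, emb_name
        self.backup = {}

    def attack(self):
        # e = epsilon * g / ||g||_2, added in place to the embedding weights
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.emb_name in name:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0 and not torch.isnan(norm):
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        # Undo the perturbation before the optimizer step
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

# Per batch: loss.backward(); fgm.attack(); adversarial forward/backward pass;
# fgm.restore(); optimizer.step() -- clean and perturbed gradients are combined.
```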

Table 3. Prediction performance per instruction type of all different baselines in terms of precision, recall and F1 score.
Instruction Model Precision Recall F1
commending BERT 0.8274 0.8801 0.8529
commending ELECTRA 0.8093 0.8829 0.8445
commending MacBERT 0.8219 0.8801 0.8500
commending XLNet 0.8343 0.8223 0.8282
commending RoBERTa 0.8013 0.9263 0.8592
commending RoBERTa+AT 0.8083 0.9263 0.8633
guidance BERT 0.7505 0.8025 0.7756
guidance ELECTRA 0.7518 0.8082 0.7790
guidance MacBERT 0.8106 0.7480 0.7780
guidance XLNet 0.7899 0.7809 0.7854
guidance RoBERTa 0.7555 0.8241 0.7883
guidance RoBERTa+AT 0.7770 0.8343 0.8046
summarization BERT 0.9039 0.8963 0.9001
summarization ELECTRA 0.8542 0.9167 0.8843
summarization MacBERT 0.8882 0.9184 0.9030
summarization XLNet 0.8938 0.8878 0.8908
summarization RoBERTa 0.8834 0.9150 0.8989
summarization RoBERTa+AT 0.8834 0.9150 0.8989
greeting BERT 0.8942 0.8790 0.8866
greeting ELECTRA 0.8392 0.8979 0.8676
greeting MacBERT 0.8826 0.8809 0.8817
greeting XLNet 0.8248 0.9168 0.8684
greeting RoBERTa 0.9018 0.8507 0.8755
greeting RoBERTa+AT 0.8637 0.9225 0.8921
note-taking BERT 0.8100 0.9488 0.8740
note-taking ELECTRA 0.8082 0.9373 0.8680
note-taking MacBERT 0.7940 0.9514 0.8656
note-taking XLNet 0.8491 0.8632 0.8561
note-taking RoBERTa 0.8201 0.9501 0.8803
note-taking RoBERTa+AT 0.8493 0.8939 0.8710
repeating BERT 0.9134 0.9277 0.9205
repeating ELECTRA 0.8728 0.9348 0.9027
repeating MacBERT 0.8908 0.9376 0.9136
repeating XLNet 0.8770 0.9305 0.9030
repeating RoBERTa 0.8787 0.9248 0.9012
repeating RoBERTa+AT 0.9006 0.9248 0.9125
reviewing BERT 0.8162 0.9720 0.8873
reviewing ELECTRA 0.8284 0.9644 0.8912
reviewing MacBERT 0.8284 0.9644 0.8912
reviewing XLNet 0.8315 0.9542 0.8886
reviewing RoBERTa 0.8346 0.9631 0.8943
reviewing RoBERTa+AT 0.8412 0.9567 0.8952
example-giving BERT 0.9114 0.9675 0.9386
example-giving ELECTRA 0.9066 0.9738 0.9390
example-giving MacBERT 0.9109 0.9702 0.9396
example-giving XLNet 0.9033 0.9801 0.9402
example-giving RoBERTa 0.9108 0.9792 0.9438
example-giving RoBERTa+AT 0.9126 0.9729 0.9418

4. Experiments

To comprehensively assess DialogID and the proposed method, in addition to RoBERTa we select a series of widely-used text classification models, including BERT (Devlin et al., 2019), ELECTRA (Clark et al., 2020), MacBERT (Cui et al., 2020), and XLNet (Yang et al., 2019). Moreover, we conduct an ablation study to demonstrate the performance improvement brought by the adversarial training module. The proposed AT-enhanced approach is denoted as “RoBERTa+AT” in the following sections. Note that the AT module can be incorporated into any PLM. The PLMs used in our experiments can be found in this repository: https://github.com/ymcui/Chinese-BERT-wwm. For each model, we set max_len to 128 and the learning rate to 1e-5. The number of epochs is set to 100, and we stop training early if the model does not improve on the validation set for 5 epochs.
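The early-stopping schedule above can be sketched as follows; train_one_epoch and evaluate_f1 are hypothetical helpers standing in for the usual fine-tuning and evaluation loops.

```python
import torch

def train_with_early_stopping(model, train_one_epoch, evaluate_f1,
                              max_epochs=100, patience=5):
    """Stop when validation F1 has not improved for `patience` epochs."""
    best_f1, bad_epochs = 0.0, 0
    for _ in range(max_epochs):
        train_one_epoch(model)            # lr = 1e-5, max_len = 128 (see above)
        f1 = evaluate_f1(model)
        if f1 > best_f1:
            best_f1, bad_epochs = f1, 0
            torch.save(model.state_dict(), "best.pt")  # keep the best checkpoint
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_f1
```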

4.1. Results

Prediction with PLMs. We compare the performance of different PLMs in terms of precision, recall, and F1 score. Results are shown in Table 3 (per type) and Table 4 (overall). Comparing RoBERTa with the other PLMs (BERT, ELECTRA, MacBERT, and XLNet), we find that RoBERTa achieves the highest overall F1 score, which indicates its stronger capacity to model dialogic instructions. Looking into each category, interestingly, RoBERTa does not always achieve the top performance: it shows inferior performance compared with BERT and MacBERT on summarization, greeting, and repeating, which are among the categories with fewer samples.

Table 4. Overall prediction performance of different models.
Model Precision Recall F1
BERT 0.8534 0.9092 0.8795
ELECTRA 0.8338 0.9145 0.8720
MacBERT 0.8534 0.9064 0.8779
XLNet 0.8505 0.8920 0.8701
RoBERTa 0.8483 0.9167 0.8802
RoBERTa+AT 0.8545 0.9183 0.8849

Prediction with PLMs and AT. We demonstrate the effectiveness of AT by comparing RoBERTa+AT with RoBERTa. Table 3 and Table 4 show that, by adding an adversarial training module that enhances the model's generalization, RoBERTa+AT outperforms the original RoBERTa in 5 out of 8 types of dialogic instructions, as well as in overall performance, in terms of F1 score. It is worth noting that RoBERTa+AT increases the F1 score on greeting by 1.66% compared with RoBERTa. We believe this is because greeting is the smallest category and has the smallest average length among all instruction categories. The AT module enhances the generalization of the PLM by jointly training on the original clean inputs and the corresponding perturbed AEs.

Qualitative Analysis. We further demonstrate the effectiveness of RoBERTa+AT by visualizing the learned representations, as shown in Figure 2. Instances of the nine categories in the test set are fed into the trained models, and their representations, i.e., the hidden states of the special token $[CLS]$, are collected. The dimension reduction method t-SNE (van der Maaten and Hinton, 2008) is then applied so that the representations can be visualized in 2-dimensional space. From the figure, we can see that the representations of instances in different categories are well separated by the proposed RoBERTa+AT, with significant margins between categories.
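A plausible recipe for this visualization, assuming the $[CLS]$ vectors and labels have been dumped to NumPy arrays (the file names are hypothetical), is:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

cls_vectors = np.load("cls_vectors.npy")   # (num_instances, hidden_size)
labels = np.load("labels.npy")             # integer label per instance, 0-8

# Project to 2-D and plot one color per category
points = TSNE(n_components=2, random_state=0).fit_transform(cls_vectors)
for c in np.unique(labels):
    mask = labels == c
    plt.scatter(points[mask, 0], points[mask, 1], s=4, label=str(c))
plt.legend()
plt.savefig("tsne_cls.png", dpi=200)
```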

Figure 2. Representation visualization of the RoBERTa+AT model.

5. Conclusion

In this work, we introduce DialogID, a dialogic instruction dataset that contains 8 categories of the online class instructions collected from real-world K-12 online classrooms. Experiments conducted on DialogID show the effectiveness and superiority of our proposed approach against a wide range of baselines.

Acknowledgements.
This work was supported in part by the National Key R&D Program of China under Grant No. 2020AAA0104500; in part by the Beijing Nova Program (Z201100006820068) from the Beijing Municipal Science & Technology Commission; and in part by NSFC under Grant No. 61877029.

References

  • An (2004) Shuhua An. 2004. Capturing the Chinese way of teaching: The learning-questioning and learning-reviewing instructional model. In How Chinese Learn Mathematics: Perspectives from Insiders. World Scientific, 462–482.
  • Anthony et al. (2015) Glenda Anthony, Jodie Hunter, and Roberta Hunter. 2015. Supporting Prospective Teachers to Notice Students’ Mathematical Thinking through Rehearsal Activities. Mathematics Teacher Education and Development 17, 2 (2015), 7–24.
  • Clark et al. (2020) Kevin Clark, Minh-Thang Luong, Quoc V Le, and Christopher D Manning. 2020. ELECTRA: Pre-training text encoders as discriminators rather than generators. In International Conference on Learning Representations.
  • Cui et al. (2020) Yiming Cui, Wanxiang Che, Ting Liu, Bing Qin, Shijin Wang, and Guoping Hu. 2020. Revisiting pre-trained models for Chinese natural language processing. arXiv preprint arXiv:2004.13922 (2020).
  • Devlin et al. (2019) Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 4171–4186.
  • Dhawan (2020) Shivangi Dhawan. 2020. Online learning: A panacea in the time of COVID-19 crisis. Journal of Educational Technology Systems 49, 1 (2020), 5–22.
  • Donnelly et al. (2016) Patrick J Donnelly, Nathan Blanchard, Borhan Samei, Andrew M Olney, Xiaoyi Sun, Brooke Ward, Sean Kelly, Martin Nystrand, and Sidney K D’Mello. 2016. Automatic teacher modeling from live classroom audio. In Proceedings of the 2016 Conference on User Modeling Adaptation and Personalization. 45–53.
  • Dweck (2007) Carol S Dweck. 2007. Boosting achievement with messages that motivate. Education Canada 47, 2 (2007), 6–10.
  • Goodenow (1993) Carol Goodenow. 1993. The psychological sense of school membership among adolescents: Scale development and educational correlates. Psychology in the Schools 30, 1 (1993), 79–90.
  • Goodfellow et al. (2014) Ian J Goodfellow, Jonathon Shlens, and Christian Szegedy. 2014. Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572 (2014).
  • Guo et al. (2021) Xiaopeng Guo, Zhijie Huang, Jie Gao, Mingyu Shang, Maojing Shu, and Jun Sun. 2021. Enhancing Knowledge Tracing via Adversarial Training. In Proceedings of the 29th ACM International Conference on Multimedia. 367–375.
  • Haghverdi et al. (2010) Hamid Haghverdi, Reza Biria, and Lotfollah Karimi. 2010. Note-taking strategies and academic achievement. Journal of Language and Linguistic Studies 6, 1 (2010).
  • Henderlong and Lepper (2002) Jennifer Henderlong and Mark R Lepper. 2002. The effects of praise on children’s intrinsic motivation: A review and synthesis. Psychological Bulletin 128, 5 (2002), 774.
  • Huang et al. (2020) Gale Yan Huang, Jiahao Chen, Haochen Liu, Weiping Fu, Wenbiao Ding, Jiliang Tang, Songfan Yang, Guoliang Li, and Zitao Liu. 2020. Neural multi-task learning for teacher question detection in online classrooms. In International Conference on Artificial Intelligence in Education. Springer, 269–281.
  • Lee et al. (2008) Pai-Lin Lee, William Lan, Douglas Hamman, and Bret Hendricks. 2008. The effects of teaching notetaking strategies on elementary students’ science learning. Instructional Science 36, 3 (2008), 191–201.
  • Li et al. (2020) Hang Li, Yu Kang, Wenbiao Ding, Song Yang, Songfan Yang, Gale Yan Huang, and Zitao Liu. 2020. Multimodal learning for classroom activity detection. In IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 9234–9238.
  • Liu et al. (2019) Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692 (2019).
  • Liu et al. (2020) Zitao Liu, Guowei Xu, Tianqiao Liu, Weiping Fu, Yubi Qi, Wenbiao Ding, Yujia Song, Chaoyou Guo, Cong Kong, Songfan Yang, et al. 2020. Dolphin: a spoken language proficiency assessment system for elementary education. In Proceedings of The Web Conference 2020. 2641–2647.
  • Miyato et al. (2017) Takeru Miyato, Andrew M Dai, and Ian Goodfellow. 2017. Adversarial training methods for semi-supervised text classification. In International Conference on Learning Representations.
  • Moely et al. (1992) Barbara E Moely, Silvia S Hart, Linda Leal, Kevin A Santulli, Nirmala Rao, Terry Johnson, and Libby Burney Hamilton. 1992. The teacher’s role in facilitating memory and study strategy development in the elementary school classroom. Child Development 63, 3 (1992), 653–672.
  • Osterman (2010) Karen F Osterman. 2010. Teacher Practice and Students’ Sense of Belonging. International Research Handbook on Values Education and Student Wellbeing (2010), 239.
  • Pianta et al. (2008) Robert C Pianta, Karen M La Paro, and Bridget K Hamre. 2008. Classroom Assessment Scoring System: Manual K-3. Paul H Brookes Publishing.
  • Praetorius and Charalambous (2018) Anna-Katharina Praetorius and Charalambos Y Charalambous. 2018. Classroom Observation Frameworks for Studying Instructional Quality: Looking Back and Looking Forward. ZDM: The International Journal on Mathematics Education 50, 3 (2018), 535–553.
  • Rinehart et al. (1986) Steven D Rinehart, Steven A Stahl, and Lawrence G Erickson. 1986. Some effects of summarization training on reading and studying. Reading Research Quarterly (1986), 422–438.
  • Rosenshine (2012) Barak Rosenshine. 2012. Principles of instruction: Research-based strategies that all teachers should know. American Educator 36, 1 (2012), 12.
  • Shafto et al. (2014) Patrick Shafto, Noah D Goodman, and Thomas L Griffiths. 2014. A rational account of pedagogical reasoning: Teaching by, and learning from, examples. Cognitive Psychology 71 (2014), 55–89.
  • Smith et al. (2013) Michelle K Smith, Francis HM Jones, Sarah L Gilbert, and Carl E Wieman. 2013. The Classroom Observation Protocol for Undergraduate STEM (COPUS): A new instrument to characterize university STEM classroom practices. CBE—Life Sciences Education 12, 4 (2013), 618–627.
  • Tashev and Mirsamadi (2016) Ivan Tashev and Seyedmahdad Mirsamadi. 2016. DNN-based causal voice activity detector. In Information Theory and Applications Workshop.
  • Tennant et al. (2015) Jaclyn E Tennant, Michelle K Demaray, Christine K Malecki, Melissa N Terry, Michael Clary, and Nathan Elzinga. 2015. Students’ ratings of teacher support and academic and social–emotional well-being. School Psychology Quarterly 30, 4 (2015), 494.
  • van der Maaten and Hinton (2008) Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing Data using t-SNE. Journal of Machine Learning Research 9 (2008), 2579–2605.
  • Xu et al. (2020) Shiting Xu, Wenbiao Ding, and Zitao Liu. 2020. Automatic dialogic instruction detection for k-12 online one-on-one classes. In International Conference on Artificial Intelligence in Education. Springer, 340–345.
  • Yang et al. (2019) Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R Salakhutdinov, and Quoc V Le. 2019. XLNet: Generalized autoregressive pretraining for language understanding. Advances in Neural Information Processing Systems 32 (2019).
  • Yelland and Masters (2007) Nicola Yelland and Jennifer Masters. 2007. Rethinking scaffolding in the information age. Computers & Education 48, 3 (2007), 362–382.
  • Zhang et al. (2018) Shiliang Zhang, Ming Lei, Zhijie Yan, and Lirong Dai. 2018. Deep-FSMN for large vocabulary continuous speech recognition. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 5869–5873.