An Augmented Reality Platform for Introducing Reinforcement
Learning to K-12 Students with Robots
Abstract
Interactive reinforcement learning, where humans actively assist during an agent's learning process, promises to alleviate the sample complexity challenges of practical algorithms. However, the inner workings and state of the robot are typically hidden from the teacher when humans provide feedback. To create common ground between the human and the learning robot, in this paper we propose an Augmented Reality (AR) system that reveals the hidden state of the learning process to human users. This paper describes our system's design and implementation and concludes with a discussion of two directions for future work that we are pursuing: 1) use of our system in AI education activities at the K-12 level; and 2) development of a framework for AR-based human-in-the-loop reinforcement learning, where the human teacher can see sensory and cognitive representations of the robot overlaid on the real world.
Introduction
Artificial intelligence (AI) is playing an increasingly prominent role in our daily lives and has positively contributed to many fields and industries (Fösel et al. 2018; Zhou, Li, and Zare 2017). Reinforcement Learning, in particular, has generated great interest over the past decade but still suffers from the challenge of sample complexity. To alleviate that issue, recent work has focused on how humans can assist robots with direct feedback (e.g., demonstrations or advice) during the learning process (Ritschel et al. 2019; Yu and Short 2021). Yet, the robot's inner workings and representations remain hidden from users, which can result in a mismatch between the human's mental model of the robot's learning process and the actual inner workings of the robot. This paper proposes a tool based on Augmented Reality (AR) for establishing common ground between a learning robot and its human teachers, with the goal of improving the efficacy of teaching and helping the human understand what the robot is learning.
Our primary application is AI education at the K-12 level. Reinforcement learning (RL) has contributed to many fields (Kiumarsi et al. 2018; Fösel et al. 2018) but has received little attention in K-12 AI education (Coppens, Bargiacchi, and Nowé 2019). However, basic RL concepts, such as rewards for doing the right thing, are close to our perception of human learning (Sutton and Barto 2018) and therefore may be easy for K-12 students to understand. Robots have been successfully used in K-12 STEM education (Petrovič 2021; Laut, Kapila, and Iskander 2013). Some researchers have applied educational robots to teaching AI to undergraduate design students, improving students' engagement and lowering the barrier to entry in AI (van der Vlist et al. 2008). One limitation of using physical robots is the lack of an intuitive way for users to visualize the AI training process and the robot's perception of the environment. We address this challenge by providing an AR tool that enables the user to visualize the robot's learning process and to actively assist in training the robot.
Augmented reality (AR) is a technique for enhancing the real-world environment with virtual visual components, sounds, or other sensory stimuli. By adding digital elements to a person's perception of the real world, AR is powerful for providing immersive experiences and visualizing abstract concepts in the physical world. Compared with virtual reality, AR does not require substantial hardware and can be easily implemented on a variety of accessible platforms, including mobile phones, tablets, and laptops. In recent years, AR has been applied in various fields and industries, including education (da Silva 2019) and human-robot interaction (HRI) (Hennerley, Dickinson, and Jiang 2017), which we discuss further in the next section.

In this paper, we combine an AR interface with a LEGO® SPIKE Prime robot and design a treasure hunting RL problem to introduce RL concepts to middle and high school students. The AR application can be easily installed on mobile devices (as shown in Figure 1), and communication between the AR application and the robot is handled by an ESP8266 WiFi module connected to the robot. From the interface, the user can visualize internal RL representations and data structures and interpret key training information in an intuitive way. Since training a robot in the physical world takes longer than training a virtual agent, we lowered the difficulty of our RL activity and gave users the ability to provide instructions to the robot to decrease the sample complexity. The activity covers the following aspects of RL: 1) Key RL concepts, including state, action, and reward; 2) Exploration and exploitation; 3) The Q-table and how robots make decisions based on it; 4) Episodes and termination rules; and 5) Human-in-the-loop training. In the following sections we present technical details and discuss potential applications of our system.
Related Work
Over the past ten years, AR has gradually become a popular tool in K-12 education. Researchers have applied AR techniques to teaching robotics (Cheli et al. 2018), physics (Ibáñez et al. 2014), arts (Huang, Li, and Fong 2016), and other subjects. Results from these studies show that AR can not only increase students' engagement, concentration, and learning outcomes, but also reduce the difficulty of learning. However, AR is currently rarely used in K-12 AI education. Some researchers have developed a virtual reality platform that introduces RL to K-12 students in a digital world (Coppens, Bargiacchi, and Nowé 2019).
AR has also contributed to HRI research, including robotic teleoperation (Lee and Park 2018) and robot debugging (Ikeda and Szafir 2021). Related work proposed an AR visualization tool named "SENSAR" for human-robot collaboration (Cleaver et al. 2020), which was also deployed on a LEGO EV3 robot for educational purposes; that learning platform was an interface for students to visualize sensor values and control motors. In our research, we want to embed the learning experience in a game to make it more engaging, as other researchers also report greater engagement when the RL learning experience is centered around a game (Taylor 2011).
Reinforcement Learning Background
In Reinforcement Learning (RL), an agent has to learn how to act based on scalar reward signals detected over the course of interaction with its environment. The agent's world is typically represented as a Markov Decision Process (MDP), a 5-tuple $\langle S, A, T, R, \gamma \rangle$, where $S$ is a set of states, $A$ is a set of actions, $T: S \times A \times S \rightarrow [0, 1]$ is a transition function that maps the probability of moving to a new state given an action and the current state, $R: S \times A \rightarrow \mathbb{R}$ gives the reward of taking an action in a given state, and $\gamma$ is the discount factor. We consider episodic tasks in which the agent starts in an initial state and, upon reaching a terminal state $s_{terminal}$, a new episode begins.
At each interaction step, the agent observes its state $s$ and chooses an action $a$ according to its policy $\pi: S \rightarrow A$. The goal of the agent is to learn an optimal policy $\pi^*$ that maximizes the long-term expected sum of discounted rewards. One way to learn the optimal policy is to learn the optimal action-value function $Q^*(s, a)$, which gives the expected sum of discounted rewards for taking action $a$ in state $s$, and following policy $\pi^*$ after:

$$Q^*(s, a) = \mathbb{E}\left[\, \sum_{k=0}^{\infty} \gamma^{k} r_{t+k+1} \;\middle|\; s_t = s,\, a_t = a,\, \pi^* \right]$$
Q-learning (Watkins and Dayan 1992) is a common algorithm used to learn the optimal action-value function, where the Q-function is initialized arbitrarily (e.g., all zeros) and, upon performing action $a$ in state $s$, observing reward $r$, and ending up in state $s'$, the Q-function is updated by:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$$
where $\alpha$, the learning rate, is typically a small value (e.g., $0.1$). The agent decides which action to select using an $\epsilon$-greedy policy: with small probability $\epsilon$, the agent chooses a random action; otherwise, it chooses the action with the highest Q-value in its current state (i.e., the agent acts greedily with respect to its current action-value function).
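To make the update rule and the $\epsilon$-greedy policy concrete, the following is a minimal tabular Q-learning sketch in Python; the `env` object, its `reset`/`step` interface, and the hyperparameter values are illustrative assumptions rather than part of our implementation:

```python
import random
from collections import defaultdict

def q_learning(env, actions, episodes=100, alpha=0.1, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning with an epsilon-greedy policy (illustrative sketch)."""
    Q = defaultdict(float)  # Q[(state, action)], initialized to zero
    for _ in range(episodes):
        s = env.reset()                       # start a new episode
        done = False
        while not done:
            if random.random() < epsilon:     # explore: random action
                a = random.choice(actions)
            else:                             # exploit: greedy action
                a = max(actions, key=lambda a2: Q[(s, a2)])
            s_next, r, done = env.step(a)     # interact with the environment
            # One-step Q-learning update (terminal states keep their
            # initial zero values, which is sufficient for this sketch)
            best_next = max(Q[(s_next, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q
```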
RL often requires large amounts of interaction with the environment. To speed up learning, researchers have proposed human-in-the-loop RL methods, e.g., learning-from-demonstration (LfD), where human teachers take over the action selection step, often providing several trajectories of complete solutions before the agent starts learning autonomously (Schaal 1997). In a related paradigm (Griffith et al. 2013), the agent can seek "advice" from its human partner, e.g., what action to select in a particular state for which the Q-values are thought to be unreliable due to lack of experience. One of the goals of our system is to visualize the RL agent's knowledge and learning process so as to provide information that makes human teachers more effective.
System Design
Our system seeks to illustrate RL in an intuitive way so as to provide students with a more immersive learning experience. There are three main components in our system: a LEGO SPIKE Prime robot, an AR application on a mobile device, and an ESP8266 WiFi module used for communication. We also designed a treasure hunting activity based on our platform, in which students train the robot to find the treasure and escape through the exit.
Hardware
The educational robot, a LEGO SPIKE Prime, comes with different sensors (e.g., a distance sensor and a light sensor) and has three motors. The SPIKE Prime is customizable, allowing students to modify it with additional sensors and pieces. However, one of its limitations is that it does not support WiFi. To solve this, we connected an ESP8266 microcontroller to one of the robot's sensor ports as an information transfer station. The ESP8266 board communicates with the SPIKE through the Universal Asynchronous Receiver-Transmitter (UART) protocol. Since the ESP8266 board has WiFi, we built an HTTP server on the board that can send and receive HTTP requests. Users send robot data and commands via HTTP requests to the ESP8266 board, and the board forwards them to the robot through UART. In the other direction, the robot sends its perception of the world to the ESP8266 module through UART, and the board sends an HTTP request containing the data to the AR application for users to visualize. An image target at the corner of a physical grid allows the AR interface to track the grid. Since the robot has to move frequently in the grid during training, it can drift (usually 1 or 2 degrees after each move, depending on the ground material), so we use the Inertial Measurement Unit (IMU) inside the SPIKE to adjust its position and direction after each move.
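As a concrete illustration of the relay, below is a minimal sketch of the ESP8266 bridge, assuming MicroPython firmware; the `/cmd` endpoint, the newline-delimited message format, and the UART settings are hypothetical choices for this sketch, not our exact protocol:

```python
# Minimal sketch of an HTTP-to-UART bridge on the ESP8266 (MicroPython).
import socket
from machine import UART

uart = UART(0, baudrate=115200)  # serial link to the SPIKE Prime hub

def serve(port=80):
    s = socket.socket()
    s.bind(('', port))
    s.listen(1)
    while True:
        conn, _ = s.accept()
        request = conn.recv(1024).decode()
        # e.g., "GET /cmd?move=right HTTP/1.1" -> forward "move=right"
        path = request.split(' ')[1]
        if path.startswith('/cmd?'):
            uart.write((path[5:] + '\n').encode())  # command -> robot
        reply = uart.readline() or b''              # robot state -> AR app
        conn.send(b'HTTP/1.1 200 OK\r\n\r\n' + reply)
        conn.close()

serve()
```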
Activity Design
The RL problem we designed is a grid-world treasure hunting activity. The robot has to navigate a 3x3 grid to find the shortest path that collects the treasure and escapes through the exit. In each state, the robot can only choose actions that will not take it across the boundary of the grid (e.g., the robot can only move down or right when it is in the upper left corner). We have two virtual walls, as shown in Figure 2, which the robot can navigate toward but cannot pass through. The robot is trained with a Q-learning algorithm that uses an $\epsilon$-greedy policy to make decisions. The rewards for finding the treasure, arriving at the exit, hitting a wall, and taking a normal step are +20, +30, -10, and -1, respectively. Each episode ends when the robot arrives at the exit, whether or not it has found the treasure. While a 3x3 maze is a highly simplified RL problem, we chose it because we wanted the user to be able to intuitively see the AI "thinking" that is happening in the AR experience.
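The following sketch shows one way this MDP could be encoded; the cell coordinates of the treasure, exit, and walls are hypothetical placements (the real layout follows Figure 2), and keeping a treasure flag in the state is our modeling assumption. It pairs with the earlier Q-learning sketch if the fixed `actions` list there is replaced by `env.legal_actions()`:

```python
class TreasureHunt:
    """Sketch of the 3x3 treasure-hunt MDP; the layout is illustrative."""
    MOVES = {'up': (0, -1), 'down': (0, 1), 'left': (-1, 0), 'right': (1, 0)}

    def __init__(self):
        self.start, self.treasure, self.exit = (0, 0), (2, 1), (2, 2)
        # Virtual walls as blocked transitions between adjacent cells
        self.walls = {((0, 1), (1, 1)), ((1, 1), (0, 1)),
                      ((1, 2), (2, 2)), ((2, 2), (1, 2))}

    def reset(self):
        self.pos, self.found = self.start, False
        return (self.pos, self.found)   # treasure flag kept in the state

    def legal_actions(self):
        # Only actions that keep the robot inside the 3x3 grid
        return [a for a, (dx, dy) in self.MOVES.items()
                if 0 <= self.pos[0] + dx <= 2 and 0 <= self.pos[1] + dy <= 2]

    def step(self, action):
        dx, dy = self.MOVES[action]
        nxt = (self.pos[0] + dx, self.pos[1] + dy)
        if (self.pos, nxt) in self.walls:                 # hit a virtual wall
            return (self.pos, self.found), -10, False
        self.pos = nxt
        if self.pos == self.treasure and not self.found:  # treasure collected
            self.found = True
            return (self.pos, self.found), +20, False
        if self.pos == self.exit:                         # episode ends here,
            return (self.pos, self.found), +30, True      # treasure or not
        return (self.pos, self.found), -1, False          # ordinary step
```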
The learning objectives of this activity were to let students: 1) Understand key RL concepts (state, action, reward, and episode); 2) Associate actions performed by the agent with the corresponding policy, i.e., relate the robot's choices to values in the Q-table; 3) Understand the learning process of the robot in each training step; 4) Understand how humans can be part of the training process so as to speed up learning; and 5) Understand exploration and exploitation in the context of reinforcement learning.

AR Interface
Our AR application was developed using Unity (https://unity.com/) and Vuforia (https://developer.vuforia.com/). The application can be deployed on iOS and Android mobile devices. Our interface contains four parts, as shown in Figure 2; the left part contains a button for users to start training the robot and a slider to tweak the exploration rate ($\epsilon$). By default, the virtual grid in the middle shows the structure of the maze, but students can select what they want to visualize in the virtual grid through three checkboxes in the left UI:
- "Learning Experience": If this checkbox is selected, a visualization of the Q-table is displayed on the grid (Picture 1 in Figure 3). The arrows represent the possible actions in each state. As training proceeds, the color of the arrows updates in real time: a greener arrow represents a higher Q-value for the state-action pair, while a redder one represents a lower value (see the color-mapping sketch after this list).
- "State-Action Pair Visited": This checkbox makes the virtual grid show how many times each state-action pair has been visited by the robot (Picture 2 in Figure 3). This information can help students connect Q-values to training steps. If users want to train the robot with demonstrations, they can refer to this visualization to figure out in which states the robot needs more training.
- "Past Trajectory": By selecting this checkbox, an animation shows the trajectory the robot followed in the last episode (Picture 3 in Figure 3).
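One simple way to realize the red-to-green arrow encoding described above is linear interpolation between the current minimum and maximum Q-values in the table; this normalization scheme is our illustrative assumption, not necessarily the exact mapping used in the app:

```python
def arrow_color(q, q_min, q_max):
    """Map a Q-value to an RGB color: red for low values, green for high."""
    t = 0.5 if q_max == q_min else (q - q_min) / (q_max - q_min)
    return (1.0 - t, t, 0.0)  # (red, green, blue), each in [0, 1]
```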
The upper part of the UI shows key RL parameters and variables used in training, including the state, action, reward, episode, and score (a gamified display of the accumulated reward the robot receives in the current episode).
The right portion of the UI contains the human-in-the-loop training interface, which allows users to switch between manual and automatic training modes. Most of the training is done automatically, but we also want users to be able to instruct the robot when it takes non-optimal actions during training (e.g., unnecessary exploration or getting stuck in a local optimum). After switching to manual training mode, users first give a reward to the robot's last action using the slider; this reward replaces the automatic reward received by the robot. Users then tell the robot where it should go next by moving the robot in the physical world and orienting it toward the desired direction. To detect the position and direction of the robot, another image target attached to the robot is tracked in the AR app. The manual training function allows users to participate in the training process by evaluating the robot's past actions and guiding its future moves. This makes the activity more interactive and engaging, and it helps students understand how human input can be utilized in the training process.
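The reward-replacement step can be expressed as a small variation of the Q-learning update, where an optional human-provided reward overrides the environment reward; the function below is an illustrative sketch under that assumption, not our exact implementation (`Q` is a mapping from (state, action) to float, such as the `defaultdict` in the earlier sketch):

```python
def q_update(Q, s, a, r_env, s_next, actions, alpha=0.1, gamma=0.9,
             human_reward=None):
    """One Q-learning step; a human-provided reward replaces the
    automatic environment reward when given (manual training mode)."""
    r = human_reward if human_reward is not None else r_env
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
```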
On top of the right UI is an animation that illustrates each training step. In manual training mode, the robot's current phase is highlighted both in the animation loop and in the corresponding part of the upper UI. For example, if the robot is waiting for the user to provide advice, the "Choose Action" part of the loop and the "Action" part of the upper UI are highlighted. This helps students obtain a deeper understanding of the RL training process.
Students can fold and unfold the upper, left, and right parts of the UI at any time with three arrow buttons beside the virtual grid, so they are not overwhelmed by all of the information showing at once. Given the limited screen size of mobile devices, this also makes it more convenient for students to operate the functions or observe the information they are interested in.

Proposed Applications
K-12 AI Education with Robots
The main application of our system is AI education for K-12 students, specifically projects that leverage physical robots. The AR interface can be customized to demonstrate different AI concepts and provide visual aids for users to understand the robot's perception of the environment and the state of its learning process. For AI tasks in which performing all training steps on the robot would take too long, the AR interface also provides a way for users to train virtual agents and then deploy the trained model on the robot to finish the task. Researchers and educators can use AR to design more engaging and immersive AI activities with physical objects, helping students gain a deeper understanding of how AI can be used to solve real-world problems. Our platform also does not rely on desktop or laptop computers, which means students can experiment with and learn AI conveniently on their mobile devices and in diverse environments.
Human-in-the-loop Robot Training
When training robots in the physical world with AI techniques, it is important to find an intuitive way for the robot to communicate its behaviors, plans, and knowledge to humans, especially when humans need to assist in training the robot. Current platforms for human-in-the-loop RL typically hide the robot's representation from the human teacher. Our platform demonstrates the possibility of using AR to build a straightforward and convenient HRI interface that helps users complete human-in-the-loop robot training tasks. Compared with receiving information and controlling the robot through a traditional 2D screen, the AR interface can reduce the mental rotation required to map between the screen and the physical world. The interface may also make it easier for users to track how the robot is interacting with the physical world and to provide appropriate feedback. We hypothesize that creating common ground between a human teacher and a robot learner with AR may result in more efficient teaching, thus reducing the number of interactions needed to learn a good policy.
Conclusions and Future Work
We proposed an AR robot platform for establishing common ground between a learning robot and a human user, and we designed a treasure hunting activity based on it to introduce RL concepts to middle and high school students. The main significance of this project is an approach that provides students with a more interactive and immersive AI learning experience by combining physical robots with a virtual AR interface. We described the different components of our system and the RL activity we designed. We expect our platform to benefit other research on K-12 AI education and to inform the use of AR in visualizing human-in-the-loop AI training of robots.
We are designing a pilot study with students this autumn to evaluate their learning outcomes and gather feedback to improve our platform. We plan to separate students into two groups: one group will learn RL using our AR platform, and the other will learn the same concepts through an online platform we designed previously (Zhang et al. 2021). Learning outcomes will be measured with a set of pre- and post-tests. We will also compare the usability of the two approaches and evaluate the value of AR and educational robots in K-12 RL learning. In addition, we are constructing both easier and harder problems and creating a storyline that lets students explore them in sequence. Finally, we believe that visualizing the learning robot's internal representation may help human teachers provide better feedback to the robot, and we plan to evaluate this hypothesis in an additional study.
Acknowledgments
The research described in this paper was supported in part by a gift from the Verizon Foundation and LEGO Education.
References
- Cheli et al. (2018) Cheli, M.; Sinapov, J.; Danahy, E. E.; and Rogers, C. 2018. Towards an augmented reality framework for K-12 robotics education. In Proceedings of the 1st International Workshop on Virtual, Augmented, and Mixed Reality for HRI (VAM-HRI).
- Cleaver et al. (2020) Cleaver, A.; Faizan, M.; Amel, H.; Short, E.; and Sinapov, J. 2020. SENSAR: A Visual Tool for Intelligent Robots for Collaborative Human-Robot Interaction. CoRR abs/2011.04515.
- Coppens, Bargiacchi, and Nowé (2019) Coppens, Y.; Bargiacchi, E.; and Nowé, A. 2019. Reinforcement Learning 101 with a Virtual Reality Game. In 1st International Workshop on Education in Artificial Intelligence K-12 (EduAI-19), held in conjunction with IJCAI-2019. URL http://eduai19.ist.tugraz.at/.
- da Silva (2019) da Silva, B. B. 2019. AR Lab: Augmented Reality App for Chemistry Education.
- Fösel et al. (2018) Fösel, T.; Tighineanu, P.; Weiss, T.; and Marquardt, F. 2018. Reinforcement Learning with Neural Networks for Quantum Feedback. Physical Review X 8: 031084.
- Griffith et al. (2013) Griffith, S.; Subramanian, K.; Scholz, J.; Isbell, C. L.; and Thomaz, A. L. 2013. Policy shaping: Integrating human feedback with reinforcement learning. In Advances in neural information processing systems, 2625–2633.
- Hennerley, Dickinson, and Jiang (2017) Hennerley, J.; Dickinson, M.; and Jiang, M. 2017. Augmented Reality Enhanced Human Robot Interaction for Social Robots as e-Learning Companions. 1–6. doi:10.14236/ewic/HCI2017.98.
- Huang, Li, and Fong (2016) Huang, Y.; Li, H.; and Fong, R. 2016. Using Augmented Reality in early art education: a case study in Hong Kong kindergarten. Early Child Development and Care 186(6): 879–894.
- Ibáñez et al. (2014) Ibáñez, M. B.; Ángela Di Serio; Villarán, D.; and Delgado Kloos, C. 2014. Experimenting with electromagnetism using augmented reality: Impact on flow student experience and educational effectiveness. Computers & Education 71: 1–13. ISSN 0360-1315.
- Ikeda and Szafir (2021) Ikeda, B.; and Szafir, D. 2021. An AR Debugging Tool for Robotics Programmers. In International Workshop on Virtual, Augmented, and Mixed-Reality for Human-Robot Interaction (VAM-HRI).
- Kiumarsi et al. (2018) Kiumarsi, B.; Vamvoudakis, K. G.; Modares, H.; and Lewis, F. L. 2018. Optimal and Autonomous Control Using Reinforcement Learning: A Survey. IEEE Trans. on Neural Networks and Learning Systems 29(6): 2042–2062.
- Laut, Kapila, and Iskander (2013) Laut, J.; Kapila, V.; and Iskander, M. 2013. Exposing middle school students to robotics and engineering through LEGO and MATLAB. In 120th ASEE Annual Conference and Exposition.
- Lee and Park (2018) Lee, D.; and Park, Y. S. 2018. Implementation of Augmented Teleoperation System Based on Robot Operating System (ROS). In 2018 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 5497–5502.
- Petrovič (2021) Petrovič, P. 2021. Spike Up Prime Interest in Physics. In Robotics in Education, 146–160. Springer International Publishing.
- Ritschel et al. (2019) Ritschel, H.; Seiderer, A.; Janowski, K.; Wagner, S.; and André, E. 2019. Adaptive linguistic style for an assistive robotic health companion based on explicit human feedback. Proceedings of the 12th ACM International Conference on Pervasive Technologies Related to Assistive Environments .
- Schaal (1997) Schaal, S. 1997. Learning from demonstration. In Advances in neural information processing systems, 1040–1046.
- Sutton and Barto (2018) Sutton, R. S.; and Barto, A. G. 2018. Reinforcement Learning: An Introduction. Cambridge, MA, USA: A Bradford Book. ISBN 0262039249.
- Taylor (2011) Taylor, M. E. 2011. Teaching reinforcement learning with mario: An argument and case study. In Proceedings of the 2011 AAAI Symposium Educational Advances in Artificial Intelligence.
- van der Vlist et al. (2008) van der Vlist, B.; van de Westelaken, R.; Bartneck, C.; Hu, J.; Ahn, R.; Barakova, E.; Delbressine, F.; and Feijs, L. 2008. Teaching Machine Learning to Design Students. In Pan, Z.; Zhang, X.; El Rhalibi, A.; Woo, W.; and Li, Y., eds., Technologies for E-Learning and Digital Entertainment, 206–217. Berlin, Heidelberg: Springer Berlin Heidelberg. ISBN 978-3-540-69736-7.
- Watkins and Dayan (1992) Watkins, C. J.; and Dayan, P. 1992. Q-learning. Machine learning 8(3-4): 279–292.
- Yu and Short (2021) Yu, H.; and Short, E. S. 2021. Active Feedback Learning with Rich Feedback. In Companion of the 2021 ACM/IEEE International Conference on Human-Robot Interaction, HRI ’21 Companion, 430–433. New York, NY, USA: Association for Computing Machinery. ISBN 9781450382908.
- Zhang et al. (2021) Zhang, Z.; Willner-Giwerc, S.; Sinapov, J.; Cross, J.; and Rogers, C. 2021. An Interactive Robot Platform for Introducing Reinforcement Learning to K-12 Students. In Robotics in Education, 288–301. Cham: Springer International Publishing.
- Zhou, Li, and Zare (2017) Zhou, Z.; Li, X.; and Zare, R. N. 2017. Optimizing Chemical Reactions with Deep Reinforcement Learning. ACS Central Science 3(12): 1337–1344.