

Dynamic Competency Self-Assessment for Autonomous Agents

Nicholas Conlon ([email protected], ORCID 0000-0001-8262-2169), University of Colorado at Boulder, 3775 Discovery Drive, Boulder, Colorado, USA 80303
Nisar R. Ahmed ([email protected], ORCID 0000-0002-7555-5671), University of Colorado at Boulder, 3775 Discovery Drive, Boulder, Colorado, USA 80303
Daniel Szafir ([email protected], ORCID 0000-0003-1848-7884), University of North Carolina at Chapel Hill, 201 S. Columbia Street, Chapel Hill, North Carolina, USA 27599
(2023; 1 March 2023)
Abstract.

As autonomous robots are deployed in increasingly complex environments, platform degradation, environmental uncertainties, and deviations from validated operation conditions can make it difficult for human partners to understand robot capabilities and limitations. The ability for a robot to self-assess its competency in dynamic and uncertain environments will be a crucial next step in successful human-robot teaming. This work presents and evaluates an Event-Triggered Generalized Outcome Assessment (ET-GOA) algorithm for autonomous agents to dynamically assess task confidence during execution. The algorithm uses a fast online statistical test of the agent’s observations and its model predictions to decide when competency assessment is needed. We provide experimental results using ET-GOA to generate competency reports during a simulated delivery task and suggest future research directions for self-assessing agents.

Keywords: Human-robot teaming, robot self-assessment
copyright: rights retained; journal year: 2023; price: 0; doi: X
conference: ACM/IEEE HRI 2023 workshop on Variable Autonomy for human-robot Teaming; March 13, 2023; Stockholm, Sweden
CCS concepts: Computer systems organization → Robotic autonomy; Computing methodologies → Machine learning; Computing methodologies → Artificial intelligence; Human-centered computing → Interaction techniques

1. INTRODUCTION

Autonomous self-assessments are a critical component of advancing complex robot deployments. Consider a scenario where a search-and-rescue (SAR) team is delivering much-needed supplies after a disaster. The environment is quite dangerous, so the team decides to employ semi-autonomous robots to navigate the environment and deliver the supplies. The team's reliance on the robot is based on their perception of its ability. However, if there is misalignment between the team's perception of the robot's abilities and the robot's actual capabilities and limitations, the team may inadvertently push the robot beyond its competency boundaries (Hutchins et al., 2015). To make appropriate tasking decisions (e.g., safe delivery locations, control handoffs, choice of autonomy level), the SAR team must understand the robot's competency and how it may change during the mission.

In recent work, we showed that agents that reported a priori self-assessed confidence in task success helped align operator perception with actual robot competency, thus improving decision-making and performance in a variable-autonomy navigation task (Conlon et al., 2022c). However, in dynamic and uncertain environments like the SAR scenario outlined above, an a priori confidence assessment can quickly become stale due to environmental changes such as falling debris or newly appearing obstacles. In this work, we propose an algorithm called Event-Triggered Generalized Outcome Assessment (ET-GOA) which enables autonomous agents to continually monitor in situ changes to competency and, if necessary, self-assess and report in response to events that significantly alter outcome likelihoods. We demonstrate that our method can capture dynamic environmental changes that positively or negatively impact task success and that using ET-GOA confidence assessments in autonomous decision-making can lead to significant increases in task performance.

2. Background and Related Work

Competency self-assessment enables autonomous agents to assess their capabilities and limitations with respect to task constraints and environmental conditions. This critical information can be used to improve internal decision-making and/or can be communicated to a human partner to improve external decision-making. Pre-mission (a priori) self-assessments enable an autonomous agent to assess its competency before execution of a task or mission. These methods generally compute agent self-confidence based on simulation (Ardón et al., 2020; Israelsen et al., 2019) or previous experience (Frasca et al., 2020). Our recent work showed that reporting a priori self-assessments led to better choices of reliance (Conlon et al., 2022b) and improvements to performance and trust (Conlon et al., 2022c). However, in dynamic environments, a priori assessment is a poor predictor of the agent's confidence due to factors that are not accounted for before execution, such as environmental changes, task changes, or interactions with other agents. Running a priori methods online (periodically) could conceivably capture dynamic competency changes. However, such assessments can waste computational resources if competency has, in fact, not changed, or may be too expensive for certain kinds of decision-making agents (Conlon et al., 2022a; Acharya et al., 2022; Gautam et al., 2022).

In-mission (in situ) self-assessment enables an autonomous agent to assess (or reassess) its competency during task execution. Popular methods such as online behavior classification can identify poor behavior and trigger the agent to halt operation and ask for help in the event of a failure (Rojas et al., 2017; Fox et al., 2006; Wu et al., 2017). These methods, while able to capture dynamic competency changes, require examples of both good (competent) and poor (incompetent) behavior, which may be difficult or impossible to acquire in many real-world applications. Another method of in situ self-assessment involves monitoring features of the agent's current state. For example, Gautam et al. developed a method to monitor deviations from design assumptions (Gautam et al., 2022), while Ramesh et al. used the "vitals" of a robot to monitor its health during task execution (Ramesh et al., 2022). Both methods provide a valuable instantaneous snapshot of the agent's state, which can indicate performance degradation online; however, neither predicts higher-level task competency (e.g., does the degradation actually impact the task outcome?). In contrast, we propose a hybrid approach to in situ self-assessment: monitor the alignment between the agent's predictions and observations, and trigger a (re)assessment of task confidence using an accurate a priori method when that alignment deviates.

3. Dynamic Self-Assessment of Task Confidence

3.1. When to Assess Competency

To prevent unnecessary assessments and save onboard computational resources, we take an event-triggered approach to in situ self-assessment. Our algorithm assesses confidence only when there is evidence that the agent's task confidence has changed.

One promising method for detecting such a change is the Surprise Index (SI). SI is defined as the sum of the probabilities of events more extreme (or less probable) than an observed event, given a probabilistic model (Zagorecki et al., 2015). For a given event $e \in E$, SI is computed by summing over the probabilities of more extreme events in the distribution $p(E)$:

(1)  $SI(e, p(E)) = \int_{p(E) < p(e)} p(E)\,dE$

SI can be thought of as a measure of how (in)compatible an observation $e$ is with a set of possible events $E$. It is similar to the better-known entropy-based surprise (Baldi and Itti, 2010; Benish, 1999); however, entropy-based surprise is unbounded, while the Surprise Index is bounded between zero (most surprising) and one (least surprising). SI also shares similarities with the tail probability, or the p-value, under the hypothesis that $e$ is drawn from the distribution $p(E)$: a large p-value (large SI) indicates that $e$ is consistent with $p(E)$, while a small p-value indicates strong evidence to the contrary.

In this work we are interested in determining when the agent should re-assess its task confidence. We propose computing the SI of the agent's observed state $s_t$ with respect to the agent's model prediction $p(\hat{s}_t)$, and triggering a re-assessment when the SI falls below a threshold $\delta$. In essence, we monitor the quality of the agent's model given the task and trigger an assessment when that quality wanes. Because some aspects of $s_t$ may be more relevant to competency than others, we compute SI over a subset of the state marginals.
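To make the trigger concrete, the following is a minimal Python sketch of Eqn. 1. It is ours, not the paper's implementation: surprise_index_gaussian assumes the predicted marginal can be summarized by a Gaussian (for which the integral has a closed form), and surprise_index_mc is a Monte Carlo estimate for an arbitrary density estimate.

import numpy as np
from scipy.stats import norm

def surprise_index_gaussian(e, mu, sigma):
    # SI of observation e under a Gaussian predicted marginal N(mu, sigma^2).
    # For a Gaussian, p(E) < p(e) exactly when |E - mu| > |e - mu|, so Eqn. 1
    # reduces to a two-tailed probability: SI = 1 at e = mu (least surprising)
    # and SI -> 0 deep in the tails (most surprising).
    z = abs(e - mu) / sigma
    return 2.0 * norm.sf(z)

def surprise_index_mc(e, samples, pdf):
    # Monte Carlo estimate of Eqn. 1 for an arbitrary density estimate pdf:
    # the fraction of samples drawn from the predicted marginal whose density
    # falls below the density of the observation e.
    samples = np.asarray(samples)
    return float(np.mean(pdf(samples) < pdf(e)))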

3.2. How to Assess Competency

To assess competency we leverage the Generalized Outcome Assessment (GOA) (Conlon et al., 2022a). Given a probabilistic world model $M$, GOA simulates task execution by rolling out state predictions $p_M(s_{t+1} | s_t, a_t)$. Note that $M$ could take the form of a Monte Carlo-based planner (Israelsen, 2019), a black-box neural network (Conlon et al., 2022a; Ha and Schmidhuber, 2018), or similar. GOA then analyzes the state predictions and computes the agent's margin of confidence in attaining an outcome better than some target outcome threshold $Z$. Examples of target outcomes include craters hit (which we prefer fewer of) or packages delivered (which we prefer more of). The confidence value can be reported as a raw value in $(0, 1)$ or mapped to a semantic confidence label such as highly likely, likely, unlikely, or highly unlikely. For the experiments outlined later, we use the raw numerical confidence values.
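The full GOA computation is detailed in (Conlon et al., 2022a); as a simplified stand-in, the sketch below scores confidence as the empirical probability that a simulated rollout beats the target outcome threshold Z. The function name and interface are illustrative assumptions, not the paper's API.

import numpy as np

def goa_confidence(rollout_outcomes, z, higher_is_better=True):
    # Fraction of simulated rollouts whose outcome beats the threshold z.
    # A simplified stand-in for the margin of confidence computed by the
    # full GOA of (Conlon et al., 2022a).
    outcomes = np.asarray(rollout_outcomes)
    beats = outcomes >= z if higher_is_better else outcomes <= z
    return float(beats.mean())

# Usage: craters hit across simulated rollouts, preferring fewer than z = 3:
# confidence = goa_confidence(crater_counts, z=3, higher_is_better=False)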

3.3. Event-Triggered Generalized Outcome Assessment Algorithm

We call our method for surprise-based dynamic self-assessment Event-Triggered Generalized Outcome Assessment (ET-GOA). The algorithm is presented in Alg. 1 and can be broken up into two components: (1) before task execution and (2) during task execution.

Before task execution (lines 1-5): Line 1 takes as input a model M, a task specification T, a set of outcome thresholds Z (one for each outcome), and a set of surprise thresholds $\delta$ (one for each state marginal of interest). Next (line 2), the model M is used to simulate execution of task T given the initial state $s_0$. This results in a set of predicted state distributions $[p(\hat{s}_t)]_{t=1:N}$, one for each time step $t$. We further break the state distribution for a given time step into $K$ marginal components. For example, if we were interested in using the $x$, $y$, and $z$ positions in the SI trigger, then $K = 3$ and $p(\hat{s}_t)$ would be broken into the set of marginal probability distributions $[p(\hat{s}_{t,x}), p(\hat{s}_{t,y}), p(\hat{s}_{t,z})]$. This marginalization step is implicit in the algorithm, but important to note. The predicted marginals for each time step are then stored in an experience buffer (line 3) and used to compute the initial Generalized Outcome Assessment (line 4), which can be reported to an operator (line 5).

1:   Algorithm ET-GOA(M, T, Z, δ)
2:       [p(ŝ_1), …, p(ŝ_N)] ← simulate M(T, s_0)
3:       exp_buffer ← [p(ŝ_1), …, p(ŝ_N)]
4:       goa ← GOA(exp_buffer, Z)
5:       report goa
6:       for t in 1:N do
7:           s_t ← receive_state_observation(t)
8:           p(ŝ_t) ← exp_buffer(t)
9:           si_min ← min_{i=1:K} SI(s_{t,i}, p(ŝ_{t,i}))
10:          if si_min ≤ δ then
11:              [p(ŝ_{t+1}), …, p(ŝ_N)] ← simulate M(T, s_t)
12:              exp_buffer ← [p(ŝ_{t+1}), …, p(ŝ_N)]
13:              goa ← GOA(exp_buffer, Z)
14:              report goa
15:          else
16:              continue

Algorithm 1: Event-Triggered Generalized Outcome Assessment

During task execution (lines 6-16): The agent observes the state $s_t$ at time $t$ (line 7). It then retrieves the state distributions (i.e., the predictions) for time $t$ from the experience buffer (line 8). Next, the algorithm computes $si_{min}$, the minimum SI over the $K$ observed state marginals $s_{t,i}$ given the predicted marginal distributions $p(\hat{s}_{t,i})$ (line 9). If $si_{min}$ is below $\delta$, an anomalous or surprising state observation has been received and confidence should be reassessed (line 10). In this case, a new set of predicted state distributions is simulated from $M$ (line 11) and saved in the experience buffer (line 12). A new self-assessment is then computed using the newly updated experience buffer (line 13) and reported to an operator (line 14). An $si_{min}$ above $\delta$ indicates that the agent's predictions align with its observations and no confidence update is needed at this time (line 16). This loop (line 6) continues for the duration of the task, comparing predicted state marginal distributions to real observations and, if necessary, recomputing and reporting updates to the agent's task confidence.
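A compact Python rendering of Alg. 1 might look like the sketch below. The interfaces are assumptions on our part: model.simulate returns a dict mapping each remaining time step to its K predicted marginals, observe(t) returns the K observed marginals at step t, and goa and surprise_index behave as sketched in the previous sections.

def run_et_goa(model, task, z, delta, s0, n_steps, observe, report):
    # Before execution (lines 1-5): simulate the task, cache the predicted
    # per-step marginals, and report the initial confidence.
    exp_buffer = model.simulate(task, s0, from_step=1)
    report(goa(exp_buffer, z))
    # During execution (lines 6-16): compare observations to predictions and
    # re-assess only when the minimum Surprise Index falls below delta.
    for t in range(1, n_steps + 1):
        s_t = observe(t)                               # line 7
        preds = exp_buffer[t]                          # line 8: K marginals
        si_min = min(surprise_index(s_t[i], preds[i])  # line 9
                     for i in range(len(preds)))
        if si_min <= delta:                            # line 10: surprising
            exp_buffer.update(model.simulate(task, s_t, from_step=t + 1))
            report(goa(exp_buffer, z))                 # lines 11-14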

4. Experiments

We evaluated ET-GOA in two simulation experiments. The first investigated ET-GOA’s impact on task performance. The second investigated ET-GOA’s ability to capture changes in task difficulty.

4.1. Delivery Scenario Overview

Our experimental scenario was based on the motivating SAR example from Section 1: a single agent was tasked to safely deliver cargo to one of three goals. The environment contained two types of obstacles, craters and dust zones, which were difficult for the agent to avoid. Driving over craters damaged the agent; if enough craters were hit while navigating, the agent was considered broken and failed the delivery task. Dust zones degraded sensors and injected noise into the agent's state transition dynamics. Dust zones were generally found near craters, which increased the chance that the agent hit a crater if it found itself in dust. To simulate environmental changes that would occur in realistic deployments, new obstacles could spawn at random locations (except for the agent's location) during task execution.

The environment was a custom OpenAI Gym environment (Brockman et al., 2016). The agent was modeled as a discrete state/action Markov Decision Process with state space $s = (s_x, s_y, s_c, s_z)$ consisting of the agent's $(x, y)$ location and the counts of craters ($s_c$) and dust zones ($s_z$) within its sensor field of view (FOV). The sensor FOV was modeled as omnidirectional with a radius of 10 grid squares. The total size of the 2D environment was 50×50 grid squares. We trained one policy for each goal using Q-learning (Watkins and Dayan, 1992). No obstacles were present during training, to prevent the agent from learning how to overcome the difficulties of the environment.
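For reference, each goal policy was trained with the standard tabular Q-learning update, which in Python form is roughly the following. The learning rate, discount factor, and action set below are illustrative assumptions; the paper does not report its training hyperparameters.

import numpy as np
from collections import defaultdict

n_actions = 4  # assumed grid actions (up/down/left/right); not specified in the paper

# Q-table over discrete states; each entry holds one value per action.
Q = defaultdict(lambda: np.zeros(n_actions))

def q_update(s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Standard tabular Q-learning update (Watkins and Dayan, 1992).
    td_target = r + gamma * np.max(Q[s_next])
    Q[s][a] += alpha * (td_target - Q[s][a])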

The world model $M$ used for self-assessment was a copy of the environment that had an identical state transition function but only included known craters and dust zones. The agent chose the goal with the maximum assessed confidence. If there was a tie in confidence, the agent chose the closest goal. An example environment can be seen in Fig. 1.

Figure 1. Example environment illustrating the agent's location and FOV (orange), the goal area (green), and the ground-truth locations of dust zones (blue circles) and craters (white circles). The obstacles are highlighted with blue and red to improve visual contrast.

We evaluated two different environments, static and dynamic. In the static environment, the locations of craters and dust zones were known by the agent a priori and remained unchanged for the entire task execution. In the dynamic environment, the locations of craters and dust zones were initially known, but changed at a predetermined time to simulate a previously generated onboard navigation map suddenly becoming out-of-date.

4.2. Hypotheses

We had three hypotheses: (1) In a static environment, ET-GOA and GOA will perform equally well and will both outperform a random goal choice; (2) In a dynamic environment, ET-GOA will outperform both GOA and random choice; (3) ET-GOA can capture both positive and negative changes in task difficulty. We analyzed agent performance (number of deliveries) for hypotheses 1 and 2, and we analyzed reported confidence relative to task difficulty changes for hypothesis 3.

4.3. Improvements to Performance

Our first experiment was used to validate our first two hypotheses. At $t = 0$ we initialized the agent with the locations of all obstacles. For the dynamic conditions, we changed the locations of the obstacles without the agent's knowledge at $t = 10$. Three conditions were considered: no assessment, GOA, and ET-GOA. The no assessment condition did not use any competency assessment; rather, at $t = 0$ the agent chose a goal at random and navigated directly to it. The GOA condition used the standard Generalized Outcome Assessment analysis discussed in (Conlon et al., 2022a). At $t = 0$ the agent selected and navigated to the goal $g \in G$ with the highest GOA confidence according to Eqn. 2.

(2)  $g = \arg\max_{i \in G} GOA_i$

The ET-GOA condition used the ET-GOA algorithm discussed in Section 3.3. We used two state marginals as triggers: $s_c$ (and $\hat{s}_c$), the actual (and predicted) craters visible in the agent's FOV; and $s_z$ (and $\hat{s}_z$), the actual (and predicted) dust zones visible in the agent's FOV. This essentially computed the surprise between the expected obstacle locations (from the initial location information) and the "on the ground" obstacle locations observed while traversing the environment. We chose these specific marginals because they align with the sensing capabilities of modern robots. Additionally, looking for surprising observations within the agent's sensor FOV enabled the algorithm to trigger a re-assessment before the agent physically came into contact with a possibly dangerous obstacle. The algorithm triggered a re-assessment if the minimum SI of either state marginal was less than $\delta = 0.05$:

$\min(SI(s_c, \hat{s}_c), SI(s_z, \hat{s}_z)) < 0.05$

The agent then selected the goal based on Eqn. 2 and navigated directly to it until it either reached the goal or triggered a re-assessment and chose a new goal. For each condition, the agent attempted 100 delivery tasks.
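Concretely, the goal selection of Eqn. 2 with distance tie-breaking (Section 4.1), together with the re-assessment trigger used in this experiment, could be sketched as follows. The helper names and interfaces are ours, for illustration only.

def select_goal(goal_confidences, dist_to):
    # Eqn. 2: choose the goal with maximal GOA confidence; break ties in
    # confidence by choosing the closest goal (Section 4.1).
    best = max(goal_confidences.values())
    tied = [g for g, c in goal_confidences.items() if c == best]
    return min(tied, key=dist_to)

def should_reassess(si_craters, si_dust, delta=0.05):
    # Re-assess when the minimum SI over the crater and dust-zone marginals
    # within the agent's FOV drops below delta.
    return min(si_craters, si_dust) < delta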

4.3.1. Results

We found a significant main effect of environment on the number of deliveries ($t(598) = 4.65$, $p < 0.0001$), indicating that the static environment was easier than the dynamic environment, as expected. In the static environment, we found significant effects of reporting condition on deliveries ($F(2, 297) = 43.5$, $p < 0.0001$). Post-hoc analysis using Tukey's HSD revealed a significant increase in deliveries for ET-GOA compared to random ($p = 0.0001$) and a significant increase in deliveries for GOA compared to random ($p < 0.0001$). There was no difference between GOA and ET-GOA in the static environment, which was expected. In the dynamic environment, we found significant effects of reporting condition on deliveries ($F(2, 297) = 44.1$, $p < 0.0001$). Post-hoc analysis using Tukey's HSD revealed a significant increase in deliveries for ET-GOA compared to both random ($p = 0.0001$) and GOA ($p < 0.0001$). These results confirm our first and second hypotheses and can be seen in Fig. 2.

Figure 2. Plot of 100 delivery attempts per condition showing ET-GOA performed significantly better than random in the static environment (left) and significantly better than both random and GOA in the dynamic environment (right).

4.4. Detecting Changes in Difficulty

Our second experiment was used to validate our third hypothesis. Here we evaluated how well ET-GOA captured changes in the environment that impacted task difficulty. For this evaluation, the agent navigated to a single static goal location under two conditions. The first, called $easy \rightarrow hard \rightarrow easy$, started with no obstacles; obstacles were then randomly added at time step 10, and all obstacles were deleted at time step 30. The second, called $hard \rightarrow easy \rightarrow hard$, started with randomized obstacles; all obstacles were then deleted at time step 10, and new obstacles were added at time step 30. Adding obstacles increased task difficulty for the agent, and deleting obstacles decreased it. Obstacles at time step zero were known to the agent, while the obstacles added or deleted at time steps 10 and 30 had to be observed in situ. We ran 100 episodes for each condition and recorded the initial assessment and the ET-GOA assessment after each add/delete event.

4.4.1. Results

We observed a significant difference in the agent's confidence between $easy \rightarrow hard \rightarrow easy$ and $hard \rightarrow easy \rightarrow hard$ tasks at the initial assessment ($t(99) = 110.0$, $p < 0.0001$), after the first environmental change ($t(99) = 27.7$, $p < 0.0001$), and after the second environmental change ($t(99) = 37.5$, $p < 0.0001$). This confirms our third hypothesis that ET-GOA can capture both positive and negative changes in task difficulty. A plot of the results can be seen in Fig. 3.

Figure 3. Plot showing ET-GOA captured task difficulty changes. The $hard \rightarrow easy \rightarrow hard$ tasks are in orange, while the $easy \rightarrow hard \rightarrow easy$ tasks are in blue. Task difficulty-changing events occurred at $t = 10$ and $t = 30$. The solid lines indicate the mean and standard deviation of task confidence.

5. Conclusions and Future Work

In this work we presented Event-Triggered Generalized Outcome Assessment (ET-GOA), an algorithm that computes an autonomous agent's in situ task confidence in dynamic and uncertain environments. ET-GOA chooses when to assess task confidence based on the Surprise Index between an agent's predicted and actual state. We evaluated ET-GOA on a delivery task in both static and dynamic environments and found that it led to significant performance improvements over baseline methods. We also found that ET-GOA was able to capture changes in agent confidence indicating changes in task difficulty; that is, our method can determine when tasks become more or less difficult. Our next step is to validate ET-GOA both on live platforms and in a human subjects study. We hypothesize that ET-GOA will help operators make better decisions about when to rely on an autonomous robot, leading to improved performance and reduced workload.

ET-GOA can enable autonomous robots to provide critical information about their "on the ground" confidence in task success, when that confidence changes, and why. We believe it can be invaluable to human-robot teams, particularly those working in high-risk and uncertain environments where human operators must make critical decisions with respect to task execution, level of autonomy, and/or control.

Acknowledgements.
This work was supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR001120C0032. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA.

References

  • Acharya et al. (2022) Aastha Acharya, Rebecca Russell, and Nisar R. Ahmed. 2022. Competency Assessment for Autonomous Agents using Deep Generative Models. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 8211–8218. https://doi.org/10.1109/IROS47612.2022.9981991
  • Ardón et al. (2020) Paola Ardón, Eric Pairet, Yvan Petillot, Ronald Petrick, Subramanian Ramamoorthy, and Katrin Lohan. 2020. Self-Assessment of Grasp Affordance Transfer. In 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE. https://doi.org/10.1109/IROS45743.2020.9340841
  • Baldi and Itti (2010) Pierre Baldi and Laurent Itti. 2010. Of bits and wows: A Bayesian theory of surprise with applications to attention. Neural Networks 23, 5 (2010), 649–666. https://doi.org/10.1016/j.neunet.2009.12.007
  • Benish (1999) William A. Benish. 1999. Relative Entropy as a Measure of Diagnostic Information. Medical Decision Making 19, 2 (1999), 202–206. https://doi.org/10.1177/0272989X9901900211
  • Brockman et al. (2016) Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. 2016. OpenAI Gym. CoRR abs/1606.01540 (2016). arXiv:1606.01540 http://arxiv.org/abs/1606.01540
  • Conlon et al. (2022a) Nicholas Conlon, Aastha Acharya, Jamison McGinley, Trevor Slack, C. Alexander Hirst, Mitchell Hebert, Chris Reale, Eric Frew, Rebecca Russell, and Nisar Ahmed. 2022a. Generalizing Competency Self-Assessment for Autonomous Vehicles Using Deep Reinforcement Learning. In AIAA SciTech Forum. AIAA.
  • Conlon et al. (2022b) Nicholas Conlon, Daniel Szafir, and Nisar Ahmed. 2022b. Investigating the Effects of Robot Proficiency Self-Assessment on Trust and Performance. In Proceedings of the AAAI Spring Symposium Series: Closing the Assessment Loop: Communicating Proficiency and Intent in Human-Robot Teaming. AAAI.
  • Conlon et al. (2022c) Nicholas Conlon, Daniel Szafir, and Nisar Ahmed. 2022c. “I’m Confident This Will End Poorly”: Robot Proficiency Self-Assessment in Human-Robot Teaming. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2127–2134. https://doi.org/10.1109/IROS47612.2022.9981653
  • Fox et al. (2006) Maria Fox, Malik Ghallab, Guillaume Infantes, and Derek Long. 2006. Robot introspection through learned hidden Markov models. Artificial Intelligence 170, 2 (2006), 59–113. https://doi.org/10.1016/j.artint.2005.05.007
  • Frasca et al. (2020) Tyler Frasca, Evan Krause, Ravenna Thielstrom, and Matthias Scheutz. 2020. “Can you do this?” Self-Assessment Dialogues with Autonomous Robots Before, During, and After a Mission. In Proceedings of the 2020 Workshop on Assessing, Explaining, and Conveying Robot Proficiency for Human-Robot Teaming. arXiv e-prints.
  • Gautam et al. (2022) Alvika Gautam, Tim Whiting, Xuan Cao, Michael A. Goodrich, and Jacob W. Crandall. 2022. A Method for Designing Autonomous Robots that Know Their Limits. In 2022 International Conference on Robotics and Automation (ICRA). IEEE, Philadelphia, PA, USA, 121–127. https://doi.org/10.1109/ICRA46639.2022.9812030
  • Ha and Schmidhuber (2018) David Ha and Jürgen Schmidhuber. 2018. Recurrent World Models Facilitate Policy Evolution. In Advances in Neural Information Processing Systems, S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, and R. Garnett (Eds.), Vol. 31. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2018/file/2de5d16682c3c35007e4e92982f1a2ba-Paper.pdf
  • Hutchins et al. (2015) Andrew R Hutchins, Mary L Cummings, Mark Draper, and Thomas Hughes. 2015. Representing autonomous systems’ self-confidence through competency boundaries. In Proceedings of the Human Factors and Ergonomics Society Annual Meeting, Vol. 59. SAGE Publications, Los Angeles, CA, 279–283.
  • Israelsen (2019) Brett Israelsen. 2019. Algorithmic Assurances and Self-Assessment of Competency Boundaries in Autonomous Systems. Ph. D. Dissertation. University of Colorado at Boulder, Boulder.
  • Israelsen et al. (2019) Brett Israelsen, Nisar Ahmed, Eric Frew, Dale Lawrence, and Brian Argrow. 2019. Machine Self-confidence in Autonomous Systems via Meta-analysis of Decision Processes. In International Conference on Applied Human Factors and Ergonomics. Springer, 213–223.
  • Ramesh et al. (2022) Aniketh Ramesh, Rustam Stolkin, and Manolis Chiou. 2022. Robot Vitals and Robot Health: Towards Systematically Quantifying Runtime Performance Degradation in Robots Under Adverse Conditions. IEEE Robotics and Automation Letters 7, 4 (2022), 10729–10736. https://doi.org/10.1109/LRA.2022.3192612
  • Rojas et al. (2017) Juan Rojas, Shuangqi Luo, Dingqiao Zhu, Yunlong Du, Hongbin Lin, Zhengjie Huang, Wenwei Kuang, and Kensuke Harada. 2017. Online robot introspection via wrench-based action grammars. In 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 5429–5436. ISSN: 2153-0866.
  • Watkins and Dayan (1992) Christopher JCH Watkins and Peter Dayan. 1992. Q-learning. Machine learning 8, 3 (1992), 279–292.
  • Wu et al. (2017) Hongmin Wu, Hongbin Lin, Yisheng Guan, Kensuke Harada, and Juan Rojas. 2017. Robot Introspection with Bayesian Nonparametric Vector Autoregressive Hidden Markov Models. In 2017 IEEE-RAS 17th International Conference on Humanoid Robotics (Humanoids) (Birmingham, United Kingdom). IEEE, 882–888. https://doi.org/10.1109/HUMANOIDS.2017.8246976
  • Zagorecki et al. (2015) Adam Zagorecki, Marcin Kozniewski, and Marek J. Druzdzel. 2015. An approximation of surprise index as a measure of confidence. In Proceedings of the AAAI Fall Symposium Series: Self-Confidence in Autonomous Systems. AAAI.