Towards a Unified Framework For Interpretable Planning
Introduction
There is growing interest in studying and developing AI-based systems that are able to work with and alongside humans. A particular sub-problem within this direction that has been getting a lot of attention recently is the development and study of interpretable behavior. In general, these works look at generating agent behaviors that are designed to induce specific changes in the observer’s mental state without the need for explicit communication. While many classes of interpretable properties have been investigated previously chakraborti2019explicability, they have generally been studied in isolation. Despite the fact that these properties generally produce complementary effects on the human’s mental state, there has been surprisingly little work on developing methods that can reason about all of these properties simultaneously.
The goal of this paper is to address this lacuna; in particular, to develop a framework that is able to reason about the three most widely considered interpretable behaviors, namely legibility, explicability, and predictability. We aim to achieve this objective by:
1. Identifying settings general enough that all three interpretability measures can be exhibited.
2. Mapping each of these interpretability measures to specific phenomena at the user’s end.
3. Proposing a unified reasoning framework that is able to model all relevant effects on the user’s mental state given a specific agent behavior.
Specifically, we will base our reasoning framework on Bayesian probabilistic reasoning, both because there already exist Bayesian formulations for two of the measures we consider and because there is evidence to believe that humans may inherently be using Bayesian reasoning. This means that in addition to extending existing formulations to the new setting, we will also propose a Bayesian formulation for explicability that is fully compatible with the earlier formulations. In addition to developing this framework, we will also explore how one could leverage communication (which may be limited or expensive) to boost each of these interpretability measures. Moreover, we will demonstrate the usefulness of this framework in a simple grid-world domain, which we will also use to perform user studies that validate the predictions made by our framework.
Illustrative Example: An Office Robot
Throughout this paper we will use the operations of an office robot to illustrate the various behaviors made possible by the setting. The basic setting consists of a mobile robot operating on an office floor. The robot is generally tasked with performing various menial duties in the office, including delivering various objects to specific employees. You, as the floor manager, have been tasked with observing the robot and making sure it is working properly. Since you have observed the robot in action before, you may have formed some expectations about the robot’s capabilities and the tasks it commonly pursues, though you may not know these for certain. For example, in the situation illustrated in Figure 1, you may think the most likely goals of the robot are either to deliver coffee or to deliver mail to room C, though there may be other possibilities that you haven’t considered.

If one of the operational objectives of the agent is to make sure that you, the observer, are kept in the loop, then one of the most effective approaches it can employ is to generate interpretable behavior (assuming the communication capabilities of the agent are limited). Through careful selection of its behavior, it can ensure its behavior meets the user’s expectations, reveals its true model (or, in this case, its objective), or is easy to predict. The general approach has been to consider these properties in isolation. One of the core goals of this paper is to establish whether we can generate behaviors that convey the agent’s model information and are easy to predict, while still remaining explicable.
Background
Throughout most of the discussion we will be agnostic to the specific models used to represent the agent. We will also use the term model in a general sense to include not only information about the actions the agent is capable of performing and their effects on the world, but also information about the current state of the world, the reward/cost model, and any goal states associated with the problem. For certain cases we will assume that the model itself can be parameterized and will use $\Gamma_i(M)$ to denote the value of a parameter $\Gamma_i$ for a model $M$. Since we are interested in cases where a human is observing an agent acting in the world, we will mainly focus on agent behaviors (as opposed to plans or policies) and their likelihood given a model. Specifically, a behavior in this context consists of a sequence of state-action pairs (we will generally denote it by $\tau$), and we assume the likelihood function takes the form $P(\tau \mid M): \mathbb{M} \times \mathbb{T} \rightarrow [0,1]$, where $\mathbb{M}$ is the space of possible models and $\mathbb{T}$ is the set of possible behavioral traces the agent can generate.
When dealing with specific examples, we will assume the models are represented as STRIPS-style models, where each model is given by the tuple $M = \langle F, A, G, C \rangle$ geffner. Here $F$ is the set of propositional fluents, $A$ is the set of actions, $G$ is the goal description, and $C$ is the cost function associated with the actions. For the likelihood function we will rely on a noisy Boltzmann distribution, $P(\tau \mid M) \propto e^{-\beta\,(C(\tau) - C(\tau^*_M))}$, where $\tau^*_M$ is a trace corresponding to an optimal plan for the model. In deterministic settings we will also ignore the difference between a sequence of actions and the full trace containing states and actions.
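As a concrete illustration of this likelihood model, the following sketch (our own illustrative choice of representation, with traces reduced to their costs and a temperature parameter beta) computes normalized Boltzmann likelihoods over a finite set of candidate traces:

```python
import math

def boltzmann_likelihood(trace_cost, optimal_cost, beta=1.0):
    """Unnormalized noisy-Boltzmann likelihood of a trace under a model
    whose optimal plan has cost `optimal_cost`."""
    return math.exp(-beta * (trace_cost - optimal_cost))

def normalized_likelihoods(trace_costs, optimal_cost, beta=1.0):
    """Normalize the likelihood over a finite set of candidate traces."""
    weights = [boltzmann_likelihood(c, optimal_cost, beta) for c in trace_costs]
    total = sum(weights)
    return [w / total for w in weights]

# Example: three candidate traces with costs 4, 5 and 8; the optimal cost is 4.
print(normalized_likelihoods([4, 5, 8], optimal_cost=4, beta=1.0))
```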
Given the human-aware reasoning scenario, we are in fact dealing with two different models: the actual agent model that is driving the agent’s behavior (denoted $M^R$) and the human’s belief about the agent’s model (when it is a single model, we will denote it as $M^R_h$). We make no assumption that these two models use equivalent representational schemes or equivalent likelihood functions, though the use of communication in these settings requires an additional mapping from model information in the agent’s representation scheme to a form that can be consumed by the human observer.
Interpretability Measures
With the basic notation for the scenario in place, we are now ready to give a high-level description of the interpretability measures of interest. Note that here we follow the interpretation of the measures laid out in chakraborti2019explicability, allowing some generalization to turn binary notions of interpretability into more general continuous scores where applicable.
Legibility ($\mathit{Leg}$)
Legibility of a behavior was originally formalized as the ability of the behavior to reveal its underlying objective. The usual scenario involves an observer who has access to a set of possible goals $\mathcal{G}$ for the agent and is trying to identify the exact goal from the observed behavior. The generation of legible behavior is then usually formalized as generating the behavior that maximizes the probability of the original goal

$\tau^* = \arg\max_{\tau} P(G^* \mid \tau)$

where $G^*$ is the original goal of the agent. In general, legibility can be extended to the implicit communication of any model parameter SZ:MZaamas20.
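To make this goal-inference view concrete, here is a minimal sketch of the observer’s posterior over goals, assuming the Boltzmann likelihood from the background section and hypothetical costs for two candidate traces in the office-robot example (the goal names and numbers are purely illustrative):

```python
import math

def goal_posterior(trace_costs, optimal_costs, priors, beta=1.0):
    """Posterior over candidate goals given an observed trace.

    trace_costs[g]  : cost of the observed trace when evaluated against goal g
    optimal_costs[g]: cost of the optimal plan for goal g
    priors[g]       : prior probability of goal g
    """
    unnorm = {g: priors[g] * math.exp(-beta * (trace_costs[g] - optimal_costs[g]))
              for g in priors}
    z = sum(unnorm.values())
    return {g: v / z for g, v in unnorm.items()}

# A legible planner would prefer, among candidate traces, the one that
# maximizes the posterior of the agent's actual goal G* ("coffee" here).
candidates = {
    "detour_left": {"coffee": 6, "mail": 10},  # hypothetical trace costs per goal
    "straight":    {"coffee": 5, "mail": 5},
}
optimal = {"coffee": 5, "mail": 5}
priors = {"coffee": 0.5, "mail": 0.5}
best = max(candidates, key=lambda t: goal_posterior(candidates[t], optimal, priors)["coffee"])
print(best)  # the trace that best reveals the coffee-delivery goal
```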
Explicability ($\mathit{Exp}$)
A behavior is considered explicable if it meets the user’s expectations for the agent in the given task. In the binary form, this is usually taken to mean that a behavior is explicable if it is one of the plans the observer would have expected the agent to generate. In the continuous form, the score is usually taken to be inversely related to the distance between the current trace and the closest expected behavior. Thus the problem of generating the most explicable behavior is usually formalized as

$\tau^* = \arg\min_{\tau} \min_{\tau_E \in \mathbb{T}^E_{M^R_h}} \delta(\tau_E, \tau)$

where $\delta$ is some distance function and $\mathbb{T}^E_{M^R_h}$ is the set of expected behaviors for the model $M^R_h$. While there is no consensus on the distance function or the expected behaviors, a reasonable possibility for the expected set is the set of optimal plans, and the distance could be the cost difference kulkarni2020designing.
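A minimal sketch of this distance-based score, assuming cost difference as the distance function and a hypothetical set of expected plan costs:

```python
def explicability_score(trace_cost, expected_costs):
    """Continuous explicability: inversely related to the distance
    (here, absolute cost difference) to the closest expected behavior."""
    distance = min(abs(trace_cost - c) for c in expected_costs)
    return 1.0 / (1.0 + distance)

expected_costs = [5, 6]                        # costs of plans the observer expects
print(explicability_score(5, expected_costs))  # 1.0  (matches an expected plan)
print(explicability_score(9, expected_costs))  # 0.25 (far from all expected plans)
```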
Predictability ($\mathit{Pred}$)
This measure corresponds to the user’s ability to complete a given behavior from the current behavior prefix. Here the goal of the agent is to choose a behavior prefix $\tau_p$ such that

$\tau_p^* = \arg\max_{\tau_p} P(\tau_c \mid \tau_p, M^R_h)$

where $P(\tau_c \mid \tau_p, M^R_h)$ is the probability assigned to the future behavior $\tau_c$ the agent intends to execute, given the current observed prefix $\tau_p$, under the model $M^R_h$.
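The following sketch shows one way to compute $P(\tau_c \mid \tau_p, M^R_h)$ when the observer’s expectations can be enumerated as a distribution over full action sequences; the traces and probabilities are hypothetical:

```python
def completion_probability(full_trace_probs, prefix, completion):
    """P(completion | prefix) under a distribution over full traces,
    where prefix + completion must equal a full trace."""
    prefix_mass = sum(p for t, p in full_trace_probs.items() if t[:len(prefix)] == prefix)
    if prefix_mass == 0:
        return 0.0
    return full_trace_probs.get(prefix + completion, 0.0) / prefix_mass

# Hypothetical distribution the observer assigns to full action sequences.
trace_probs = {
    ("up", "up", "right"): 0.5,
    ("up", "right", "up"): 0.3,
    ("right", "up", "up"): 0.2,
}
print(completion_probability(trace_probs, ("up",), ("up", "right")))  # 0.5 / 0.8
```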
One might note that the first two measures were defined with respect to entire behavior traces, while predictability is defined with respect to a prefix. This points to the larger issue that while one can meaningfully describe offline notions of legibility and explicability (i.e., the score is assigned to the complete behavior trace), predictability only makes sense in an online setting (i.e., the observer is in the process of observing the agent in the middle of execution). Since our objective is to provide a unified discussion of all the metrics, we will limit ourselves to online versions of legibility and explicability. The descriptions provided above can easily be extended to such settings by considering prefixes (i.e., the probability of goals/models given a prefix and the distance between the current prefix and expected prefixes). As we will see in the next section, internally this requires considering the most likely completions.
A Unified Framework

All of the measures discussed above try, in one way or another, to reason about the effect of the agent’s behavior at the observer’s end. Thus an important part of our reasoning framework is to reason about such effects uniformly. We will assume the observer is engaged in Bayesian reasoning with the model shown in Figure 2. This is motivated both by the popularity of such models in previous work in this direction and by the evidence suggesting that people do employ Bayesian reasoning l2008bayesian. The random variable $\mathcal{M}$ stands for the possible models the human thinks the agent could have. The values $\mathcal{M}$ can take include the explicit models the human thinks are possible and a special model $M_0$ that captures all other possibilities and the human’s confusion about the underlying agent model. There are a few ways $M_0$ could be represented, but a simple one is to use a high-entropy model. As we will see, this is a prerequisite for modelling the notion of explicability, since $M_0$ corresponds to the hypothesis to which the human would attribute unexpected or surprising behavior from the robot. The random variable $\tau_p$ corresponds to the behavior prefix that the user has observed, while $\tau_c$ corresponds to possible completions of the prefix.
We assert that in this setting, observation of any given agent behavior leads the human observer to update both their beliefs about the agent’s model and their expectations about the agent’s future actions. We will show how one can map onto this model each of the interpretability scores, as well as some related scores that, while relevant to human-aware decision making, aren’t necessarily interpretability measures.
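A minimal sketch of the observer’s belief update over models, including the catch-all model $M_0$; the model names, priors, and prefix likelihoods below are hypothetical placeholders:

```python
def posterior_over_models(prefix_likelihoods, priors):
    """Observer's Bayesian update: P(M | tau_p) proportional to P(tau_p | M) P(M).

    prefix_likelihoods[m]: probability the observer assigns to the observed
                           prefix under model m (for the catch-all model M0
                           this would come from a high-entropy distribution).
    """
    unnorm = {m: prefix_likelihoods[m] * priors[m] for m in priors}
    z = sum(unnorm.values())
    return {m: v / z for m, v in unnorm.items()}

# Hypothetical: two explicit models plus the catch-all M0.
likelihoods = {"deliver_coffee": 0.6, "deliver_mail": 0.1, "M0": 0.05}
priors = {"deliver_coffee": 0.45, "deliver_mail": 0.45, "M0": 0.10}
print(posterior_over_models(likelihoods, priors))
```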
Interpretability Measures:
Legibility w.r.t. a Parameter Set
The legibility score of a behavior prefix $\tau_p$ for a specific model parameter $\Gamma_i$ can be defined to be directly proportional to the probability the human assigns to that parameter taking its value in the true model, i.e.,

$\mathit{Leg}_{\Gamma_i}(\tau_p) \propto P(\Gamma_i(\mathcal{M}) = \Gamma_i(M^R) \mid \tau_p)$
While this may appear to be a direct generalization of the legibility formulations discussed previously, there are some important points of departure. For one, there is no assumption that the actual parameter being conveyed, or the actual robot model, is part of the hypothesis set maintained by the user. Thus a high legibility score cannot always be guaranteed. Also note that the parameter is not tied to a single model in the set. Finally, the presence of $M_0$ with a non-zero prior affects what constitutes legible behavior. Most of the earlier work made the explicit assumption that the observer is certain the robot’s model is one of the few they are considering. This means that in many cases existing legible behavior generation methods would create an extremely circuitous route simply because it is more likely under one model than the others. For example, in Figure 1, a legible planner might suggest the robot take a step towards the left to reveal that it is trying to deliver coffee, even though that corresponds to an extremely suboptimal plan. In more realistic settings, where people aren’t so certain about the possible robot models, such behavior would leave them confused rather than help them understand the exact model. By explicitly incorporating $M_0$, we hope to capture such dynamics.
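For illustration, a sketch of this parameter-level legibility score computed from a model posterior, assuming the parameter of interest is the goal and that $M_0$ is treated as not committing to any parameter value (all names and numbers are hypothetical):

```python
def legibility_score(posterior, parameter_of, true_value):
    """Leg proportional to the posterior probability that the parameter takes its
    true value, summed over hypothesized models (M0 matches no value here)."""
    return sum(p for m, p in posterior.items()
               if parameter_of.get(m) == true_value)

posterior = {"deliver_coffee": 0.7, "deliver_mail": 0.2, "M0": 0.1}
goal_of = {"deliver_coffee": "coffee", "deliver_mail": "mail"}  # parameter: goal
print(legibility_score(posterior, goal_of, true_value="coffee"))  # 0.7
```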
Explicability
The explicability score of a behavior prefix can be defined to be directly proportional to the probability assigned to all models that are not $M_0$:

$\mathit{Exp}(\tau_p) \propto P(\mathcal{M} \neq M_0 \mid \tau_p) = 1 - P(M_0 \mid \tau_p)$
While at first this might look unconnected to the distance-based formulations that appear in the literature, we will go on to show that these formulations are in fact equivalent under reasonable assumptions. To start with, let us consider a setting where the set of possible models at the user’s end consists of just $M^R_h$ and the model $M_0$. Moreover, to keep ourselves in the settings explored by previous explicability works, let us assume that we are dealing with an offline setting; this is captured in our formulation by restricting the completion to be empty ($\tau_c = \langle\rangle$), so that the observed prefix $\tau$ constitutes the entire behavior. Now the above formulation would give the explicability score

$\mathit{Exp}(\tau) \propto P(M^R_h \mid \tau)$

Given that $\tau$ corresponds to the entire plan, $P(\tau \mid M^R_h)$ is the same as the likelihood function described earlier, which gives us

$\mathit{Exp}(\tau) \propto \frac{P(\tau \mid M^R_h)\, P(M^R_h)}{P(\tau \mid M^R_h)\, P(M^R_h) + P(\tau \mid M_0)\, P(M_0)}$

Now let us consider two plausible likelihood models. For the first one, let us assume a normative model, where the agent is expected to be optimal. This would mean $P(\tau \mid M^R_h)$ is either uniform over the optimal traces, leading to high explicability, or $0$, rendering the behavior inexplicable. This brings us to the original binary explicability formulation used in works like balancing; chakraborti2019explicability. Another possible likelihood function is a noisy rational model; such distributions have been considered previously in works like fisac2020generating. Here the likelihood function would be given as

$P(\tau \mid M^R_h) \propto e^{-\beta\,(C(\tau) - C(\tau^*))}$

where $C(\tau^*)$ is the cost of the best possible behavior. This again maps the formulation to the distance-based formulation of explicability where the distance is defined on cost kulkarni2020designing. Regardless of the specific formulation, explicability is ultimately a measure meant to capture the user’s understanding of the robot’s model; earlier formulations rely on the space of expected plans as a stand-in for the model itself. This is further supported by the fact that works that have looked at updating the user’s perceived explicability of a plan do so by providing information about the model, and not by directly modifying the user’s understanding of the expected set of plans. Now moving on to the more general setting, where we have multiple possible models: notice that if we have models with the same prior belief, the formulation would make no difference between a plan that works equally well in both models and one that works only under individual models, which mirrors our intuitions about behavior in such scenarios.
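The following sketch contrasts the two likelihood choices in the two-model setting ($M^R_h$ versus a high-entropy $M_0$); the costs, priors, and the flat likelihood assigned to $M_0$ are illustrative assumptions:

```python
import math

def binary_likelihood(trace_cost, optimal_cost):
    """Normative observer: only optimal traces are expected."""
    return 1.0 if math.isclose(trace_cost, optimal_cost) else 0.0

def noisy_rational_likelihood(trace_cost, optimal_cost, beta=1.0):
    """Noisy rational observer: likelihood decays with the cost difference."""
    return math.exp(-beta * (trace_cost - optimal_cost))

def explicability(trace_cost, optimal_cost, p_mh, p_m0, m0_likelihood, likelihood_fn):
    """Exp(tau) proportional to 1 - P(M0 | tau) in the two-model setting."""
    w_h = likelihood_fn(trace_cost, optimal_cost) * p_mh
    w_0 = m0_likelihood * p_m0
    return w_h / (w_h + w_0)

# Hypothetical numbers: the trace costs 6 while the optimal cost under M_h^R is 5.
print(explicability(6, 5, p_mh=0.9, p_m0=0.1, m0_likelihood=0.01,
                    likelihood_fn=noisy_rational_likelihood))  # high, graded score
print(explicability(6, 5, p_mh=0.9, p_m0=0.1, m0_likelihood=0.01,
                    likelihood_fn=binary_likelihood))          # 0.0, binary score
```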
Predictability
The final measure we are interested in capturing in this formulation is predictability, which can be captured as

$\mathit{Pred}(\tau_p) \propto P(\tau_c \mid \tau_p)$

where $\tau_c$ is the completion the agent is considering. This is more or less a direct translation of the predictability measure to this more general setting (earlier works like fisac2020generating generally consider a single possible model). One interesting point to note here is that predictability only optimizes for the probability of the agent’s current completion, which allows the system to choose unlikely prefixes in cases where the agent is only required to achieve the required level of predictability after a few steps.
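One way to compute this score under the model in Figure 2 is to marginalize the completion probability over the observer’s model posterior; the sketch below assumes per-model completion probabilities are available and uses hypothetical numbers:

```python
def predictability(completion_given_model, model_posterior):
    """Pred proportional to P(tau_c | tau_p) = sum_M P(tau_c | tau_p, M) P(M | tau_p),
    marginalizing the agent's intended completion over the model posterior."""
    return sum(completion_given_model[m] * p for m, p in model_posterior.items())

# Hypothetical: probability of the agent's intended completion under each model.
completion_probs = {"deliver_coffee": 0.8, "deliver_mail": 0.05, "M0": 0.1}
posterior = {"deliver_coffee": 0.7, "deliver_mail": 0.2, "M0": 0.1}
print(predictability(completion_probs, posterior))
```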
Deception and Interpretability
Note that the interpretability measures discussed here involve leveraging the reasoning processes at the human’s end to lead them to specific conclusions. At least for legibility and predictability, the behavior is said to exhibit the corresponding interpretability property when the conclusion lines up with the ground truth at the agent’s end. However, as far as the human is concerned, at any given step they would not be able to distinguish between cases where the behavior is driving them towards true conclusions and cases where it is not. This means that the mechanisms used for interpretability can easily be leveraged to generate behaviors that are adversarial. Two common classes of such behavior are deceptive behaviors and obfuscatory behaviors. Deceptive behavior corresponds to behavior that is designed to convince the user of incorrect information; this is equally applicable to model information and to future plans. A formulation for the deception score in such cases could be

$\mathit{Dec}(\tau_p) \propto 1 - P(M^R \mid \tau_p)$

This formulation can also be adapted to parameters and plans.
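A minimal sketch of this deception score, under the assumption that a true model missing from the observer’s hypothesis set simply receives zero posterior mass:

```python
def deception_score(posterior, true_model):
    """Dec proportional to 1 - P(M^R | tau_p); if the true model is not in the
    observer's hypothesis set, it gets zero posterior and the score is maximal."""
    return 1.0 - posterior.get(true_model, 0.0)

posterior = {"deliver_coffee": 0.7, "deliver_mail": 0.2, "M0": 0.1}
print(deception_score(posterior, true_model="deliver_mail"))  # 0.8
print(deception_score(posterior, true_model="water_plants"))  # 1.0 (not in the set)
```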
The next form of adversarial behavior is behavior that is meant to confuse the user. This could take the form of either generating inexplicable plans (those that try to increase the posterior on $M_0$) or, more interestingly, ones that try to obfuscate, which can be specified as

$\mathit{Obf}(\tau_p) \propto H(\mathcal{M} \mid \tau_p) = -\sum_{M} P(M \mid \tau_p)\,\log P(M \mid \tau_p)$

i.e., it is proportional to the conditional entropy of the model distribution given the observed behavior. This formulation can also be adapted to parameters and plans.
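A sketch of the obfuscation score as the entropy of the model posterior (the use of the natural log is an arbitrary choice here):

```python
import math

def obfuscation_score(posterior):
    """Obf proportional to H(M | tau_p): entropy of the observer's model posterior."""
    return -sum(p * math.log(p) for p in posterior.values() if p > 0)

print(obfuscation_score({"deliver_coffee": 0.7, "deliver_mail": 0.2, "M0": 0.1}))
print(obfuscation_score({"deliver_coffee": 1/3, "deliver_mail": 1/3, "M0": 1/3}))  # higher
```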
Now, in the case of explicability, the question of deception becomes more interesting, because explicable plan generation is really only useful on its own when the agent’s model isn’t part of the set of possible models the human is considering; otherwise, per our formulation, the agent can simply stick to generating optimal plans and those would remain explicable. This means that explicable behavior is deceptive in the sense that it reinforces incorrect notions about the agent’s model. In fact, in such cases these plans would have a high deception score per the formulation above, given that $P(M^R \mid \tau_p) = 0$ when $M^R$ is outside the hypothesis set. One could, however, argue that explicable behaviors are white lies, since the goal is merely to ease the interaction and there is no malicious intent. In fact, one could further restrict the explicability formulation to obtain a version that only lies by omission, by restricting the agent to generate only optimal behavior; that is, among the optimal behaviors it chooses the one that best aligns with the human’s expectations.
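A sketch of this "lies by omission" variant: among the agent’s optimal behaviors, pick the one closest to something the observer expects. The traces and the edit-style distance are illustrative stand-ins for whatever behavior representation and distance function one adopts:

```python
def lie_by_omission(optimal_traces, expected_traces, distance):
    """Among the agent's optimal behaviors, pick the one closest to some
    behavior the observer expects (an 'omission-only' explicable choice)."""
    return min(optimal_traces,
               key=lambda t: min(distance(t, e) for e in expected_traces))

def hamming(a, b):
    """Simple position-wise mismatch count, padded by the length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

optimal = [("up", "up", "right"), ("right", "up", "up")]
expected = [("right", "up", "up")]
print(lie_by_omission(optimal, expected, hamming))  # ('right', 'up', 'up')
```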
Interpretable Planning
Role of Communication
General Framework for Planning
Evaluation
Demonstration
User Study