
Explanations for Automated Composition in Aggregated Assistants

Sarath Sreedharan1, Tathagata Chakraborti2, Yara Rizk2, Yasaman Khazaeni2
1Arizona State University, Tempe AZ USA 85281
2IBM Research AI, Cambridge MA USA 02142
[email protected], { tchakra2, yara.rizk } @ibm.com, [email protected]
Work done as an intern at IBM Research AI, Cambridge (USA) during the summer of 2020.
Abstract

A new design of an AI assistant that has become increasingly popular is that of an “aggregated assistant” – realized as an orchestrated composition of several individual skills or agents that can each perform atomic tasks. In this paper, we will talk about the role of planning in the automated composition of such assistants and explore how concepts in automated planning can help to establish transparency of the inner workings of the assistant to the end-user.

Conversational assistants such as Siri, Google Assistant, and Alexa have found increased user adoption over the last decade and are adept at performing tasks like setting a reminder or an alarm, putting in an order online, controlling a smart device, and so on. However, the capability of such assistants remains quite limited to episodic tasks that mostly involve a single step and do not require maintaining and propagating state information across multiple steps.

A key hurdle in the design of more sophisticated assistants is the complexity of the programming paradigm – at the end of the day, end-users and developers who are not necessarily subject matter experts have to be able to build and maintain these assistants. A particular architecture that has emerged recently to address this issue is that of an “aggregated assistant”, where the assistant is built out of individual components called skills. Skills are the unit of automation and perform atomic tasks that can be composed together to build an assistant capable of performing more complex tasks. Prominent examples of this include IBM Watson Assistant Skills (https://ibm.co/33f58Hc) and Amazon Alexa Skills (https://amzn.to/35xK2Xv).

This setup is not particularly confined to personal assistants either. An increasingly popular use of assistants is in enterprise applications. Here also, examples of aggregated assistants can be seen in offerings from Blue Prism (Digital Exchange: https://bit.ly/2Ztzdla), Automation Anywhere (Bot Store: https://bit.ly/33hcr12), and others. The individual skills here belong to the class of Robotic Process Automation (RPA) tools that automate simple repetitive tasks in a business process.

With recent advances in AI and conversational assistants, the scope of RPAs has also been evolving to take on more complex tasks, as we outline in [bpm-rpa-forum]. From the point of view of the planning community, this poses an interesting challenge: one of composing skills in a goal-oriented fashion to automatically realize assistants that can achieve longer-term goal-directed behavior. In [d3ba] we demonstrated one such possibility of optimizing an existing workflow or business process by composing it with skills to maximize automation and minimize case worker load. We showed how this design can be done offline by adopting a non-deterministic planning substrate. This design choice does not, however, readily lend itself to the automated composition of aggregated assistants – the composed assistant is going to evolve rapidly based on user interactions, and it does not seem reasonable to model all possibilities over all goals up front.

Figure 1: Simplified architecture diagram of Verdi [Rizk2020AUnified] illustrating the assistant-agent-skill hierarchy.

Instead, in this paper, we will describe how we can use a rapid planning and re-planning loop to compose assistants on the fly. The first part of the paper outlines the implementation of this aggregated assistant in the form of a continuously evolving planning problem. One interesting outcome of this design is that a lot of the backend processes that affect the user interaction do not get manifested externally. In the second part of the paper, we explore how concepts of causal chains and landmarks in automated planning can help us navigate transparency issues in aggregated assistants composed on the fly.

1 Aggregated Assistants

The particular aggregated assistant we will focus on is Verdi, as introduced in [Rizk2020AUnified] – architecturally, this subsumes all the examples above. Figure 1 provides a simple view of the system. An assistant in Verdi interfaces with the end-user (for a personal assistant) or case worker (for a business process assistant) in the form of an orchestrated set of agents, which are in turn composed of a set of skills.

Events

The assistant is set up to handle and respond to different kinds of phenomena in its environment – e.g. a text from the end-user, an alert from a service it is monitoring, a data object or a pointer to a business process, and so on. In general, we will call them “events” \mathcal{E}. In the examples in this paper, unless otherwise mentioned, we will deal with events involving user utterances only.

Facts

The assistant also has access to a shared memory or knowledge base which stores global information known to the assistant – this is referred to as the Long Term Memory or LTM in Verdi. This information can be accessed by different agents and skills inside the assistant. We will refer to a variable in the LTM as a fact F with value f. For the purposes of this paper, this abstraction will be enough and we will not go into the details of the architectural implementation of the LTM. We will also assume that all variables in the LTM share the same vocabulary (more on this later).

Skills

A skill, as we mentioned before, is an atomic function that performs a specific task or transform on data. Each skill \phi is defined in terms of the tuple:

  • Skill \phi = \langle [[i, \{o\}_{i}], \ldots], \delta \rangle

    • where i \subseteq F and o \subseteq F are the inputs and outputs of the skill respectively, and \delta is the number of times the skill can be called in the lifespan of one user session. Notice that the outcome is a set: each member o_i is a possible outcome corresponding to the input i, and there are multiple such pairs of inputs and corresponding output sets for each skill.

For example, a skill that submits a credit card application for the user would need the necessary user information as input, and as output will either produce a successful or failed application status or may require further checks. It can also be pinged with a different input, such as an application number, to retrieve the status of the application.
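To make the abstraction concrete, the skill tuple above can be captured in a small data structure. The sketch below is illustrative only – the field names and the credit card example are assumptions for readability, not the actual Verdi schema:

from dataclasses import dataclass
from typing import FrozenSet, List, Tuple

@dataclass(frozen=True)
class SkillSpec:
    # Sketch of a skill \phi = <[[i, {o}_i], ...], delta>.
    name: str
    # Each entry pairs one input set i with the set {o}_i of its possible outcome sets.
    io_pairs: List[Tuple[FrozenSet[str], List[FrozenSet[str]]]]
    delta: int  # maximum number of calls allowed in one user session

# Hypothetical credit card application skill from the running example.
credit_card_skill = SkillSpec(
    name="submit_credit_card_application",
    io_pairs=[
        (frozenset({"user_info"}),
         [frozenset({"application_success"}),
          frozenset({"application_failed"}),
          frozenset({"further_checks_needed"})]),
        (frozenset({"application_number"}),
         [frozenset({"application_status"})]),
    ],
    delta=3,
)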

Agents

An agent can then be defined as the tuple:

  • Agent \Phi = \langle \mathcal{L}, \{\phi\}_{i} \rangle

    • where \{\phi\}_{i} are its constituent skills and \mathcal{L} is the logic (e.g. a piece of code) that binds the skills together.

Every agent defines two functions:

  • \Phi.preview : \Phi \times \mathcal{E} \mapsto \mathbb{E}[\mathcal{E}]

    • This is the preview function (\Phi.p in short) that provides an estimate of what will happen if an agent is pinged with an event. The output of the preview is an expectation on the actual output. In this paper this expectation will take the form of a probability that the agent can do something useful with the input event, thus: \Phi.preview : \Phi \times \mathcal{E} \mapsto [0,1].

    • Note that an agent may or may not have a preview. For example, if the evaluation requires a state change then an agent may not provide a preview: e.g. calling a credit score service multiple times can be harmful.

  • \Phi.execute : \Phi \times \mathcal{E} \mapsto \mathcal{E}

  • This is the execute function (\Phi.e in short) that actually calls an agent to consume an event. Every agent must have an execute function.
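A minimal sketch of this agent interface – again illustrative rather than the actual Verdi implementation – treats an event as a simple dictionary, with preview optional and execute mandatory:

from typing import Callable, Dict, Optional

Event = Dict[str, str]  # illustrative stand-in for an event in \mathcal{E}

class Agent:
    # Sketch of an agent \Phi = <L, {phi}_i> exposing preview/execute.
    def __init__(self, name: str,
                 execute_fn: Callable[[Event], Event],
                 preview_fn: Optional[Callable[[Event], float]] = None):
        self.name = name
        self._execute_fn = execute_fn
        self._preview_fn = preview_fn  # may be None: not every agent has a preview

    def preview(self, event: Event) -> Optional[float]:
        # Confidence in [0, 1] that this agent can do something useful with the
        # event, or None if the agent does not support side-effect-free previews.
        return self._preview_fn(event) if self._preview_fn else None

    def execute(self, event: Event) -> Event:
        # Actually consumes the event; may change the state of the world.
        return self._execute_fn(event)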

Assistant

The assistant can then be thought of as a mapping from an event and a set of agents to a new event:

  • Assistant \mathbb{A} : \mathcal{E} \times \{\Phi_i\} \mapsto \mathcal{E}

The exact nature of this mapping determines the behavior of the assistant – this is the orchestration problem.

while \mathcal{E} do
       \{\Phi\}_i \times \mathcal{E} \mapsto \Phi_{eval}, \Phi_{eval} \subseteq \{\Phi\}_i
       /* subset of agents to preview */
       foreach \Phi \in \Phi_{eval} do
             if \exists \Phi.preview then
                   \mathbb{E}_{\Phi}[\mathcal{E}] \leftarrow \Phi.preview(\mathcal{E})
                   /* compute expected outcomes */
             else
                   \mathbb{E}_{\Phi}[\mathcal{E}] \leftarrow \emptyset
             end if
       end foreach
       \{\Phi\}_i \times \{\mathbb{E}_{\Phi}[\mathcal{E}]\}_i \times \mathcal{E} \mapsto \Phi_{exe}, \Phi_{exe} \subseteq \{\Phi\}_i
       /* subset of agents to execute */
       foreach \Phi \in \Phi_{exe} do
             return \Phi.execute(\mathcal{E})
       end foreach
end while
Algorithm 1 Verdi Flow of Control

2 Orchestration of an Aggregated Assistant

In this section, we will introduce how different orchestration patterns can be developed to model different types of assistants and describe the role of planning in it. Algorithm 1 provides an overview of the flow of control in an aggregated assistant, between the orchestrator and its agents.

Apriori and Posterior Patterns

Following the flow of control laid out in Algorithm 1, there are two general strategies for orchestration:

  • apriori, where the orchestrator acts as a filter and decides which agents to invoke a preview on, based on \mathcal{E}; and

  • posterior, where all the agents receive a broadcast and respond with their previews, and the orchestrator picks the best response to execute.

These two strategies are not mutually exclusive. The apriori option is likely to have a smaller system footprint, but involves a single bottleneck based on the accuracy of the selector which determines which agents to preview. The posterior option – despite increased latency and computational load – keeps the agent design less dependent on the orchestration layer as long as the confidences are calibrated. (This can be achieved by learning from agent previews over time and gradually adapting a normalizer or a more sophisticated selection strategy over the confidences self-reported by the agents, e.g. using contextual bandits [sohrabi2010customizing].) The ability to learn such patterns can also be used to eventually realize a healthy mix of apriori and posterior orchestration strategies.
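The difference between the two strategies can be sketched in a few lines, reusing the illustrative Agent class from above; relevant_agents below stands in for an assumed apriori filter and is not part of any actual API:

def orchestrate_apriori(event, agents, relevant_agents):
    # Apriori: filter first, then preview only the shortlisted agents.
    shortlisted = relevant_agents(event, agents)  # assumed filter over {Phi}_i
    scored = [(agent.preview(event) or 0.0, agent) for agent in shortlisted]
    return max(scored, key=lambda pair: pair[0])[1]

def orchestrate_posterior(event, agents):
    # Posterior: broadcast to every agent and let the self-reported
    # (calibrated) confidences compete.
    scored = [(agent.preview(event) or 0.0, agent) for agent in agents]
    return max(scored, key=lambda pair: pair[0])[1]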

while \mathcal{E} do
       foreach \Phi \in \{\Phi\}_i do
             if \exists \Phi.preview then
                   P(\Phi) \leftarrow \Phi.preview(\mathcal{E})
                   // confidence
                   S(\Phi) \leftarrow Scorer(P(\Phi))
             else
                   S(\Phi) \leftarrow 0
             end if
       end foreach
       \Phi_{exe} \leftarrow Selector(\{S(\Phi)\}_i), \Phi_{exe} \subseteq \{\Phi\}_i
       \langle\Phi\rangle \leftarrow Sequencer(\Phi_{exe})
       foreach \Phi \in \langle\Phi\rangle do
             return \Phi.execute(\mathcal{E})
       end foreach
end while
Function Scorer(P(\Phi)):
       return P(\Phi)
       // default
Function Selector(\{S(\Phi)\}_i):
       return \Phi_{exe} = \{\Phi \ |\ S(\Phi) \geq \delta\}
       such that: |\Phi_{exe}| \leq k and
       \not\exists \Phi \in \Phi_{exe}, \Phi' \not\in \Phi_{exe} with S(\Phi') > S(\Phi)
       /* select top-k above threshold \delta */
Function Sequencer(\{\Phi\}_i):
       return List(\{\Phi\}_i)
       // default
Algorithm 2 S3-Orchestrator

2.1 The S3 Orchestrator

The S3-orchestrator [Rizk2020AUnified] is a stateless, posterior orchestration algorithm that consists of the following three stages: Scoring, Selecting, and Sequencing. The first step is to broadcast the event to all agents and solicit a preview of their actions: this preview must not cause any side-effects on the state of the world in case the agent is not selected. Once the orchestrator obtains the agents’ responses, it scores these responses using an appropriate scoring function so that the agents can be compared fairly (this is where the normalization mentioned above can happen, for example). After computing the scores, a selector evaluates them and picks one or more agents based on its selection criteria. One example is simply picking the agents whose scores are the highest and above a threshold (referred to as a Top-K selector). Finally, if more than one agent is selected for execution, a sequencer determines the order in which the agents must execute to properly handle the event. Algorithm 2 details the process. Since agents’ executions may be dependent on each other, the order is important. Currently this is handled through simple heuristics and rules coded directly into the Sequencer. We will discuss later how planning and XAIP have a role to play here as well.

while \mathcal{E} do
       \langle\Phi\rangle \leftarrow Planner(\{\Phi\}_i)
       foreach \Phi \in \langle\Phi\rangle do
             return \Phi.execute(\mathcal{E})
       end foreach
end while
Algorithm 3 Planner-Based Orchestration

2.2 Stateful Orchestration using Planning

The S3-orchestrator does not maintain state and does not make any attempt to reason about the sequence of agents or skills being composed. In the following, we go into the details of a planner-based orchestrator that can compose agents or skills on the fly to achieve user goals. In comparison to S3, this means we no longer need a selector-sequencer pattern: the planner decides which agents or skills to select and how to sequence them. In order to be able to do this effectively, we need the following external components:

  • Specification. The ability to compose agents on the fly requires a specification of agents in terms of (at least) their inputs and outputs. (Of course, this can be extended to include additional rules relevant at the orchestration layer, such as ordering constraints [allen1983maintaining] to model preferred sequences: e.g. “make sure to start with the chit-chat agent before anything else”.) This is somewhat similar to applications of planning [sohrabi2010customizing, carman2003web] in the web service composition domain [srivastava2003web].

    A necessary prerequisite for using these variables from the shared memory across different agents and skills (developed independently and from different sources) is that they share some vocabulary. This needs to be enforced on the specification through the use of some schema or ontology to ensure consistency up front, or by fuzzy matching variable names on the fly during execution. Existing work on using planning for web service composition has explored advanced techniques on this topic [hoffmann2007web, hepp2005semantic, sirin2004htn]. As we mentioned before, for this paper, we assume that the skill and agent specifications share the same vocabulary.

  • Goal-reasoning engine. Finally, the composition at the orchestration layer requires a goal: the planner-based orchestrator produces goal-directed sequences of agents and skills. This goal is something that is derived from events and is usually related to something the user is trying to achieve. However, this may not always be the case. We will go into further details of this in the next section.
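Putting these pieces together, the planner-based orchestrator boils down to a plan-execute-replan loop around Algorithm 3. The sketch below is a simplification under assumed interfaces – plan() stands in for a call to an off-the-shelf planner over the compiled model, and execute() for a skill invocation that reports the facts it actually established:

def stateful_orchestrate(goal, skills, state, plan, execute, max_rounds=20):
    # goal and state are sets of "known" facts; plan(state, goal, skills) is
    # assumed to return a sequence of (skill, expected_outcome) steps from the
    # determinized model; execute(skill) returns the facts actually established.
    for _ in range(max_rounds):
        if goal <= state:                      # all goal facts are known
            return state
        steps = plan(state, goal, skills)
        if not steps:
            return None                        # no plan found: digress / ask the user
        for skill, expected_outcome in steps:
            observed = execute(skill)
            state = state | observed
            if not expected_outcome <= observed:
                break                          # outcome differs from the plan: replan
    return state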

3 Stateful Orchestration using Planning

We will now go into the details of the implementation of the planner-based orchestrator, and then walk through a detailed example of it in action. This will help us better understand the XAIP issues engendered by the automated orchestration of an aggregated assistant, and the details of the proposed solution.

3.1 Agent and Skill Specification

The central requirement to start converting the skill-sequencing problem into a planning problem is to have access to some form of a model of how the skills operate. For now, we expect the designer of the skills to provide this specification, though in the future one could look at learning such specifications, particularly through simulation of individual skills. Our goal with the skill specification has been to keep the specification burden as low as possible and move as much of the specifics as possible to the execution component (where we could even leverage evaluation options when possible). This not only alleviates modeling overhead for skill designers but also keeps the planning layer light, thus allowing us to perform rapid execution and replanning. The agent specification file itself contains a set of general information relevant to the entire agent configuration, a set of specifications for inbuilt Verdi skills that the agent could use, and specifications for the external skills that may be developed by independent developers and may live in a common skill catalog. We require that every element of these specifications is synced via an ontology that could potentially include information correlating the various elements of the specifications.

For each external skill, we expect the specifier to provide at the very least the following information: (a) the function endpoint of the skill; (b) a user-understandable description of the skill (to be used in generating explanatory messages); (c) an upper limit on the number of times it is reasonable to retry the same skill to get a specific desired outcome; and finally (d) an approximate specification of the function of the skill. Here the function of the skill is taken to be a set of tuples of input and possible output pairs, which roughly represent the various operational modes of the skill. For example, a visualization skill could take raw data and generate the plot that it believes best represents that data, or it could take the data along with a plot type to create the specified plot. Each pair consists of a set of inputs the skill can take and specifies a set of possible outputs that could be generated by the skill for those specific inputs. The possible output set consists of mutually exclusive sets of output information that can be generated by the skill. These can be thought of as the non-deterministic effects of the skill, though in the next section we will discuss how these scenarios diverge from traditional non-deterministic planning settings. Each of the input and output elements is assumed to have a corresponding entry in the ontology. For now we will assume that any specific requirements on the values of these items will be reflected by distinct elements in the ontology. For example, if there are two different loan application skills, one for lower loan amounts and another meant for higher loan amounts, then we assume the input for the second skill will be denoted by the element high_loan_amount (which may be a subclass of the element loan_amount). Going forward we could also allow authors to specify constraints directly, which could then be used by the planner.
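To make items (a)–(d) concrete, an external-skill entry might look as follows. The field names, URL, and loan example here are assumptions for the sake of illustration and not the exact format shown in Figure 2; every input and output element is assumed to correspond to an entry in the ontology:

loan_application_skill = {
    # (a) function endpoint of the skill (hypothetical URL)
    "endpoint": "https://example.com/skills/submit_loan_application",
    # (b) user-understandable description, reused when generating explanations
    "description": "Submits a loan application on behalf of the user",
    # (c) upper limit on retries for a specific desired outcome
    "retry_limit": 2,
    # (d) approximate function: input sets paired with mutually exclusive output sets
    "function": [
        {"inputs": ["user_info", "loan_amount"],
         "outcomes": [["loan_approved"], ["loan_rejected"], ["further_checks_needed"]]},
        {"inputs": ["application_number"],
         "outcomes": [["application_status"]]},
    ],
}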

Figure 2: A sample specification of a skill.
Figure 3: The catalog used in Figure 4.
Figure 4: Detailed example illustrating stateful orchestration of skills in the assistant to achieve evolving user goals in the banking domain. A sample conversation with the assistant is embedded on the left, the middle provides highlights of interesting stateful orchestration patterns, and finally, the right illustrates the evolving plans in the backend to support this conversation.

The special inbuilt skills also get an entry in the specification with a structure similar to the ones for the external skills, but we do not require them to refer to elements in the ontology. They may use keywords that map to specific compilations when we build the planning model out of the specification. The agent-level specification entry can specify global constraints. For example, one could assign specific properties to certain information items at the agent level and use such properties to specify preferences over certain sequences.

Compilation to PDDL

The next step is to convert the agent specification into a planning model. Given that our problem is closely related to non-deterministic planning, we decided to map the agent specification into a non-deterministic planning model, with additional considerations provided to support problem elements that are uncharacteristic of standard non-deterministic planning problems.

In terms of modeling, we create a type for each element that appears in the specification, and a type for each individual (input, output set) pair of the skill function specifications. We create a pre-specified number of objects for each element type (which for the example domain was limited to one object per type) and one object each for the skill pairs. From the ontology, we also incorporate the subsumption information into the planning model by converting it into a type hierarchy. For most popular planners this means restricting the relations to a tree-like hierarchy; we can do this by converting the hierarchy into a tree by introducing new types, or by updating the grounder to allow for non-tree-like type hierarchies. Going forward, we can also allow the incorporation of other types of information from the ontology, similar to [jeorg]. In the simplest case, we are interested in using the skills to establish the value of certain elements. Thus the central fluent of the planning problem is (known ?x): the planner itself does not care about the specific values that each element takes, but rather tries to establish that a value is known for each relevant element.

In terms of action specification, we made use of an all-outcome determinization of the non-deterministic model to plan with, wherein for each external skill s_i, whose functional specification can be represented by the tuple s_i = \langle (I^1, \{O_1^1, \ldots, O_k^1\}), \ldots, (I^m, \{O_1^m, \ldots, O_j^m\}) \rangle, we create a different action for each possible input and outcome pair. So just for the tuple (I^1, \{O_1^1, \ldots, O_k^1\}) we create k possible lifted actions, each meant to capture the ability of the skill to achieve a specific outcome. For each such action a_{1,j}^{s_i}, the precondition enforces that the value of each input element in I^1 is known, and the effects ensure that for each output element in O_j^1 the known value corresponding to that element is set to true. The goal of the planning problem becomes setting the value of the relevant element to known. For cases where there is a single object per element, the goal is tied to that object. For cases where we may use multiple objects per type (to allow duplicate objects of the same type), we could use a disjunction in the goal. Usually, the mapping from user utterance to goals is a domain-specific one and needs to be specified for each agent. The initial state is provided by the current set of known values for each element.
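A sketch of this compilation step is given below. It is a propositional simplification of the typed compilation described above (the type hierarchy, objects per element, and call counters are omitted), and the generated PDDL fragment is illustrative rather than the exact model we build:

def compile_skill_to_actions(skill_name, io_pairs):
    # All-outcome determinization: one action per (input set, outcome set) pair.
    actions = []
    for i, (inputs, outcomes) in enumerate(io_pairs, start=1):
        for j, outcome in enumerate(outcomes, start=1):
            pre = " ".join(f"(known {x})" for x in inputs)
            eff = " ".join(f"(known {y})" for y in outcome)
            actions.append(
                f"(:action {skill_name}_{i}_{j}\n"
                f"  :precondition (and {pre})\n"
                f"  :effect (and {eff}))"
            )
    return actions

# Two actions for a hypothetical OCR skill with one input set and two outcome sets.
for action in compile_skill_to_actions(
        "ocr_skill", [(["document_image"], [["extracted_text"], ["ocr_failed"]])]):
    print(action)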

Internal Skills and Other Constraints

Unlike the external skills, we currently allow for more flexibility when compiling internal skills. For example, in the scenario considered we had two internal skills: a slot filling skill that allows querying the user for the value of a specific element (which basically has an add effect that sets that element to known), and an authorization skill that authorizes the use of certain skills over sensitive information. In addition to the specific compilation of inputs and outputs, we also allow the inclusion of more global constraints. One example we had was the need to perform the authorize action before sending sensitive information (listed in the specification) to skills. We did this by adding an authorize precondition to skills that take sensitive information as an input.
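The sensitive-information constraint, for instance, can be folded into the same compilation by strengthening the preconditions of the affected actions; the snippet below extends the earlier sketch and is again only illustrative:

def add_authorization_constraint(precondition_atoms, inputs, sensitive_elements):
    # Require prior authorization whenever a skill consumes a sensitive element.
    if any(x in sensitive_elements for x in inputs):
        return precondition_atoms + ["(authorized)"]
    return precondition_atoms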

3.2 Learning Through Execution

As mentioned earlier, the exact setting we are looking at is not the standard non-deterministic planning setting studied in the literature. In fact, many of the non-deterministic aspects of skill execution could be tied to unobservable factors and incomplete specifications. In such cases, many of the standard assumptions made in non-deterministic planning, such as fairness, are no longer met. Thus we cannot directly apply traditional non-deterministic planning, and we need to allow for these additional considerations. Here, we use the information gathered through observation of execution to overcome such limitations. Currently, we allow for two such mechanisms. The first prevents the system from getting trapped in loops trying to establish specific values that a skill may not currently be able to establish for any reason. We do this by introducing a new predicate called cannot_establish that takes two arguments: the skill (or more specifically the skill pair) and a specific element. For each output of an action, we add the negation of cannot_establish, over the corresponding output element and the skill pair corresponding to that skill, to that action's precondition. This means that if the predicate is set to true in the initial state, then the action cannot be used. This value is established by the central execution manager, which monitors all the skills that have been executed so far and maintains a counter of how many times a skill has been unsuccessfully called to establish some value. When the counter crosses the limit set in the specification, the corresponding fact is set in the initial state for all future planner calls.
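A sketch of this bookkeeping, with illustrative names, is given below: the execution manager counts unsuccessful attempts per (skill, element) pair and, once the limit from the specification is crossed, asserts the corresponding cannot_establish fact in every subsequent initial state:

from collections import defaultdict

class ExecutionManager:
    def __init__(self, retry_limits):
        # retry_limits: {skill_name: max unsuccessful attempts allowed}
        self.retry_limits = retry_limits
        self.failures = defaultdict(int)
        self.learned_facts = set()  # cannot_establish facts for future initial states

    def record(self, skill_name, element, established):
        # Called after every skill invocation that was meant to establish `element`.
        if established:
            return
        self.failures[(skill_name, element)] += 1
        if self.failures[(skill_name, element)] >= self.retry_limits[skill_name]:
            self.learned_facts.add(f"(cannot_establish {skill_name} {element})")

    def initial_state(self, known_facts):
        # Initial state for the next planner call: known values plus learned facts.
        return set(known_facts) | self.learned_facts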

The next problem we wanted to address arises from the fact that these specifications are inherently incomplete. Thus the planner may try to call the skills using values that may not be valid, or expect outputs for certain inputs that the skill may not be able to generate. We currently expect that such situations can be detected by either the skill itself or the execution manager, and once such an incorrect invocation of the skill is detected, the corresponding grounded action is pruned out of the model.

3.3 Illustrative Example

We used a sample banking scenario to demonstrate the features of our system. The assistant here is meant to help users submit loan and credit card applications, which includes gathering relevant information from the users, submitting it to the appropriate channels, and reporting the results. The assistant mainly consists of an agent that has access to six skills and includes an NLU component capable of recognizing a number of intents, including asking for a loan or a credit card, requesting that specific information be retrieved and shown, asking for explanations or summaries, and stopping the current request. The six skills the agent has access to (Figure 2) include skills for actually performing the request submissions, a skill to perform OCR, and a skill to retrieve information from the bank database (where it can retrieve records given either the user's email id or their account number). The agent also has access to two Verdi internal skills, namely the slot filling skill meant to get information from the user and an authorization skill meant to ask users to authorize the passing of sensitive information to skills. The slot filling skill additionally has access to the list of information it can get from the user, which is not currently part of the specification and thus is not known to the user.

3.4 Properties of Stateful Orchestration

Graceful Digression

Authentication and Privacy

Goal Reasoning

4 Automated Orchestration and XAIP

Even though using planners to compose stateful orchestration patterns on the fly allows us to model the complex conversational patterns above, from the end-user perspective we have a problem: in the example in Figure 4 we noted the internal processes going on in response to the user utterances, but none of this is actually visible to the user. This means that when the goal is achieved, a series of events has unfolded in the background, unbeknownst to the user, for the assistant to reach that conclusion. This can be quite unsettling, as seen in the sudden response with the loan result after only a couple of interactions in which the assistant acquired information from the user. We propose to navigate this outcome of the automated orchestration approach using three techniques:

  • “What?” questions. The first step towards achieving transparency is to be able to provide summaries [amir2018highlights, sreedharan2020tldr] of what was done in the backend to the user. While one could simply dump the entire plan, our focus here is to provide only the most pertinent information to the user and give them the ability to drill down as required. This is motivated not just by the fact that the plan may have a large number of steps, but also by the fact that, given the online nature of the planning problem, the agent may have performed steps that were in the end not pertinent to the final solution. To extract just the steps, or in this case the pieces of information, central to achieving the goal, we extract the landmarks corresponding to the achievement of the goal from the original initial state, since regardless of the exact steps taken, all information corresponding to landmark facts must have been collected to get to the goal. Such techniques have also been used recently to summarize policies [sreedharan2020tldr]. Unlike those cases, since the system here is also learning about the problem in the process of execution, we additionally need to incorporate into the initial state any cannot_establish facts that may have been learned along the way, and remove any actions that may have been pruned along the way. This means that even though the planner believed it could establish the goal with a single step the first time it was called, through the execution it realized it needed to gather multiple information items. Figure 5 shows the summary generated for the current demo domain. The actual summary text presented to the user uses template-based generation, which relies on an ordering between the landmarks obtained via a topological ordering of the landmarks corresponding to the establishment of information (while ignoring any compilation-specific fluents). In the future we could also simply use the order in which the information was collected in the executed plan.

  • “How?” questions. Once a summary is provided, we allow the user to drill down on any aspect of it by asking how that was achieved. This involves identifying the skill that established the value of the element and then providing those details to the user. Here we rely on the simple description provided by the skill author in the specification. In addition to showing the skill, we also provide the inputs that the skill took, so that, if the user requires, they can chain even further back.

  • “Why?” questions. Furthermore, the user can also go forward from that fact and explore why it was required to be established, in terms of its role in achieving the goal. Here we adapt the popular idea of using the causal chain from an action to the goal as the justification for its role in the plan (cf. [manuela, seegebarth]) to individual facts. We identify whether an action contributed something to the goal by performing a regression from the goal over each action that was executed by the agent: for a skill execution that took as input the set of facts I_i and generated an output set O_i, given the current regressed state s_i, the next state s_{i-1} is given by (s_i \setminus O_i) \cup I_i. We can allow for the fact that the system may have made missteps by ignoring any actions that do not contribute anything to the current regressed state. We stop as soon as we reach a skill that contributes to the regressed state and used the fact in question. We can either provide the full causal chain to the goal or just provide this action (a minimal sketch of this regression appears after this list). Figure 6 shows the example explanation provided for our demo domain.
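A minimal sketch of this regression over the executed trace is shown below; the trace format (a list of (skill, inputs, outputs) triples in execution order) is assumed for illustration:

def justify_fact(executed_trace, goal_facts, fact):
    # Regress the goal backwards over the executed trace, skipping steps that
    # contribute nothing to the current regressed state (missteps), and stop at
    # the first contributing step that consumed the fact in question.
    regressed = set(goal_facts)
    chain = []
    for skill, inputs, outputs in reversed(executed_trace):
        if not (set(outputs) & regressed):
            continue                            # misstep: ignore it
        chain.append(skill)
        if fact in inputs:
            return list(reversed(chain))        # causal chain from the fact to the goal
        regressed = (regressed - set(outputs)) | set(inputs)
    return None                                 # the fact did not feed into the goal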

There are many reasons to surface the internal processes of the assistant to the end-user on demand, and the interaction patterns above provide the building blocks for doing so.

Figure 5: Example of a summary followed by the user drilling down with “how?” questions.
Figure 6: Example of two ways to answer “why?” questions.

5 Work in Progress

Figure 7: Example of augmenting the explanation of the orchestration process with that of a decision made by a skill.