RestGPT: Connecting Large Language Models
with Real-World RESTful APIs

Yifan Song1, Weimin Xiong1, Dawei Zhu1, Wenhao Wu1, Han Qian2, Mingbo Song2,
Hailiang Huang2, Cheng Li3, Ke Wang3, Rong Yao3, Ye Tian3, Sujian Li1 (corresponding author)
1School of Computer Science, Peking University
2School of Electronics Engineering and Computer Science, Peking University
3Huawei Technologies
{yfsong, lisujian}@pku.edu.cn

https://restgpt.github.io
Abstract

Tool-augmented large language models (LLMs) have achieved remarkable progress in tackling a broad range of tasks. However, existing methods are mainly restricted to specifically designed tools and fail to fulfill complex instructions, having great limitations when confronted with real-world scenarios. In this paper, we explore a more realistic scenario by connecting LLMs with RESTful APIs, which adhere to the widely adopted REST software architectural style for web service development. To address the practical challenges of tackling complex instructions, we propose RestGPT, which exploits the power of LLMs and conducts a coarse-to-fine online planning mechanism to enhance the abilities of task decomposition and API selection. RestGPT also contains an API executor tailored for calling RESTful APIs, which can meticulously formulate parameters and parse API responses. To fully evaluate the performance of RestGPT, we propose RestBench, a high-quality benchmark which consists of two real-world scenarios and human-annotated instructions with gold solution paths. Experiments show that RestGPT is able to achieve impressive results in complex tasks and has strong robustness, which paves a new way towards AGI.

1 Introduction

Large language models (LLMs), such as GPT-3 [1] and ChatGPT [2], have shown various emergent abilities, including in-context learning [1, 3], reasoning [4, 5], and step-by-step planning [6, 7]. To advance the capabilities of LLMs for practical applications, an ongoing research direction investigates the incorporation of external tools/APIs to enhance the functionality of LLMs [8, 9, 10, 11]. This endeavor has yielded successful integration of diverse tools, including search engines and other foundation models, with LLMs [12, 13, 14].

Despite significant progress, we find that existing API-augmented LLMs are still in the experimental stage and have yet to fully meet the demands of real-world user instructions. As shown in Table 1, current methods are limited to connecting with a small number of specially designed tools/APIs [11, 12, 15]. For example, Chameleon [12] designs a set of 15 tools, such as a table verbalizer and an image captioner. Additionally, the absence of a standardized API design specification obstructs the scalability of previous endeavors. Thus, the potential for connecting LLMs with a diverse range of real-world APIs, like RESTful APIs, remains under-explored and challenging. Furthermore, when dealing with a complex instruction in a real scenario, it is necessary to decompose it into smaller sub-tasks and accomplish them by employing a mix of various APIs. As a result, it becomes essential for API-augmented LLMs to have robust planning and decision-making capabilities to effectively tackle real-world tasks [9]. Nonetheless, existing techniques, either offline introspective plan-then-execute methods [9, 13, 12] or the ReAct framework [16], encounter challenges in effectively adapting to API feedback and generating viable plans.

Model API/Tool Use Framework
Num. Extensibility Schema Planning Planning Form Feedback Plug-n-Play
ReAct 3 - Specialized Online Natural Lang.
Toolformer 5 - Specialized -
Visual ChatGPT 22 - Specialized - Human
ViperGPT 11 - Python func. Offline Program
HuggingGPT 24 ++ HuggingFace Offline Natural Lang.
API-Bank 53 - Specialized - Human
Chameleon 15 - Specialized Offline Natural Lang.
Gorilla 1645 ++ JSON -
GPT4Tools 31 - Specialized - Human
RestGPT (ours) 100+ ++++ RESTful Online Coarse-to-Fine
Table 1: A comparison of work that augments LLMs with API/tool usage. denotes API selection with retrieval.
Note on the HuggingGPT row: HuggingGPT [13] claims it has integrated hundreds of models on HuggingFace. However, all of the models only cover 24 tasks such as text classification, object detection, etc.

In this work, we delve into a more realistic scenario by connecting LLMs with real-world RESTful APIs, aiming at fulfilling practical user instructions. RESTful is the de facto standard for web service development [17], which utilizes HTTP methods (e.g., GET, POST) and URIs to manipulate resources. RESTful API development typically adheres to the OpenAPI Specification (OAS) [18], which describes the operations, parameters, and response schemas of each API endpoint. Therefore, our resulting framework can connect with any RESTful application and offer standardized API development processes, thereby enabling enhanced extensibility compared to previous approaches. However, connecting LLMs with RESTful APIs also brings practical challenges. First, calling real-world APIs may give rise to a multitude of unforeseen situations, necessitating the framework to exhibit strong robustness and conduct reasonable planning. Second, the parameters and responses of RESTful APIs often follow specific formats, leading to difficulty in API invoking and response parsing.

To tackle the limitations of previous methods and the practical challenges associated with RESTful APIs, we propose RestGPT, an LLM-based framework connecting with RESTful APIs to handle complex instructions. RestGPT comprises three main modules: a Planner, an API Selector, and an Executor. The core of each module is prompting an LLM. Unlike prior work that uses static or ReAct-style planning, which lacks flexibility in realistic scenarios, RestGPT adopts an iterative coarse-to-fine online planning mechanism. Given a complicated instruction, the planner generates a sub-task for the current task in natural language. Subsequently, the API selector maps the coarse high-level sub-task to a finer API calling plan, forming a coarse-to-fine task planning. The executor, responsible for invoking RESTful APIs and obtaining execution results, is further divided into two sub-modules: a Caller and a response Parser. The caller organizes API call parameters based on the API plan and API documentation, while the parser utilizes the response schema defined in OAS to generate Python code to parse responses. Once it receives the execution results of the API plan, the planner performs online planning for the subsequent sub-task in the next step. Through the integration of the three modules, our method RestGPT shows superior extensibility and flexibility in mastering RESTful APIs.

To evaluate the performance of RestGPT in utilizing RESTful APIs, we introduce RestBench, a human-annotated benchmark consisting of two realistic scenarios, the TMDB movie database and the Spotify music player. For each scenario, we collect diverse real-world user instructions that require the utilization of multiple APIs to complete. Based on RestBench, we conduct comprehensive experiments to investigate the performance of RestGPT across different dimensions. The experimental results demonstrate that RestGPT exhibits robust capabilities in handling complex user instructions and has significant advantages in task planning, API understanding, and response parsing.

Our contributions can be summarized as follows:

  1. For the first time, we attempt to connect large language models with RESTful APIs, enabling the resulting framework to be compatible with existing real-world applications while also providing powerful extensibility.

  2. We propose RestGPT, a coarse-to-fine online planning framework that effectively handles the practical challenges associated with connecting LLMs with RESTful APIs, including API understanding, planning, and API response parsing.

  3. To evaluate the performance of RestGPT, we build a human-annotated benchmark, RestBench, which comprises two practical scenarios. Experimental results show the capability of RestGPT to effectively utilize a number of RESTful APIs to accomplish complex instructions.

2 Background

2.1 Tool-Augmented Language Models

The emergence of recent powerful LLMs has enabled artificial intelligence systems to match human skills in utilizing tools [8, 9]. To enhance the performance of LLMs in accessing up-to-date information and carrying out precise mathematical reasoning, early work such as ReAct [16], Toolformer [11], and ART [19] leverages simple tools like web search engines and calculators. Another line of research has focused on equipping LLMs to coordinate with external models for complex AI tasks, exemplified by HuggingGPT [13], ViperGPT [20], Visual ChatGPT [14] and Chameleon [12]. Recently, some works study how to enable open-source LLMs, such as LLaMa, to perform API usage [21, 15, 22]. Additionally, API-Bank [23] provides a systematic benchmark to showcase the efficacy of LLMs using tools to respond to human instructions.

Despite the notable advancements in incorporating tools for large language models, previous methods have exhibited certain limitations, most notably their restricted support for a limited number of specially designed APIs [12] and their inferior planning methods [9, 24, 12]. We compare RestGPT with other tool-augmented language models in Table 1. As shown, our work stands out by supporting over 100 RESTful APIs. Furthermore, unlike most previous approaches, which adopt static offline planning that cannot interact with APIs or utilize feedback to adjust the plan, we employ a coarse-to-fine online planning framework with feedback, facilitating more flexible planning for complex instructions. Our work shares a similar spirit with AutoGPT, an autonomous agent capable of accomplishing complex tasks with numerous tools. While AutoGPT relies on developers to ensure compatibility with various applications, RestGPT can be integrated with any RESTful API-based application in a plug-and-play fashion.

2.2 RESTful APIs

RESTful APIs have become a popular way to expose functionalities and data of web services to client applications [25, 17]. RESTful APIs also provide a standard for integrating external systems through a simple yet powerful interface. There are millions of RESTful APIs available on the Internet, such as those of Spotify, Twitter, and Gmail. RESTful APIs are based on the REST architectural style, which emphasizes client-server communication via stateless HTTP requests (GET, POST, etc.), where resources are identified by self-descriptive URIs [25]. The responses of RESTful APIs are typically structured in JSON format and contain various information. Thus, LLMs connected with RESTful APIs must possess a strong ability to extract the required information from the response.

The OpenAPI Specification (OAS, or Swagger) [18] has been widely adopted as a standard for defining RESTful APIs. An OAS document is a structured file that describes the endpoints, operations, parameters, response schemas, and other details of an API, providing a clear interface for our method to use the APIs.

3 RestGPT

Figure 1: Overview of RestGPT. The planner, API selector, and executor collaborate to form the coarse-to-fine online planning framework. The caller and response parser in the executor provide robust execution of the RESTful API calling plan.

3.1 RestGPT Architecture

As demonstrated in Figure 1, RestGPT is composed of three main modules: a Planner $\mathcal{P}$, an API Selector $\mathcal{S}$ and an Executor $\mathcal{E}$. The planner decomposes each user instruction into several sub-tasks, while the API selector selects APIs to address each sub-task. The executor, consisting of a Caller and a response Parser, performs RESTful API calls and extracts useful information from the JSON response to form the execution result. The core of each component is an LLM with the corresponding prompt and in-context examples describing the function of the component.

One of the challenges in connecting LLMs with a vast number of APIs is to ensure that the framework can fully understand the API documents within the limited context window of LLMs. As depicted in Figure 1, we designate different modules to read distinct parts of the OpenAPI Specification (OAS). This strategy allows us to leverage OAS information to its fullest potential when working with RESTful APIs. Specifically, the API selector reads the endpoint descriptions of all APIs to select a proper API for solving the current sub-task. Then, the caller uses the detailed documents of the APIs within the API plan to generate the correct API calling parameters and request body. Lastly, the parser makes use of the response schema within OAS to generate the parsing code for information extraction.
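As an illustration of this division of labour, the following sketch slices a toy OpenAPI document into the three views described above. The spec fragment, its field values, and the helper names (`selector_view`, `caller_view`, `parser_view`) are assumptions made for illustration; they are not taken from the RestGPT implementation.

```python
# Toy OAS fragment (hypothetical content, following the OpenAPI 3 layout).
SPEC = {
    "paths": {
        "/movie/{movie_id}/credits": {
            "get": {
                "description": "Get the cast and crew for a movie.",
                "parameters": [
                    {"name": "movie_id", "in": "path", "required": True,
                     "schema": {"type": "integer"}},
                ],
                "responses": {
                    "200": {
                        "content": {
                            "application/json": {
                                "schema": {
                                    "type": "object",
                                    "properties": {
                                        "cast": {"type": "array"},
                                        "crew": {"type": "array"},
                                    },
                                }
                            }
                        }
                    }
                },
            }
        }
    }
}


def selector_view(spec):
    """Endpoint names and descriptions only: what the API selector reads."""
    return [
        f"{method.upper()} {path}: {op.get('description', '')}"
        for path, ops in spec["paths"].items()
        for method, op in ops.items()
    ]


def caller_view(spec, path, method):
    """Parameter and request-body documentation for one planned endpoint."""
    op = spec["paths"][path][method]
    return {"parameters": op.get("parameters", []),
            "requestBody": op.get("requestBody")}


def parser_view(spec, path, method, status="200"):
    """Response schema of one endpoint: what the response parser reads."""
    response = spec["paths"][path][method]["responses"][status]
    return response["content"]["application/json"]["schema"]


print(selector_view(SPEC))
```

Under these assumptions, each module's prompt only has to carry its own slice of the documentation, which is what keeps the total prompt length within the context window.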

3.2 Coarse-to-fine Online Planning

To fully exploit the planning and decision making capabilities of LLMs and enable our method to dynamically adjust the plan to changing circumstances when accomplishing real-world user instructions, we propose a coarse-to-fine online planning mechanism in RestGPT.

The workflow of RestGPT can be characterized as an iterative “plan and execution” loop. During the planning stage, the planner and API selector collaborate to accomplish an instruction through iteratively decomposing it into suitable natural language sub-tasks and corresponding APIs. In each step $t$, the planner $\mathcal{P}$ leverages commonsense knowledge to generate a natural language (NL) sub-task $p_{t}$ based on the user instruction $q$, previous NL plans $(p_{1},...,p_{t-1})$, and execution results $(r_{1},...,r_{t-1})$, thereby constructing a high-level NL plan. Then, the API selector $\mathcal{S}$ reads the descriptions of available API endpoints to select appropriate APIs and construct the finer API plan $a_{t}$, which may contain a single or multiple API calls to solve the current NL plan $p_{t}$. Then the executor $\mathcal{E}$ executes the API plan $a_{t}$ and gets the execution result $r_{t}$ for the current step. This process can be formulated as:

\[
\begin{aligned}
\text{NL Plan:}\quad & p_{t} \leftarrow \mathcal{P}(q; p_{1}, r_{1}, \ldots, p_{t-1}, r_{t-1}), \\
\text{API Plan:}\quad & a_{t} \leftarrow \mathcal{S}(p_{t}; r_{1}, \ldots, r_{t-1}), \\
\text{Exec. Res.:}\quad & r_{t} \leftarrow \mathcal{E}(a_{t}; r_{1}, \ldots, r_{t-1}).
\end{aligned}
\tag{1}
\]

In this way, the planner and API selector are dedicated to NL sub-task planning and API selection, respectively, effectively utilizing the large language model’s abilities of planning and text comprehension.

Alongside the “plan and execution” loop, we design two special states, “continue” and “end”, for the planner to monitor the execution result from the executor. Specifically, if the planner finds that the current executor’s output $r_{t}$ has not completed the present NL sub-task $p_{t}$, it will output a “continue” signal and provide a special NL plan $p_{t+1}$ to the API selector, instructing it to continue fulfilling the plan $p_{t}$. In such cases, the API selector will generate a new API plan based on the original NL plan $p_{t}$, the new NL plan $p_{t+1}$, the previous API plan $a_{t}$ and the execution result $r_{t}$. This process is described as:

\[
\begin{aligned}
\text{API Plan:}\quad & a_{t+1} \leftarrow \mathcal{S}(p_{t}, p_{t+1}; r_{1}, \ldots, r_{t-1}; a_{t}, r_{t}), \\
\text{Exec. Res.:}\quad & r_{t+1} \leftarrow \mathcal{E}(a_{t+1}; r_{1}, \ldots, r_{t-1}, r_{t}).
\end{aligned}
\tag{2}
\]

If the planner assesses that the user’s request has been completed, it will give the termination signal “end” and output the final result. With such a design, our method achieves a more flexible online planning which is capable of handling various situations encountered in real-world scenarios.

The planner, API selector, and executor collaborate to form RestGPT’s coarse-to-fine online planning framework. This framework significantly enhances the ability to decompose tasks and select appropriate APIs, providing the model with the flexibility to effectively tackle user instructions.
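The control flow of this loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the RestGPT implementation: the three modules are passed in as plain functions, the planner is assumed to return a `(state, text)` pair with `state` in `{"plan", "continue", "end"}`, and the scripted stand-ins and placeholder results below are hypothetical.

```python
def run_loop(instruction, planner, selector, executor, max_steps=10):
    """Iterate plan -> select -> execute until the planner signals "end"."""
    nl_plans, api_plans, results = [], [], []
    for _ in range(max_steps):
        # Eq. (1): p_t <- P(q; p_1, r_1, ..., p_{t-1}, r_{t-1})
        state, text = planner(instruction, nl_plans, results)
        if state == "end":
            return text, api_plans
        if state == "continue" and nl_plans:
            # Eq. (2): the selector sees the original and the new NL plan,
            # the earlier results, and the previous API plan with its result.
            api_plan = selector([nl_plans[-1], text], results[:-1],
                                api_plans[-1], results[-1])
        else:
            # Eq. (1): a_t <- S(p_t; r_1, ..., r_{t-1})
            api_plan = selector([text], list(results), None, None)
        # r_t <- E(a_t; earlier results)
        result = executor(api_plan, list(results))
        nl_plans.append(text)
        api_plans.append(api_plan)
        results.append(result)
    return None, api_plans  # step budget exhausted without an "end" signal


# Scripted stand-ins for the three LLM-backed modules (all hypothetical).
SCRIPT = ["find today's most trending movie",
          "find the director of that movie"]
API_FOR_PLAN = {
    SCRIPT[0]: "GET /trending/{media_type}/{time_window}",
    SCRIPT[1]: "GET /movie/{movie_id}/credits",
}
CANNED_RESULT = {
    "GET /trending/{media_type}/{time_window}": "<movie id>",
    "GET /movie/{movie_id}/credits": "<director name>",
}


def toy_planner(instruction, nl_plans, results):
    if len(nl_plans) < len(SCRIPT):
        return "plan", SCRIPT[len(nl_plans)]
    return "end", results[-1]


def toy_selector(nl_plans, prior_results, prev_api_plan, prev_result):
    return API_FOR_PLAN[nl_plans[-1]]


def toy_executor(api_plan, prior_results):
    return CANNED_RESULT[api_plan]


answer, calls = run_loop(
    "Who is the director of today's most trending movie?",
    toy_planner, toy_selector, toy_executor)
print(answer, calls)
```

The `max_steps` bound is an addition of this sketch; it simply guards against a planner that never emits “end”.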

3.3 API Plan Execution

Figure 2: Example output of the caller.

Once an API calling plan is generated, the next step is to execute it. The executor $\mathcal{E}$ consists of a caller and a response parser. The caller should read the API documents carefully and generate correct parameters or a request body for the API call. Due to the constraints of maximum context length, we filter the API documents and only preserve APIs appearing in the current API plan $a_{t}$. Given the generated parameters and request body, we use the Requests Python library to call the RESTful API. In addition, to guide the response parser in extracting information from the API response, the caller also generates a response description and an output instruction for the response parser. Figure 2 presents an example output of the caller.
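To make the caller's role concrete, the sketch below turns a hypothetical caller output (endpoint template, path parameters, query parameters) into a request URL. The paper sends the call with the Requests library; this sketch only builds the URL with the standard library so that it runs without network access. The base URL, the `build_url` helper, the dictionary layout, and the parameter values are all illustrative assumptions.

```python
import re
from urllib.parse import quote, urlencode


def build_url(base_url, path_template, path_params=None, query_params=None):
    """Fill {placeholders} in an endpoint path and append query parameters."""
    path_params = path_params or {}

    def fill(match):
        name = match.group(1)
        if name not in path_params:
            # A plan that uses a parameter before obtaining it fails here.
            raise KeyError(f"missing path parameter: {name}")
        return quote(str(path_params[name]), safe="")

    path = re.sub(r"\{(\w+)\}", fill, path_template)
    url = base_url.rstrip("/") + path
    if query_params:
        url += "?" + urlencode(query_params)
    return url


# Hypothetical caller output for the first step of the TMDB example.
caller_output = {
    "method": "GET",
    "path": "/trending/{media_type}/{time_window}",
    "path_params": {"media_type": "movie", "time_window": "day"},
    "query_params": {"page": 1},
}

url = build_url("https://api.example.com/3",  # placeholder base URL
                caller_output["path"],
                caller_output["path_params"],
                caller_output["query_params"])
print(caller_output["method"], url)
```

Raising on an unfilled placeholder is a design choice of this sketch: it surfaces a missing in-path parameter before any request is sent, instead of issuing a malformed call.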

RESTful APIs typically return a JSON formatted response with much redundant information. The executor needs to extract the required information from the response and return it to the planner. However, the response may sometimes have a complex structure or be lengthy, making it difficult to extract important information via directly prompting the LLMs. To address this problem, we make use of the response schema defined in the OAS. Specifically, we utilize the coding capability of LLM to generate Python parsing code based on the provided schema and output instructions generated by the caller. Next, the Python code is executed to get the final result. If there are no execution exceptions or errors, the output is returned. Otherwise, the LLM is prompted to parse the response directly as a backup.
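The run-then-fall-back behaviour described above can be sketched as follows. This is an assumption-laden illustration, not the authors' code: `generate_code` and `fallback` stand in for the two LLM calls, the generated snippet is assumed to read a variable `data` and assign `result`, and the JSON payload is invented for the example.

```python
import json


def parse_response(response_text, generate_code, fallback):
    """Run generated parsing code; on any error, fall back to direct parsing."""
    try:
        code = generate_code()            # stands in for the code-writing LLM call
        scope = {"data": json.loads(response_text)}
        # NOTE: exec on model-written code is unsafe outside a sandbox;
        # it is used here only to keep the sketch short.
        exec(code, scope)
        return scope["result"]
    except Exception:
        return fallback(response_text)    # stands in for the direct-parsing LLM call


# Invented response in the spirit of a movie-credits endpoint.
RESPONSE = json.dumps({
    "id": 0,
    "crew": [
        {"name": "Person A", "job": "Director"},
        {"name": "Person B", "job": "Producer"},
    ],
})


def good_code():
    return "result = [m['name'] for m in data['crew'] if m['job'] == 'Director']"


def broken_code():
    return "result = data['no_such_key']"


def direct_parse(text):
    return "parsed directly by the LLM"


print(parse_response(RESPONSE, good_code, direct_parse))
print(parse_response(RESPONSE, broken_code, direct_parse))
```

The first call returns the value computed by the generated code; the second raises a `KeyError` inside the generated code and therefore returns the fallback's output.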

4 RestBench

To assess the effectiveness of RestGPT in processing complex user instructions through RESTful APIs, we introduce RestBench, a high-quality human-annotated dataset comprising two real-world scenarios. Existing research has proposed several benchmarks for the evaluation of tool/API-augmented LLMs [23, 21, 9]. However, these benchmarks primarily focus on simple tasks that can be accomplished using a single API. We hope RestBench can facilitate the exploration of utilizing multiple APIs to address real-world user instructions.

4.1 Scenarios and APIs

We select two common real-world scenarios: the TMDB movie database and the Spotify music player. The main consideration is to evaluate two capabilities of RestGPT: (1) augmenting LLMs with an external specialized domain database via RESTful APIs; (2) connecting LLMs with RESTful APIs to autonomously control real-world applications. TMDB offers official RESTful APIs encompassing information on movies, TV shows, actors, and images. The Spotify music player provides API endpoints to retrieve content metadata, receive recommendations, create and manage playlists, and control playback. For these two scenarios, we select 54 and 40 commonly used APIs respectively and obtain the corresponding OpenAPI Specifications to build RestBench.

4.2 Dataset Collection

Scenario | Num. APIs | Solution path len. 1 | len. 2 | len. 3 | len. 4 | Avg. Len. | Total
TMDB | 54 | 5 | 66 | 27 | 2 | 2.3 | 100
Spotify | 40 | 8 | 18 | 22 | 9 | 2.6 | 57
Table 2: Statistics of RestBench test set. We report the number of instructions with different lengths of solution path.
▷ TMDB
Instruction: Who is the director of today’s most trending movie?
Gold Solution Path:
1. GET /trending/{media_type}/{time_window}
2. GET /movie/{movie_id}/credits
▷ Spotify
Instruction: Make me a playlist containing three songs of Mariah Carey and name it 'Love Mariah'
Gold Solution Path:
1. GET /search
2. GET /me
3. POST /users/{user_id}/playlists
4. POST /playlists/{playlist_id}/tracks
Table 3: Example instructions and the corresponding gold solution paths of RestBench.

High-quality instructions generally satisfy two crucial aspects: (1) to reflect a wide range of real user needs; (2) to cover different levels of complexity to fully study the reasoning and planning ability of our method. To achieve these goals, we adopt a bottom-up instruction collection approach. We employ 6 experts that work on NLP research to brainstorm instructions for different combinations of APIs. Along with the instructions, the experts need to annotate the gold API solution path for each instruction. To guarantee the quality of the instructions, we employ two additional experts to thoroughly verify the solvability of each instruction and correctness of the corresponding solution path. Ultimately, we annotate 10 instruction-solution pairs for each scenario as the development set, and 100 pairs for TMDB and 57 pairs for Spotify as the test set. Though the data scale is not large, these instructions are typical of the frequently raised user requests. Moreover, different from prior work which uses LLMs to get API calling procedure, we utilize human labeled API solution paths for evaluation. Table 3 presents example instructions of the two scenarios. The statistics of RestBench are shown in Table 2.
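As a quick consistency check on the statistics in Table 2, the average solution length and the test-set size can be recomputed from the per-length counts. The snippet below is only a worked calculation over the numbers reported in the table; the variable and function names are chosen for this example.

```python
# Number of test instructions per gold solution path length (from Table 2).
COUNTS = {
    "TMDB": {1: 5, 2: 66, 3: 27, 4: 2},
    "Spotify": {1: 8, 2: 18, 3: 22, 4: 9},
}


def summarize(distribution):
    """Return (total instructions, mean gold path length)."""
    total = sum(distribution.values())
    mean_len = sum(length * n for length, n in distribution.items()) / total
    return total, mean_len


for scenario, distribution in COUNTS.items():
    total, mean_len = summarize(distribution)
    print(f"{scenario}: total={total}, avg_len={mean_len:.2f}")
```

The recomputed means (226/100 for TMDB and 146/57 for Spotify) round to the 2.3 and 2.6 reported in the table.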

4.3 Evaluation Metrics

Since some user requests are time-dependent (see the TMDB example in Table 3), it is impractical to annotate a fixed ground-truth answer for each instruction, whereas the API solution paths for most instructions remain consistent. If the model-generated API call path contains the gold API call path as a subsequence (with the elements not necessarily being contiguous), we consider the model to have generated a correct path. To further evaluate the model’s performance, we rely on human evaluation to determine whether the model result successfully fulfills the user query. We calculate the proportion of correct paths and successful query completions as metrics, i.e., Correct Path Rate and Success Rate. Moreover, the number of actual API calls can be utilized to measure the planning efficiency of different methods. Given the length of gold solutions, we further define $\Delta$ Solution Len. as the mean number of additional API calls required to successfully execute an instruction:

\[
\Delta\,\text{Solution Len.} = \frac{1}{N_{s}} \sum_{i=0}^{N} \left(L^{i}_{real} - L^{i}_{gold}\right) \cdot \mathbb{I}(i, \text{success}),
\]

where $N_{s}$ is the number of successfully accomplished instructions, $L^{i}_{real}$ and $L^{i}_{gold}$ are the actual and gold numbers of API calls for the $i$-th instruction respectively, and $\mathbb{I}(i, \text{success})$ denotes whether the $i$-th instruction is successfully completed.
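Both path-based metrics are straightforward to compute. The sketch below shows one possible implementation of the subsequence test behind Correct Path Rate and of $\Delta$ Solution Len.; the function names and the toy records are assumptions of this example, while the gold path is the Spotify example from Table 3.

```python
def contains_subsequence(generated, gold):
    """True if `gold` appears in `generated` in order, not necessarily contiguously."""
    remaining = iter(generated)
    # `step in remaining` advances the iterator, so order is enforced.
    return all(step in remaining for step in gold)


def delta_solution_len(records):
    """Mean number of extra API calls over successful instructions.

    `records` holds (real_len, gold_len, success) triples; returns None
    when no instruction succeeded, since the mean is then undefined.
    """
    extra = [real - gold for real, gold, success in records if success]
    if not extra:
        return None
    return sum(extra) / len(extra)


GOLD = ["GET /search", "GET /me",
        "POST /users/{user_id}/playlists",
        "POST /playlists/{playlist_id}/tracks"]

# Hypothetical model output with one redundant call in the middle.
generated = ["GET /search", "GET /me", "GET /me/playlists",
             "POST /users/{user_id}/playlists",
             "POST /playlists/{playlist_id}/tracks"]

print(contains_subsequence(generated, GOLD))
print(delta_solution_len([(5, 4, True), (2, 2, True), (6, 3, False)]))
```

In the toy records, only the two successful instructions contribute, giving (1 + 0) / 2 = 0.5 extra calls on average; the failed instruction is excluded by the indicator term.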

5 Experiments

5.1 Experimental Setup

We compare RestGPT with four recent baselines: the offline introspective method [9] used in HuggingGPT [13] and Chameleon [12], DEPS [7], ReAct [16] and Reflexion [26]. Since some methods are not originally designed for tool/API usage, we reproduce them and add the API executor proposed in Section 3.3 to enable them to call RESTful APIs. The maximum number of steps for DEPS is set to 10 and the maximum number of trials for Reflexion is set to 2.

To showcase the planning and API calling capabilities of our method, we implement two ablation variants of RestGPT. The first variant involves removing the planner and allowing the API selector to directly choose APIs in a ReAct style. This approach can be seen as ReAct equipped with our proposed executor. The second one is to replace the schema-based response parser with an LLM that directly reads and extracts the required information from the JSON response.

In our experiments, we employ text-davinci-003 from OpenAI as the LLM for RestGPT and all baselines. The decoding temperature is set to 0 for the most deterministic generation.

Model | TMDB Success% | TMDB CP% | TMDB Δ Solution Len. | Spotify Success% | Spotify CP% | Spotify Δ Solution Len.
Offline [9] | 29.0 | 33.0 | +1.52 | 14.5 | 36.4 | +1.10
DEPS [7] | 38.0 | 43.0 | +1.20 | 19.3 | 43.8 | +1.74
ReAct [16] | 44.0 | 57.0 | +0.76 | 54.5 | 49.1 | +0.31
Reflexion [26] | 52.0 | 59.0 | +1.37 | 59.6 | 61.4 | +1.68
RestGPT | 75.0 | 79.0 | +0.55 | 72.7 | 74.5 | +0.25
   w/o Planner† | 44.0 | 57.0 | +0.76 | 54.5 | 49.1 | +0.31
   w/o Parser | 46.0 | 53.0 | +0.60 | 47.3 | 52.7 | +0.24
RestGPT (ChatGPT) | 68.0 | 65.0 | +0.72 | 69.1 | 72.3 | +0.28
RestGPT (Llama2-13B) | 0.0 | 0.0 | - | 0.0 | 0.0 | -
RestGPT (Vicuna-13B) | 9.0 | 15.0 | +1.21 | 12.7 | 20.6 | +1.52
Table 4: Success rate (%), Correct Path rate (CP, %), and Δ Solution Length on two scenarios of RestBench. The best results are in boldface. † RestGPT w/o planner is equivalent to ReAct equipped with our proposed executor.

5.2 Main Results

Table 4 shows the performance of RestGPT and baselines on two scenarios. Our approach outperforms all other methods in both scenarios, achieving a success rate of 75% on the movie database and over 70% on the music player. Note that in most cases, the correct path rate is slightly higher than the success rate, indicating that a method may generate a correct API calling plan but fail to execute it. RestGPT also stands out with its minimal solution length, showcasing the superior planning ability of the coarse-to-fine online planning mechanism.

Ablation experiments on coarse-to-fine planning and the schema-based parser show that both mechanisms are conducive to model performance. In particular, when the planner is removed, the performance degrades significantly, indicating that current LLMs are unable to simultaneously conduct planning, API understanding and selection. Thus, the coarse-to-fine planning mechanism plays a crucial role in our framework. The ablation results without the parser demonstrate that the schema-based parser enables LLMs to better comprehend and parse real-world API responses with complicated structure.

To investigate the performance of our method with different base LLMs, we implement RestGPT with ChatGPT (gpt-3.5-turbo-0301), Llama2-13B (Llama-2-13b-chat-hf), and Vicuna-13B (vicuna-13b-v1.5). As shown in Table 4, the performance of ChatGPT is slightly worse than that of text-davinci-003. Interestingly, we have tried all official checkpoints of Llama2-13B, but none of them were able to comprehend the prompt and generate valid plans. In contrast, Vicuna-13B, which is fine-tuned from Llama2 on user-shared conversations, can accomplish some simple instructions. This result indicates that by fine-tuning LLMs on ChatGPT-generated data, the model can acquire the ability to understand and follow complicated prompts.

Figure 3: Error breakdown of RestGPT on RestBench. Error types are categorized by the module where the error occurred.
Figure 4: Scaling ability of RestGPT. (a) (b) Scaling curves of the gold solution path on TMDB and Spotify. The length of gold API solution path indicates the complexity of the instruction. (c) Scaling curves of the number of APIs on TMDB scenario.

5.3 Error Analysis

To further investigate the effectiveness of different modules in RestGPT, we conduct error analysis. In Figure 3, we classify errors based on the module in which they occur. We discover that the majority of errors occur during the planning stage, i.e., within the planner (purple) and API selector (blue). The planner sometimes loses track of its intended objective after multiple rounds of execution, resulting in an early exit before completing the instruction. The API selector may either select incorrect APIs or hallucinate in-path parameters. This error analysis highlights the insufficient planning and decision-making capabilities of LLMs.

Compared with text-davinci-003, ChatGPT tends to make more errors in the planning stage, leading to slightly worse performance on both scenarios. More specifically, we find that ChatGPT is often too verbose and tends to continue planning even after the user instruction has been fulfilled. This behavior can be attributed to the fact that ChatGPT is trained specifically for conversational interactions, which encourages it to generate lengthier responses.

5.4 Scaling Curves

In this section, we aim to demonstrate the scaling ability of RestGPT on two dimensions: scaling the difficulty of the tasks and scaling the number of APIs.

For each instruction in RestBench, the length of gold solution path indicates the complexity of the instruction. We calculate the success rate of models on instructions with varying complexities. As depicted in Figure 4 (a) (b), the success rate of all methods decreases as the complexity of the instruction increases. Notably, when the gold path length is 4, all baselines struggle to complete the task in both scenarios. In contrast, our proposed RestGPT can still achieve a success rate of over 40%, showing its superior performance in planning and API calling.

Before conducting experiments on scaling the number of APIs, we handpicked 10 APIs from TMDB and created a small test set comprising 15 instructions. All 15 instructions can be resolved using the selected 10 APIs. Then, we gradually expanded the number of APIs by introducing additional noise APIs sourced from the official TMDB APIs. The results are shown in Figure 4 (c). As the number of noise APIs increases, the performance of all baseline methods deteriorates due to their inferior planning and reasoning. However, our method remains almost unaffected. These results demonstrate the strong extensibility of our proposed RestGPT.

5.5 Case Study

Figure 5: Case study of three methods, (a) Offline [9, 13, 12], (b) ReAct [16], and (c) RestGPT. For offline method, we only show the generated plan. For ReAct and RestGPT, we omit the detailed execution process of the executor.

In Figure 5, we conduct a case study to compare the planning ability of RestGPT with the offline planning [9, 12] and ReAct [16] frameworks. Firstly, we observe that the offline method is unable to solve most user instructions. As depicted in Figure 5 (a), the planner not only selects the wrong API (step 2), but also ignores the dependencies between APIs and uses the parameter “user_id” before obtaining it (step 4). Regarding ReAct, which generates chain-of-thought and actions in an interleaved manner, we find that current LLMs have a limited ability to simultaneously conduct planning, API understanding and selection. As shown in Figure 5 (b), the planner of ReAct generates a sub-task that is difficult to solve (step 2) and also ignores the dependencies between different APIs (step 3). Due to the inferior planning, it consumes 6 API calls to complete the task. In contrast, RestGPT employs a planner to generate high-level NL sub-tasks and an API selector to choose appropriate APIs to solve each sub-task. Notably, in step 3, the planner assesses that the playlist has not been successfully created and generates a “continue” signal with further instructions for the API selector. Our method accomplishes the instruction with only 4 API calls. The coarse-to-fine online planning framework of RestGPT fully exploits the LLMs’ planning and document understanding capabilities, providing the model with the flexibility to tackle complex user requests.

6 Conclusion

In this paper, we explore the scenarios of connecting current large language models (LLMs) with real-world applications via RESTful APIs. To overcome the limitations of existing approaches and tackle the challenges in integrating LLMs with RESTful APIs, we propose RestGPT, an approach that leverages LLMs to complete complex user instructions. Our method features a coarse-to-fine online planning mechanism to enable more flexible planning and API selection. Furthermore, to handle the complex scenario of calling RESTful APIs, we designed a specialized API executor to formulate parameters and parse API responses. To assess the performance of our method, we build a high-quality dataset, RestBench, which consists of human-annotated instructions from two realistic scenarios. Extensive experiments demonstrate that RestGPT achieves impressive results in complex tasks and exhibits strong robustness, which paves a new way towards AGI. In the future, we aim to delve into a broader range of intricate tasks, thoroughly examining the immense potential of RestGPT across both academic and industrial domains.

References

  • Brown et al. [2020] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
  • OpenAI [2022] OpenAI. Chatgpt, 2022. URL https://openai.com/blog/chatgpt.
  • Dong et al. [2022] Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Zhiyong Wu, Baobao Chang, Xu Sun, Jingjing Xu, and Zhifang Sui. A survey for in-context learning. arXiv preprint arXiv:2301.00234, 2022.
  • Wei et al. [2022a] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Ed Chi, Quoc Le, and Denny Zhou. Chain of thought prompting elicits reasoning in large language models. arXiv preprint arXiv:2201.11903, 2022a.
  • Wei et al. [2022b] Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, et al. Emergent abilities of large language models. arXiv preprint arXiv:2206.07682, 2022b.
  • Huang et al. [2022] Wenlong Huang, Pieter Abbeel, Deepak Pathak, and Igor Mordatch. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents. In International Conference on Machine Learning, pages 9118–9147. PMLR, 2022.
  • Wang et al. [2023a] Zihao Wang, Shaofei Cai, Anji Liu, Xiaojian Ma, and Yitao Liang. Describe, explain, plan and select: Interactive planning with large language models enables open-world multi-task agents. arXiv preprint arXiv:2302.01560, 2023a.
  • Mialon et al. [2023] Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ram Pasunuru, Roberta Raileanu, Baptiste Rozière, Timo Schick, Jane Dwivedi-Yu, Asli Celikyilmaz, et al. Augmented language models: a survey. arXiv preprint arXiv:2302.07842, 2023.
  • Qin et al. [2023] Yujia Qin, Shengding Hu, Yankai Lin, Weize Chen, Ning Ding, Ganqu Cui, Zheni Zeng, Yufei Huang, Chaojun Xiao, Chi Han, et al. Tool learning with foundation models. arXiv preprint arXiv:2304.08354, 2023.
  • Parisi et al. [2022] Aaron Parisi, Yao Zhao, and Noah Fiedel. Talm: Tool augmented language models. arXiv preprint arXiv:2205.12255, 2022.
  • Schick et al. [2023] Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. arXiv preprint arXiv:2302.04761, 2023.
  • Lu et al. [2023] Pan Lu, Baolin Peng, Hao Cheng, Michel Galley, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, and Jianfeng Gao. Chameleon: Plug-and-play compositional reasoning with large language models. arXiv preprint arXiv:2304.09842, 2023.
  • Shen et al. [2023] Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yueting Zhuang. Hugginggpt: Solving ai tasks with chatgpt and its friends in huggingface. arXiv preprint arXiv:2303.17580, 2023.
  • Wu et al. [2023] Chenfei Wu, Shengming Yin, Weizhen Qi, Xiaodong Wang, Zecheng Tang, and Nan Duan. Visual chatgpt: Talking, drawing and editing with visual foundation models. arXiv preprint arXiv:2303.04671, 2023.
  • Yang et al. [2023] Rui Yang, Lin Song, Yanwei Li, Sijie Zhao, Yixiao Ge, Xiu Li, and Ying Shan. Gpt4tools: Teaching large language model to use tools via self-instruction. arXiv preprint arXiv:2305.18752, 2023.
  • Yao et al. [2022] Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629, 2022.
  • Li et al. [2016] Li Li, Wu Chou, Wei Zhou, and Min Luo. Design patterns and extensibility of rest api for networking applications. IEEE Transactions on Network and Service Management, 13(1):154–167, 2016.
  • SmartBear [2023] SmartBear. Swagger, 2023. URL https://swagger.io/.
  • Paranjape et al. [2023] Bhargavi Paranjape, Scott Lundberg, Sameer Singh, Hannaneh Hajishirzi, Luke Zettlemoyer, and Marco Tulio Ribeiro. Art: Automatic multi-step reasoning and tool-use for large language models. arXiv preprint arXiv:2303.09014, 2023.
  • Surís et al. [2023] Dídac Surís, Sachit Menon, and Carl Vondrick. Vipergpt: Visual inference via python execution for reasoning. arXiv preprint arXiv:2303.08128, 2023.
  • Patil et al. [2023] Shishir G Patil, Tianjun Zhang, Xin Wang, and Joseph E Gonzalez. Gorilla: Large language model connected with massive apis. arXiv preprint arXiv:2305.15334, 2023.
  • Tang et al. [2023] Qiaoyu Tang, Ziliang Deng, Hongyu Lin, Xianpei Han, Qiao Liang, and Le Sun. Toolalpaca: Generalized tool learning for language models with 3000 simulated cases. arXiv preprint arXiv:2306.05301, 2023.
  • Li et al. [2023] Minghao Li, Feifan Song, Bowen Yu, Haiyang Yu, Zhoujun Li, Fei Huang, and Yongbin Li. Api-bank: A benchmark for tool-augmented llms. arXiv preprint arXiv:2304.08244, 2023.
  • Wang et al. [2023b] Peiyi Wang, Lei Li, Liang Chen, Dawei Zhu, Binghuai Lin, Yunbo Cao, Qi Liu, Tianyu Liu, and Zhifang Sui. Large language models are not fair evaluators. arXiv preprint arXiv:2305.17926, 2023b.
  • Masse [2011] Mark Masse. REST API design rulebook: designing consistent RESTful web service interfaces. O’Reilly Media, Inc., 2011.
  • Shinn et al. [2023] Noah Shinn, Federico Cassano, Beck Labash, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. Reflexion: Language agents with verbal reinforcement learning. arXiv preprint arXiv:2303.11366, 2023.

Appendix A RESTful APIs and OAS

A RESTful API (Representational State Transfer API) is an API that follows REST, an architectural style used for designing networked applications, and is widely used for building web services [25, 17]. In a RESTful API, resources (such as data objects or services) are identified by URLs (Uniform Resource Locators), known as endpoints. These endpoints are accessed over the HTTP protocol, and different HTTP methods (GET, POST, etc.) are used to perform operations on the resources. There are millions of RESTful APIs available on the Internet, such as those of Spotify, Twitter, and Gmail.
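As a small illustration of endpoints and HTTP methods, the sketch below builds (but does not send) two requests with Python's standard library. The base URLs and paths are those that appear in the cases of Appendix D; the helper name `build_request` is our own, and the authentication headers that real services require are omitted.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_request(method: str, base_url: str, path: str,
                  params: Optional[dict] = None,
                  body: Optional[dict] = None) -> Request:
    """Build an HTTP request for one endpoint without sending it."""
    url = base_url + path
    if params:
        url += "?" + urlencode(params)  # query string appended to the endpoint URL
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


# Same resource style, different HTTP methods for different operations.
search = build_request("GET", "https://api.themoviedb.org/3", "/search/person",
                       params={"query": "Sofia Coppola"})
skip = build_request("POST", "https://api.spotify.com/v1", "/me/player/next")
```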

The OpenAPI Specification (OAS), formerly known as Swagger, is a specification for defining and documenting RESTful APIs [18]. It provides a standardized way to describe the structure, functionality, and behavior of an API, making it easier for developers to understand and interact with the API. The OpenAPI Specification is written in JSON or YAML format and consists of a set of rules and conventions that define the endpoints, request/response formats, parameters, authentication methods, and other details of the API. More specifically, an OAS consists of the following aspects for each API endpoint:

  • API Path: a relative path to an individual API endpoint, e.g., /{person_id}/details.

  • API Description: what the API does, how it works, and any potential errors or exceptions that may be raised.

  • Request Method: the desired action to be performed for the API, e.g., GET, POST, DELETE.

  • Parameter List: parameter name, parameter description, data type, default value, optional values of each parameter for the API.

  • Response Schema: the schema of the response of the API. This information can assist the response parser to extract useful information from the JSON response.

  • Response Example (Optional): an example of an API call, which helps demonstrate what the API will return.

  • Error and Exception: potential error codes and their corresponding descriptions.

We provide an example of an OAS description of an API endpoint in Figure 6 and Figure 7.
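To make the aspects listed above concrete, here is a hand-written, heavily simplified OAS fragment expressed as a Python dict, together with a helper that pulls out the per-endpoint fields. The fragment is illustrative and is not copied from the TMDB specification; the keys (`paths`, `parameters`, `responses`, `content`, `schema`) follow the OpenAPI 3 layout, while `summarize_endpoint` is a hypothetical helper.

```python
# Illustrative OpenAPI 3 fragment; descriptions and schema are invented for the example.
OAS = {
    "paths": {
        "/search/person": {
            "get": {
                "description": "Search for people by name.",
                "parameters": [
                    {"name": "query", "in": "query", "required": True,
                     "description": "Text to search for.",
                     "schema": {"type": "string"}},
                    {"name": "page", "in": "query", "required": False,
                     "description": "Page of results.",
                     "schema": {"type": "integer", "default": 1}},
                ],
                "responses": {
                    "200": {
                        "description": "A list of matching people.",
                        "content": {"application/json": {"schema": {
                            "type": "object",
                            "properties": {"results": {"type": "array"}},
                        }}},
                    },
                    "401": {"description": "Invalid API key."},
                },
            }
        }
    }
}


def summarize_endpoint(oas: dict, path: str, method: str) -> dict:
    """Collect path, method, description, parameters, response schema, and errors."""
    op = oas["paths"][path][method.lower()]
    responses = op.get("responses", {})
    ok = responses.get("200", {})
    schema = ok.get("content", {}).get("application/json", {}).get("schema")
    return {
        "path": path,
        "method": method.upper(),
        "description": op.get("description", ""),
        "parameters": [(p["name"], p["schema"]["type"], p.get("required", False))
                       for p in op.get("parameters", [])],
        "response_schema": schema,
        "errors": {code: r.get("description", "")
                   for code, r in responses.items() if not code.startswith("2")},
    }


summary = summarize_endpoint(OAS, "/search/person", "get")
```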

Figure 6: A RESTful API from TMDB.
Figure 7: The OpenAPI Specification (OAS) of the API endpoint in Figure 6.
Figure 8: Case study on the response parser. The purpose of response parsing is to extract required information from the API response (b) according to the plan (a). We compare our proposed parser (d) with directly prompting an LLM as a parser (c).

Appendix B Baselines

To demonstrate the effectiveness of RestGPT, we compare it with four recent baselines:

The offline introspective method [9] is widely used in work on tool-augmented LLMs, such as HuggingGPT [13] and Chameleon [12]. This method directly generates a multi-step plan for tool use and then executes the API plan sequentially. Due to the absence of API feedback, the offline method cannot adjust the plan to adapt to unforeseen situations and errors in complex scenarios.

DEPS [7], or “Describe, Explain, Plan and Select”, is an interactive planning approach based on LLMs to improve the long-term planning ability on open-world problems. DEPS will iteratively update the plan based on the environment feedback until the task is finished.

ReAct [16] generates the chain-of-thought and actions in an interleaved manner. The LLMs will reason about the current situation to make a better subsequent plan.

Reflexion [26] is inspired by the “trial and error” of the human learning process. It verbally reflects on task feedback signals, then maintains the reflective text in an episodic memory to induce better planning in subsequent trials.

Since some methods were not originally designed for tool/API usage, we reproduce them and add our proposed API executor so that they can call RESTful APIs.

Appendix C Case Study on the Executor

Figure 8 illustrates how RestGPT parses JSON responses of RESTful APIs. We compare the RestGPT parser with directly prompting an LLM as a parser. As shown, responses from RESTful APIs are usually in a lengthy and complex JSON format (694 lines in Figure 8 (b)). The RestGPT parser generates information extraction code based on the response schema in the OAS, which avoids parsing lengthy JSON files directly and significantly improves the efficiency and accuracy of response parsing.
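As a hypothetical illustration of schema-guided parsing, the snippet below shows the kind of extraction code such a parser might generate for the query “What are the names and ids of the movies directed by this person?” over a movie-credits response. The `data` dict is a tiny hand-made stand-in for a real response; it assumes the response contains a `crew` list whose items carry `job`, `title`, and `id` fields, which should be checked against the actual response schema.

```python
# Tiny stand-in for the JSON response (assumed shape, not real API output).
data = {
    "cast": [],
    "crew": [
        {"id": 153, "title": "Lost in Translation", "job": "Director"},
        {"id": 153, "title": "Lost in Translation", "job": "Writer"},
        {"id": 1443, "title": "The Virgin Suicides", "job": "Director"},
    ],
}

# Code of the kind the parser would generate: apply the filter stated in the
# query, then print one brief line (using str.format rather than an f-string).
directed = [(m["title"], m["id"]) for m in data["crew"] if m["job"] == "Director"]
summary = ", ".join("{} ({})".format(title, mid) for title, mid in directed)
print("The movies directed by this person are {}".format(summary))
```

The generated code touches only the few fields named in the schema, so the length of the raw JSON does not affect what has to be placed in the model's context.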

Appendix D More Cases on RestBench

In this section, we show more cases of RestGPT accomplishing complex instructions in the two scenarios of RestBench. Tables 5 and 6 show two cases of the TMDB scenario. Tables 7 and 8 show two cases of the Spotify scenario.

▷ Instruction of TMDB
Give me the number of movies directed by Sofia Coppola.
▷ RestGPT
Planner: search for the director “Sofia Coppola”

API Selector: GET /search/person to search for the director “Sofia Coppola”

Caller:
I need to make a GET request to the /search/person endpoint with the query parameter set to “Sofia Coppola”.
Operation: GET
Input: {
    "url": "https://api.themoviedb.org/3/search/person",
    "params": {
        "query": "Sofia Coppola"
    },
    "description": "The API response is a list of people matching the query.",
    "output_instructions": "What is the id of Sofia Coppola?"
}

Parser: The id of the person is 1769

Caller: Successfully call GET /search/person to search for the director “Sofia Coppola”. The id of Sofia Coppola is 1769.

Planner: Get the number of movies directed by Sofia Coppola (1769)

API Selector: GET /person/1769/movie_credits to get the number of movies directed by Sofia Coppola (id 1769)

Caller:
I need to call GET /person/1769/movie_credits to get the number of movies directed by Sofia Coppola (id 1769).
Operation: GET
Input: {
    "url": "https://api.themoviedb.org/3/person/1769/movie_credits",
    "description": "The API response is the movie credit list of Sofia Coppola (id 1769)",
    "output_instructions": "What are the names and ids of the movies directed by this person?"
}

Parser: The movies directed by Sofia Coppola are Lost in Translation (153), The Virgin Suicides (1443), Marie Antoinette (1887), Somewhere (39210), Lick the Star (92657), The Bling Ring (96936), A Very Murray Christmas (364067), Bed, Bath and Beyond (384947), The Beguiled (399019), On the Rocks (575417), The Custom of the Country (758422), New York City Ballet (826970), Priscilla (1020006), Un matin partout dans le monde (835318)

Caller: Successfully call GET /person/1769/movie_credits to get the movie credit list of Sofia Coppola (id 1769). The movies directed by Sofia Coppola are Lost in Translation (153), The Virgin Suicides (1443), Marie Antoinette (1887), Somewhere (39210), Lick the Star (92657), The Bling Ring (96936), A Very Murray Christmas (364067), Bed, Bath and Beyond (384947), The Beguiled (399019), On the Rocks (575417), The Custom of the Country (758422), New York City Ballet (826970), Priscilla (1020006), Un matin partout dans le monde (835318).

Planner: I am finished executing a plan and have the information the user asked for or the data the used asked to create. Final Answer: Sofia Coppola has directed 14 movies.

Table 5: Case 1 for TMDB.

▷ Instruction of TMDB
Who is the director of Leonardo DiCaprio’s latest movie?
▷ RestGPT
Planner: search for the latest movie of Leonardo DiCaprio
API Selector: GET /search/person to get the id of Leonardo DiCaprio
Caller:
I need to make a GET request to the /search/person endpoint with the query parameter set to “Leonardo DiCaprio”
Operation: GET
Input: {
    "url": "https://api.themoviedb.org/3/search/person",
    "params": {"query": "Leonardo DiCaprio"},
    "description": "The API response is a list of people.",
    "output_instructions": "What is the id of Leonardo DiCaprio?"
}
Parser: The id of the person is 6193
Caller: Successfully call GET /search/person to search for the person “Leonardo DiCaprio”. The id of Leonardo DiCaprio is 6193.

Planner: Continue. search for the latest movie of Leonardo DiCaprio (6193)
API Selector: GET /person/6193/movie_credits to get the latest movie of Leonardo DiCaprio (6193)
Caller:
I need to call GET /person/6193/movie_credits to get the latest movie of Leonardo DiCaprio (6193).
Operation: GET
Input: {
    "url": "https://api.themoviedb.org/3/person/6193/movie_credits",
    "description": "The API response is the movie credit list of Leonardo DiCaprio (id 6193)",
    "output_instructions": "What is the name and id of the latest movie of Leonardo DiCaprio (6193)?"
}
Parser: The name of the latest movie of Leonardo DiCaprio is Killers of the Flower Moon and the id is 466420
Caller: Successfully call GET /person/6193/movie_credits to get the movie credit list of Leonardo DiCaprio (id 6193). The name of the latest movie of Leonardo DiCaprio is Killers of the Flower Moon and the id is 466420.

Planner: search for the director of Killers of the Flower Moon (466420)
Caller:
I need to call GET /movie/{movie_id}/credits to get the director of the movie Killers of the Flower Moon (id 466420).
Operation: GET
Input: {
    "url": "https://api.themoviedb.org/3/movie/466420/credits",
    "description": "The API response is the cast and crew for the movie Killers of the Flower Moon (id 466420).",
    "output_instructions": "What is the name and id of the director of the movie Killers of the Flower Moon (id 466420)?"
}
Parser: The name of the director of the movie Killers of the Flower Moon (id 466420) is Martin Scorsese (1032)
Caller: Successfully call GET /movie/466420/credits to get the director of the movie Killers of the Flower Moon (id 466420). The name and id of the director of the movie Killers of the Flower Moon (id 466420) is Martin Scorsese (1032).

Planner: I am finished executing a plan and have the information the user asked for or the data the used asked to create. Martin Scorsese (1032) is the director of Leonardo DiCaprio’s latest movie Killers of the Flower Moon (466420).

Table 6: Case 2 for TMDB.

▷ Instruction of Spotify
Set the volume to 60 and skip to the next track.
▷ RestGPT
Planner: set the volume to 60

API Selector: PUT /me/player/volume to set the volume to 60

Caller:
I need to make a PUT request to the /me/player/volume endpoint with the volume_percent parameter set to 60.
Operation: PUT
Input: {
    "url": "https://api.spotify.com/v1/me/player/volume",
    "params": {
        "volume_percent": "60"
    },
    "description": "Set the volume for the current playback device."
}

Parser: The volume for the current playback device has been set to 60%.

Caller: Successfully call PUT /me/player/volume to set the volume to 60. The volume for the current playback device has been set to 60%.

Planner: skip to the next track

API Selector: POST /me/player/next to skip to the next track

Caller:
I need to make a POST request to the /me/player/next endpoint to skip to the next track.
Operation: POST
Input: {
    "url": "https://api.spotify.com/v1/me/player/next",
    "description": "Skips to next track in the user’s queue."
}

Parser: The API call successfully skipped to the next track in the user’s queue.

Caller: Successfully call POST /me/player/next to skip to the next track.

Planner: I am finished executing a plan and completed the user’s instructions. I have set the volume to 60 and skipped to the next track.

Table 7: Case 1 for Spotify.

▷ Instruction of Spotify
Add Summertime Sadness by Lana Del Rey in my first playlist.
▷ RestGPT
Planner: Get the id of my first playlist
API Selector: GET /me/playlists to get the list of playlists owned or followed by the current Spotify user
Caller:
I need to use the GET method and generate the parameters according to the API documentation.
Operation: GET
Input: {
    "url": "https://api.spotify.com/v1/me/playlists",
    "description": "Get a list of the playlists owned or followed by the current Spotify user.",
    "params": {"limit": "1", "offset": "0"},
    "output_instructions": "What are the ids and names of the playlists?"
}
Parser: The id and name of the first playlist in the list is 6GHT and My Pop
Caller: Successfully call GET /me/playlists to get the list of playlists owned or followed by the current Spotify user. The id and name of the first playlist in the list is 6GHT and My Pop.

Planner: Search for Summertime Sadness by Lana Del Rey
API Selector: GET /search to search for the track “Summertime Sadness” by Lana Del Rey
Caller:
I need to call GET /search with the appropriate parameters to search for the track “Summertime Sadness” by Lana Del Rey.
Operation: GET
Input: {
    "url": "https://api.spotify.com/v1/search",
    "params": {"q": "track:Summertime Sadness artist:Lana Del Rey", "type": "track", "limit": "1", "offset": "0"},
    "description": "The API response is a track object for the track 'Summertime Sadness' by Lana Del Rey.",
    "output_instructions": "What is the id of the track?"
}
Parser: The id of the track is 1Ist
Caller: Successfully call GET /search to search for the track “Summertime Sadness” by Lana Del Rey. The id of the track is 1Ist.

Planner: Add Summertime Sadness by Lana Del Rey (1Ist) in my first playlist (6GHT)
Caller:
I need to make a POST request to add the track to the playlist.
Operation: POST
Input: {
    "url": "https://api.spotify.com/v1/playlists/6GHT/tracks",
    "data": {"uris": ["spotify:track:1Ist"]},
    "description": "The API response is the track added to the playlist.",
    "output_instructions": "What is the id of the snapshot of the playlist?"
}
Parser: The snapshot id of the playlist is “NCxj”.
Caller: Successfully call POST /playlists/6GHT/tracks to add the track “Summertime Sadness” (1Ist) to the playlist “My Pop” (6GHT).
Planner: I am finished executing a plan and have the data the used asked to create. I have added Summertime Sadness by Lana Del Rey in your first playlist My Pop.

Table 8: Case 2 for Spotify. Note that all IDs provided here are fake due to copyright restrictions.

Appendix E Failure Cases

We also present two failure cases of RestGPT. In the first case (Table 9), the API selector hallucinates a value for the in-path parameter tv_id of the API. The second failure case (Table 10) involves multiple issues. First, the executor correctly provided the desired query result, yet the planner continued to generate a redundant plan. In addition, the response parser generated incorrect parsing code and returned the name of the album instead of the track. As a result, RestGPT returned an incorrect answer. These failure cases highlight the limitations of RestGPT and the need for further improvement.

▷ Instruction of TMDB
Give me a photo belong to the second episode of the first season of the Witcher
▷ RestGPT
Planner: search for the id of the second episode of the first season of the Witcher

API Selector: GET /tv/63926/season/1/episode/2 to get the id of the second episode of the first season of the Witcher

Caller:
Operation: GET
Input: {
    "url": "https://api.themoviedb.org/3/tv/63926/season/1/episode/2",
    "description": "Get the id of the second episode of the first season of the Witcher",
    "output_instructions": "What is the id of the second episode of the first season of the Witcher?"
}

Parser: The id of the second episode of the first season of the Witcher is 1132654

Caller: Successfully call GET /tv/63926/season/1/episode/2 to get the id of the second episode of the first season of the Witcher. The id of the second episode of the first season of the Witcher is 1132654.

Planner: search for the photo belong to the second episode of the first season of the Witcher (1132654)

…(Finally Failed)

Table 9: A failure case for TMDB.

▷ Instruction of Spotify
What is the name of the song I playing right now?
▷ RestGPT
Planner: Get the current playing song

API Selector: GET /me/player/currently-playing to get the track currently being played on the user’s Spotify account

Caller:
Operation: GET
Input: {
    "url": "https://api.spotify.com/v1/me/player/currently-playing",
    "params": {
        "market": "US"
    },
    "description": "Get the track currently being played on the user’s Spotify account.",
    "output_instructions": "What is the name and id of the track currently being played?"
}

Parser: The name of the track currently being played is LOVE Destiny , and its id is 5gAC

Caller: Successfully call GET /me/player/currently-playing to get the track currently being played on the user’s Spotify account. The name of the track currently being played is LOVE Destiny , and its id is 5gAC.

Planner: Get the name of the song (5gAC)

API Selector: GET /tracks/5gAC to get the name of the song

Caller:
I need to call GET /tracks/{id} to get the name of the song
Operation: GET
Input: {
    "url": "https://api.spotify.com/v1/tracks/5gAC",
    "description": "Get Spotify catalog information for the track LOVE Destiny ",
    "output_instructions": "What is the name of the song?"
}

Parser: The name of the song is A BEST

Caller: Successfully call GET /tracks/5gACkVtLdO8ITqbEF3ZfD1 to get the name of the song. The name of the song is A BEST.

Planner: I am finished executing a plan and have the information the user asked for. The name of the song you are playing right now is A BEST.

Table 10: A failure case for Spotify. Note that all IDs provided here are fake due to copyright restrictions.

Appendix F Implementation Details

In this section, we show the details of the prompt design in RestGPT. Some prompts are inspired by the OpenAPI agent implemented by LangChain (https://python.langchain.com/). The prompts of the planner, API selector, caller, and response parser are shown in Tables 11, 12, 13, and 14, respectively.

▷ Prompt for the planner in RestGPT
You are an agent that plans solution to user queries.
You should always give your plan in natural language.
Another model will receive your plan and find the right API calls and give you the result in natural language.
If you assess that the current plan has not been fulfilled, you can output "Continue" to let the API selector select another API to fulfill the plan.
If you think you have got the final answer or the user query has been fulfilled, just output the answer immediately. If the query has not been fulfilled, you should continue to output your plan.
The plan should be as specific as possible. It is better not to use pronouns in plan, but to use the corresponding results obtained previously. If you want to iteratively query something about items in a list, then the list and the elements in the list should also appear in your plan. The plan should be straightforward. If you want to search, sort or filter, you can put the condition in your plan.

Starting below, you should follow this format:

User query: the query a User wants help with related to the API.
Plan step 1: the first step of your plan for how to solve the query
API response: the result of executing the first step of your plan, including the specific API call made.
Plan step 2: based on the API response, the second step of your plan for how to solve the query. If the last step result is not what you want, you can output "Continue" to let the API selector select another API to fulfill the plan. For example, the last plan is "add a song (id xxx) in my playlist", but the last step API response is calling "GET /me/playlists" and getting the id of my playlist, then you should output "Continue" to let the API selector select another API to add the song to my playlist. Pay attention to the specific API called in the last step API response. If a inproper API is called, then the response may be wrong and you should give a new plan.
API response: the result of executing the second step of your plan
… (this Plan step n and API response can repeat N times)
Thought: I am finished executing a plan and have the information the user asked for or the data the used asked to create
Final Answer: the final output from executing the plan

Example:
{in-context examples}

Begin!

User query: {query}
Plan step 1:

Table 11: The prompt of the planner.
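One way to render the planner's running context in the “Plan step n / API response” format of Table 11 is sketched below. The function name `format_planner_scratchpad` and the `(plan, api_response)` history representation are assumptions made for illustration; the actual prompt-assembly code may differ.

```python
from typing import List, Tuple


def format_planner_scratchpad(query: str, history: List[Tuple[str, str]]) -> str:
    """Render the query and past (plan, API response) pairs, ending at the next open step."""
    lines = ["User query: {}".format(query)]
    for step, (plan, response) in enumerate(history, start=1):
        lines.append("Plan step {}: {}".format(step, plan))
        lines.append("API response: {}".format(response))
    # Leave the next plan step open for the model to complete.
    lines.append("Plan step {}:".format(len(history) + 1))
    return "\n".join(lines)


scratchpad = format_planner_scratchpad(
    "Set the volume to 60 and skip to the next track.",
    [("set the volume to 60",
      "Successfully call PUT /me/player/volume to set the volume to 60.")],
)
print(scratchpad)
```

With an empty history the function yields only the query followed by “Plan step 1:”, which matches the tail of the prompt template.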

▷ Prompt for the API selector in RestGPT
You are a planner that plans a sequence of RESTful API calls to assist with user queries against an API.
Another API caller will receive your plan call the corresponding APIs and finally give you the result in natural language.
The API caller also has filtering, sorting functions to post-process the response of APIs. Therefore, if you think the API response should be post-processed, just tell the API caller to do so.
If you think you have got the final answer, do not make other API calls and just output the answer immediately. For example, the query is search for a person, you should just return the id and name of the person.

----

Here are name and description of available APIs.
Do not use APIs that are not listed here.

endpoints

----

Starting below, you should follow this format:

Background: background information which you can use to execute the plan, e.g., the id of a person, the id of tracks by Faye Wong. In most cases, you must use the background information instead of requesting these information again.
User query: the query a User wants help with related to the API
API calling 1: the first api call you want to make. Note the API calling can contain conditions such as filtering, sorting, etc. If user query contains some filter condition, such as the latest, the most popular, the highest rated, then the API calling plan should also contain the filter condition. If you think there is no need to call an API, output "No API call needed." and then output the final answer according to the user query and background information.
API response: the response of API calling 1
Instruction: Another model will evaluate whether the user query has been fulfilled. If the instruction contains "continue", then you should make another API call following this instruction.
… (this API calling n and API response can repeat N times, but most queries can be solved in 1-2 step)

Examples:

{icl_examples}

Note, if the API path contains "{}", it means that it is a variable and you should replace it with the appropriate value. In most cases, the id value is in the background or the API response. Just copy the id faithfully. If the id is not in the background, instead of creating one, call other APIs to query the id.

Begin!

Background: {background}
User query: {plan}
API calling 1:

Table 12: The prompt of the API selector.
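The note in the selector prompt about “{}” path variables can be mirrored by a programmatic guard. The sketch below is an illustration under our own assumptions and is not part of RestGPT: the hypothetical `fill_path` helper substitutes path variables only from ids that are explicitly available in the background, and raises an error rather than inventing a value, which is the behavior the prompt asks of the model.

```python
import re
from typing import Dict

PATH_VAR = re.compile(r"\{(\w+)\}")


def fill_path(template: str, background: Dict[str, str]) -> str:
    """Replace {name} path variables with known ids; never invent a missing one."""
    def substitute(match):
        name = match.group(1)
        if name not in background:
            raise KeyError(
                "id for '{}' is unknown; query it via another API first".format(name))
        return str(background[name])
    return PATH_VAR.sub(substitute, template)


path = fill_path("/person/{person_id}/movie_credits", {"person_id": "1769"})
```

A check of this kind would flag the situation in Table 9, where a path id is used without first being obtained from the background or from an earlier API response.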

▷ Prompt for the caller in RestGPT
You are an agent that gets a sequence of API calls and given their documentation, should execute them and return the final response.
If you cannot complete them and run into issues, you should explain the issue. If you’re able to resolve an API call, you can retry the API call. When interacting with API objects, you should extract ids for inputs to other API calls but ids and names for outputs returned to the User.
Your task is to complete the corresponding api calls according to the plan.

Here is documentation of the API:
Base url: {api_url}
Endpoints:
{api_docs}

If the API path contains "{}", it means that it is a variable and you should replace it with the appropriate value. For example, if the path is "/users/{user_id}/tweets", you should replace "{user_id}" with the user id. "{" and "}" cannot appear in the url.

You can use http request method, i.e., GET, POST, DELETE, PATCH, PUT, and generate the corresponding parameters according to the API documentation and the plan.
The input should be a JSON string which has 3 base keys: url, description, output_instructions
The value of "url" should be a string.
The value of "description" should describe what the API response is about. The description should be specific.
The value of "output_instructions" should be instructions on what information to extract from the response, for example the id(s) for a resource(s) that the POST request creates. Note "output_instructions" must be natural language and as verbose as possible! It cannot be "return the full response". Output instructions should faithfully contain the contents of the api calling plan and be as specific as possible. The output instructions can also contain conditions such as filtering, sorting, etc.
If you are using GET method, add "params" key, and the value of "params" should be a dict of key-value pairs.
If you are using POST, PATCH or PUT methods, add "data" key, and the value of "data" should be a dict of key-value pairs.

Examples: {icl_examples}

I will give you the background information and the plan you should execute.
You should execute the plan faithfully and give the Final Answer as soon as you successfully call the planned APIs, don’t get clever and make up steps that don’t exist in the plan. Do not make up APIs that don’t exist in the plan.

Starting below, you must follow this format:

Background: background information which you can use to execute the plan, e.g., the id of a person.
Plan: the plan of API calls to execute
Thought: you should always think about what to do
Operation: the request method to take, should be one of the following: GET, POST, DELETE, PATCH, PUT
Input: the input to the operation
Response: the output of the operation
Thought: I am finished executing the plan
Execution Result: based on the API response, the execution result of the API calling plan.

Begin!

Background: {background}
Plan: {api_plan}
Thought:

Table 13: The prompt of the caller.
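The input format rules in the caller prompt lend themselves to a simple check before a request is issued. The following sketch uses a hypothetical `validate_caller_input` helper (our own naming) to parse the JSON string and enforce the three base keys and the ban on unresolved braces in the URL. Since the cases in Appendix D show that `params` and `data` are not always present, the sketch treats them as optional and only checks that they are dicts when given; this leniency is our assumption.

```python
import json

BASE_KEYS = ("url", "description", "output_instructions")


def validate_caller_input(raw: str) -> dict:
    """Parse the caller's JSON input and check it against the prompt's rules."""
    payload = json.loads(raw)
    missing = [k for k in BASE_KEYS if k not in payload]
    if missing:
        raise ValueError("missing base keys: {}".format(", ".join(missing)))
    url = payload["url"]
    if not isinstance(url, str):
        raise ValueError('"url" must be a string')
    if "{" in url or "}" in url:
        raise ValueError("unresolved path variable in url: {}".format(url))
    for key in ("params", "data"):  # optional, but must be key-value pairs
        if key in payload and not isinstance(payload[key], dict):
            raise ValueError('"{}" must be a dict'.format(key))
    return payload


ok = validate_caller_input(json.dumps({
    "url": "https://api.themoviedb.org/3/search/person",
    "params": {"query": "Sofia Coppola"},
    "description": "The API response is a list of people matching the query.",
    "output_instructions": "What is the id of Sofia Coppola?",
}))
```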

▷ Prompt for the parser in RestGPT
Here is an API response schema from an OAS and a query.
The API’s response will follow the schema and be a JSON.
Assume you are given a JSON response which is stored in a python dict variable called ’data’, your task is to generate Python code to extract information I need from the API response.
Note: I will give you ’data’, do not make up one, just reference it in your code.
Please print the final result as brief as possible. If the result is a list, just print it in one sentence. Do not print each item in a new line.
Note you should generate only Python code.
DO NOT use fields that are not in the response schema.

API: {api_path}
API description: {api_description}
Parameters or body for this API call:
{api_param}

Response JSON schema defined in the OAS:
{response_schema}

The response is about: {response_description}

Query: {query}

The code you generate should satisfy the following requirements:
1. The code you generate should contain the filter in the query.
2. If the response is something about X, then the filter condition cannot include searching for X.
3. Do not use f-string in the print function. Use "format" instead.
4. Please print the final result as brief as possible. If the result is a list, just print it in one sentence. Do not print each item in a new line.

Begin!
Python Code:

Table 14: The prompt of the parser.
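Because the parser emits Python that prints its result, something has to run that code against the response and capture the printed output. A minimal sketch of that step is given below. `run_extraction_code` is a hypothetical name, the `generated` string is a hand-written stand-in for model output, and calling `exec` on model-generated code is unsafe outside a sandbox, so this is for illustration only.

```python
import contextlib
import io


def run_extraction_code(code: str, data: dict) -> str:
    """Execute generated code with `data` in scope and return what it printed."""
    buffer = io.StringIO()
    namespace = {"data": data}  # the only variable the prompt promises to the code
    with contextlib.redirect_stdout(buffer):
        exec(code, namespace)   # WARNING: untrusted code; sandbox this in practice
    return buffer.getvalue().strip()


# Hand-written stand-in for code the model might return for
# "What is the id of Sofia Coppola?" over a /search/person response.
generated = (
    "person = data['results'][0]\n"
    "print('The id of the person is {}'.format(person['id']))\n"
)
output = run_extraction_code(generated, {"results": [{"id": 1769, "name": "Sofia Coppola"}]})
```

Capturing standard output is what makes the prompt's instruction to print a brief, single-sentence result matter: whatever the code prints is what is returned to the caller.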