AutoPatent: A Multi-Agent Framework for Automatic Patent Generation

Qiyao Wang^1,2*, Shiwen Ni¹*, Huaren Liu², Shule Lu², Guhong Chen^1,3,
Xi Feng¹, Chi Wei¹, Qiang Qu¹, Hamid Alinejad-Rokny⁵, Yuan Lin²†, Min Yang^1,4†
¹Shenzhen Key Laboratory for High Performance Data Mining,
Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences
²Dalian University of Technology, ³Southern University of Science and Technology
⁴Shenzhen University of Advanced Technology, ⁵The University of New South Wales
[email protected], [email protected], {sw.ni, min.yang}@siat.ac.cn
* Equal contribution. † Min Yang and Yuan Lin are corresponding authors.

Abstract

As the capabilities of Large Language Models (LLMs) continue to advance, the field of patent processing has garnered increased attention within the natural language processing community. However, the majority of research has been concentrated on classification tasks, such as patent categorization and examination, or on short text generation tasks like patent summarization and patent quizzes. In this paper, we introduce a novel and practical task known as Draft2Patent, along with its corresponding D2P benchmark, which challenges LLMs to generate full-length patents averaging 17K tokens based on initial drafts. Patents present a significant challenge to LLMs due to their specialized nature, standardized terminology, and extensive length. We propose a multi-agent framework called AutoPatent which leverages the LLM-based planner agent, writer agents, and examiner agent with PGTree and RRAG to generate lengthy, intricate, and high-quality complete patent documents. The experimental results demonstrate that our AutoPatent framework significantly enhances the ability to generate comprehensive patents across various LLMs. Furthermore, we have discovered that patents generated solely with the AutoPatent framework based on the Qwen2.5-7B model outperform those produced by larger and more powerful LLMs, such as GPT-4o, Qwen2.5-72B, and LLAMA3.1-70B, in both objective metrics and human evaluations. We will make the data and code available upon acceptance at https://github.com/QiYao-Wang/AutoPatent.


1 Introduction

Figure 1: Draft2Patent Task. Automating patent drafting by simulating real-world scenarios.

As a representative form of Intellectual Property (IP), a patent is an exclusive right granted for an invention, which benefits inventors by providing legal protection for their inventions (https://www.wipo.int/web/patents). The inventor should draft a patent and submit it to a national or regional IP office, such as the United States Patent and Trademark Office (USPTO) or the European Patent Office (EPO), to obtain a patent grant (Toole et al., 2020). The patent will be examined by a patent examiner for patentability. The examiner decides whether the proposed invention is useful, non-obvious, and statutory, and searches for prior art within the technology field of the invention to confirm whether it is novel (USTPO, 2020; EPO, 1994). Therefore, inventors should draft a detailed, embodied patent and maximize the legal scope of protection for the invention without infringing on other patents.

A patent typically consists of a title, abstract, background, summary, detailed description and claims (World Intellectual Property Organization, 2022). A patent is usually drafted by a human patent agent who is familiar with patent law and has passed the patent bar exam. Patents cover various technical fields, which requires human patent agents to possess a broad knowledge base. However, this process is still conducted entirely by hand, which results in high labor and time costs and low efficiency.

We introduce a novel real-world task named Draft2Patent for converting an inventor’s draft into a complete patent, as shown in Figure 1. We construct a challenging benchmark named D2P for this task, which contains 1,933 draft-patent pairs and other patent metadata. The source patents are derived from the HUPD dataset (Suzgun et al., 2023), and we generate drafts meeting the quality requirements by interacting with GPT-4o-mini (Achiam et al., 2023) using five specific questions. There are two main challenges: drafts and patents in D2P average more than 4K and 17K tokens, respectively, and a patent must ensure its content is both patentable and compliant with technical and legal standards.

With the development of Large Language Models (LLMs), LLM-based agents have demonstrated their powerful capabilities in understanding, planning, memory, rethinking, and action (Cheng et al., 2024) in many knowledge-intensive domains, such as biomedicine (Gao et al., 2024; Kim et al., 2024), finance (Yu et al., 2023; Zhang et al., 2024; Yang, 2024), education (Ni and Yang, 2024) and law (Cui et al., 2023; Sun et al., 2024a; Chen et al., 2024a; Sun et al., 2024b). LLM-based agents can meet the knowledge demands of patent drafting in both law and technical fields. We propose a multi-agent framework for automatic patent drafting named AutoPatent, which can generate a complete patent using specialized expert agents.

We experiment on the D2P benchmark using commercial models such as GPT-4o, GPT-4o-mini (Achiam et al., 2023) and open source models such as LLAMA3.1 series (8B and 70B) (Dubey et al., 2024), Qwen2.5 series (7B, 14B, 32B and 72B) (Yang et al., 2024), and Mistral-7B (Jiang et al., 2023). Compared to other approaches, Qwen2.5-7B with AutoPatent demonstrates outstanding performance, achieving higher scores on objective metrics and human evaluation while significantly reducing repetition errors. Our main contributions include:

• We introduce a new task with high application value, Draft2Patent, and construct the corresponding D2P benchmark, which contains 1,933 draft-patent pairs and requires the LLM to generate complete patent documents with an average length of 17K tokens.
• We present AutoPatent, an innovative multi-agent framework that leverages the collaborative efforts of LLM-based planning agents, writing agents, and examining agents for automatically producing high-quality patents.
• We propose two innovative methods, PGTree (Patent Writing Guideline Tree) and RRAG (Reference-Review-Augmented Generation), and ablation experiments demonstrate the effectiveness of these two modules.
• Extensive experiments show that AutoPatent excels in both objective metrics and human evaluation. Moreover, the framework transfers and generalizes well, significantly improving the patent generation capability of various LLMs.

2 Related Work

Patent Writing.

Researchers have processed the text structure in the patent domain with multiple natural language processing methods. Ni et al. (2024) and Wang et al. (2024) focus on applying patent-law question answering to real-world scenarios in the intellectual property field. Jiang and Goetz (2024) summarize patent-related tasks into two types: patent analysis and patent generation. The patent generation task typically includes summarization (Souza et al., 2019; Sharma et al., 2019), translation (Wirth et al., 2023; Heafield et al., 2022), simplification (Casola et al., 2023), and patent writing.

The patent writing task previously focused on the internal conversion of a patent. Lee and Hsiang (2020) preliminarily validated the feasibility of using GPT-2-based (Radford et al., ) language models to construct patent claims. Lee (2020) converted patent abstracts to claims by fine-tuning transformer-based models. Jiang et al. (2024) introduced a task for generating claims based on the detailed description and constructed a benchmark to test this capability of LLMs. Zuo et al. (2024) used LLMs to convert claims into abstracts and to generate subsequent independent or dependent claims from existing claims.

However, these tasks did not focus on writing a complete patent. Knappich et al. (2024) constructed a dataset of paper-patent pairs and used a chunk-based, outline-guided method to convert papers into patents. In real-world scenarios, though, patent granting is affected by previously published papers. Our Draft2Patent task focuses more on the interaction between inventors and patent agents, aiming to generate a complete and high-quality patent that can even be submitted to the IP office.

LLMs-based Multi-Agent Framework for Long Text Generation.

Suzgun et al. (2023) revealed that the average length of a patent’s detailed description exceeds 10K tokens, making it challenging to generate. To tackle difficult yet valuable tasks of this kind, researchers have built many multi-agent frameworks based on role-playing or customized collaboration processes (Guo et al., 2024; Hong et al., 2024; Chen et al., 2024b). Bai et al. (2024) indicated that the limitations on LLMs’ output length stem from the long-tail distribution of the training dataset’s length, and proposed a writing pipeline that addresses this using the agents’ planning capabilities. Shao et al. (2024) focused on generating Wikipedia-like articles based on the agents’ brainstorming before writing. Huot et al. (2024) divided the story-writing task into writing tasks for specific agents and designed an evaluation framework to assess long narratives.

3 Draft2Patent Task

The IP office only publishes granted patents of inventors, while inventors and patent agents never make their drafts public. In this section, we introduce our novel patent drafting task Draft2Patent and the agent-based method we used to construct the D2P benchmark, in detail.

3.1 Task Definition

We define the Draft2Patent task by simulating real-world scenarios, as shown in Figure 1. In a real scenario, inventors usually deliver a technical draft of the patent, which contains the most comprehensive information, to a human patent agent and ask them to draft a high-quality patent. The patent agent will review and rewrite the patent in terms of patentability, terminology standardization, terminology consistency, claim drafting and legal compliance. This process typically takes considerable time before the patent is submitted to the IP office.

We leverage an agent-based method to simulate the interaction between inventors and patent agents, designing five questions $q_{1},q_{2},\dots,q_{5}$ that encompass all the relevant information about an invention. We combine them with the inventors’ answers $a_{1},a_{2},...,a_{5}$ to form the patent draft $\mathcal{D}$ and then we can use it to generate the patent $\mathcal{P}$ :

$$\mathcal{D}=\{(q_{1},a_{1}),(q_{2},a_{2}),\dots,(q_{5},a_{5})\}\tag{1}$$

where the five questions $q_{1},q_{2},\dots,q_{5}$ are listed in Appendix A.1, and $\mathcal{P}$ consists of the title, abstract, background, summary, claims, and detailed description of a patent.
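As a minimal sketch of Equation 1, a draft can be held as an ordered collection of five question-answer pairs. The names `Draft`, `build_draft`, and `render`, and the placeholder strings, are illustrative assumptions; the actual five questions are those given in Appendix A.1.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_QUESTIONS = 5  # the task definition fixes five questions q_1..q_5


@dataclass(frozen=True)
class Draft:
    """Hypothetical container for D = {(q_1, a_1), ..., (q_5, a_5)}."""
    pairs: Tuple[Tuple[str, str], ...]

    def __post_init__(self) -> None:
        if len(self.pairs) != NUM_QUESTIONS:
            raise ValueError(f"expected {NUM_QUESTIONS} (question, answer) pairs")

    def render(self) -> str:
        """Flatten the pairs into a single prompt-ready text block."""
        return "\n\n".join(
            f"Q{i}: {q}\nA{i}: {a}" for i, (q, a) in enumerate(self.pairs, 1)
        )


def build_draft(questions: List[str], answers: List[str]) -> Draft:
    """Pair each question with the inventor's answer, preserving order."""
    if len(questions) != len(answers):
        raise ValueError("questions and answers must align one-to-one")
    return Draft(tuple(zip(questions, answers)))


# Placeholder strings only; these are not the questions from Appendix A.1.
questions = [f"placeholder question {i}" for i in range(1, 6)]
answers = [f"placeholder answer {i}" for i in range(1, 6)]
draft = build_draft(questions, answers)
print(draft.render().splitlines()[0])
```

Keeping the pairs ordered and immutable means the same draft text can be passed unchanged to every downstream agent.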

3.2 D2P Dataset Construction

Draft Construction.

We use the HUPD dataset (Suzgun et al., 2023), built from the USPTO’s publicly available patents, as the data source. We randomly select patent samples whose decision label is ACCEPTED, ensuring that their contents are complete. For each patent $\mathcal{P}$ , we have GPT-4o-mini play the inventor and ask it the five questions $q_{1},q_{2},...,q_{5}$ . The corresponding answers $a_{1},a_{2},...,a_{5}$ are then combined to form the draft.

Draft Quality Review.

After obtaining 2,000 patent drafts, we establish a standard for assessing the quality of the answers to each question $q_{i}$ to ensure they contain sufficient information about the invention. We have GPT-4o act as a patent examiner agent to evaluate whether the answers fully address these questions; the concrete prompt is shown in Appendix A.2. Finally, we obtain 1,933 patent drafts that meet the quality standard through the collaborative assessment of the LLM patent agent and a human patent agent. These drafts are then divided into a training set of 1,500, a validation set of 133, and a test set of 300.
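The split sizes can be reproduced with a short sketch. This section does not say how the split was drawn, so the seeded random shuffle, the function name `split_drafts`, and the integer draft IDs below are all assumptions made for illustration.

```python
import random
from typing import List, Sequence, Tuple

SPLIT_SIZES = (1500, 133, 300)  # train / validation / test, as reported


def split_drafts(ids: Sequence[int], seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Shuffle once with a fixed seed (an assumption) and cut into three parts."""
    if len(ids) != sum(SPLIT_SIZES):
        raise ValueError("unexpected number of drafts")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    n_train, n_valid, _ = SPLIT_SIZES
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


train_ids, valid_ids, test_ids = split_drafts(range(1933))
print(len(train_ids), len(valid_ids), len(test_ids))
```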

Other Metadata Construction.

Our D2P dataset not only contains draft-patent pairs, but also includes a fine-tuning dataset for short component generation and patent writing guideline tree (PGTree) generation. We combine the metadata in HUPD with drafts to create paired data, such as draft-title pairs. For each patent $\mathcal{P}$ , we have GPT-4o-mini act as an assistant to summarize each part of $\mathcal{P}$ ’s description, and use the result as a PGTree $\mathcal{W}$ for writing the detailed description.

Table 1: Text length distribution in the D2P benchmark.

Section	All #Tokens	Train #Tokens	Valid #Tokens	Test #Tokens
Title	16.2	16.1	16.3	16.2
Abstract	158.7	159.8	151.9	156.7
Claims	1295.1	1292.9	1279.5	1313.1
Summary	1287.5	1300.7	1103.9	1303.1
Background	598.2	595.4	589.9	616.1
Description	14081.4	14210.9	12725.9	14035.0
PGTree	1356.8	1352.6	1373.6	1344.3
Draft	4076.5	4078.0	4022.5	4082.9
Patent	17005.3	17619.9	15911.6	17484.3

We calculate the average length of the 1,933 drafts, patents, PGTrees, and other metadata in D2P benchmark using GPT-4o-mini’s tokenizer, as shown in Table 1. The average length of a complete patent exceeds 17K tokens, with the detailed description averaging over 14K tokens and accounting for more than 80% of the total, while each remaining section averages less than 2K tokens.
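The proportions quoted above can be checked directly from the "All" column of Table 1. The snippet below is only an arithmetic check on the reported averages; the variable names are illustrative.

```python
# Average token counts copied from the "All" column of Table 1.
avg_tokens = {
    "Title": 16.2,
    "Abstract": 158.7,
    "Claims": 1295.1,
    "Summary": 1287.5,
    "Background": 598.2,
    "Description": 14081.4,
}
patent_avg = 17005.3  # "Patent" row, "All" column

description_share = avg_tokens["Description"] / patent_avg
short_sections = {k: v for k, v in avg_tokens.items() if k != "Description"}

print(f"description share of a full patent: {description_share:.1%}")
print("every other section under 2K tokens:",
      all(v < 2000 for v in short_sections.values()))
```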

4 AutoPatent Framework

We propose an automatic multi-agent patent drafting framework named AutoPatent for Draft2Patent, as shown in Figure 2. We design a specialized pipeline with eight agents and three steps to simulate the real-world patent drafting process. Five specialized writer agents generate the short components of a patent, with one dedicated writer agent per component. A planning agent generates a two-layer, multi-way PGTree that guides the reference-review-augmented generation (RRAG) process carried out by the description writer. An examiner agent evaluates the quality of the generated subsections and provides modification suggestions.

In this section, we introduce each agent and the workflow of our AutoPatent framework for drafting the patent $\mathcal{P}$ in detail, given the draft $\mathcal{D}$ . We also provide Algorithm 1, which outlines the key steps of AutoPatent framework.

Algorithm 1 AutoPatent Framework.

Input: Draft $\mathcal{D}$
Output: Complete patent $\mathcal{P}$
1: T for Title, A for Abstract, B for Background, S for Summary and C for Claims;
2: T, A, B, S, C = componentWriter.write($\mathcal{D}$)
3: Reference $\mathcal{R}$ = {T, A, B, S, C, $\mathcal{D}$}
4: PGTree $\mathcal{W}$ = planningAgent.plan($\mathcal{D}$) with $m$ sections, where the $i$-th section has $t_{i}$ subsections;
5: for $i\leftarrow 1$ to $m$ do
6:     for $j\leftarrow 1$ to $t_{i}$ do
7:         Retrieve content $r_{ij}$ from $\mathcal{R}$ with guideline $n_{ij}$;
8:         $r_{ij}$ = descriptionWriter.retrieve($n_{ij}$, $\mathcal{R}$)
9:         Description writer writes the subsection $d_{ij}$;
10:         $d_{ij}$ = descriptionWriter.write($r_{ij}$, $n_{ij}$, $\mathcal{W}$, $\mathcal{D}$)
11:         Examiner agent reviews $d_{ij}$ and gives feedback;
12:         Review, Feedback = examinerAgent.review($d_{ij}$)
13:         while Review is Fail do
14:             $d_{ij}$ = descriptionWriter.refine($d_{ij}$, Feedback)
15:             Review, Feedback = examinerAgent.review($d_{ij}$)
16:         end while
17:     end for
18: end for
19: Detailed Description D = $\{d_{ij}\mid 1\leq i\leq m,\ 1\leq j\leq t_{i}\}$
20: $\mathcal{P}$ = concat(T, A, B, S, D, C)
21: return Complete patent $\mathcal{P}$
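The control flow of Algorithm 1 can be sketched in Python with stub agents. Every class below is a hypothetical stand-in that returns canned strings instead of calling an LLM, and the `max_rounds` cap on the review loop is an added safeguard that is not part of Algorithm 1.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PGTree:
    # sections[i] holds the subsection guidelines n_ij of the i-th section
    sections: List[List[str]]


class ComponentWriter:
    def write(self, draft: str) -> Dict[str, str]:
        # Stub: one canned string per short component (T, A, B, S, C).
        return {k: f"<{k} for: {draft}>" for k in ("T", "A", "B", "S", "C")}


class PlanningAgent:
    def plan(self, draft: str) -> PGTree:
        return PGTree(sections=[["overview", "components"], ["operation"]])


class DescriptionWriter:
    def retrieve(self, guideline: str, reference: Dict[str, str]) -> str:
        # Stub retrieval: keep reference parts mentioning the guideline,
        # falling back to the draft itself when nothing matches.
        hits = [v for v in reference.values() if guideline in v]
        return " ".join(hits) or reference["D"]

    def write(self, retrieved: str, guideline: str, tree: PGTree, draft: str) -> str:
        return f"[{guideline}] draft text"

    def refine(self, text: str, feedback: str) -> str:
        return f"{text} (revised: {feedback})"


class ExaminerAgent:
    def review(self, text: str) -> Tuple[bool, str]:
        passed = "revised" in text  # stub: accept after one refinement
        return passed, ("" if passed else "add detail")


def auto_patent(draft: str, max_rounds: int = 3) -> str:
    comp = ComponentWriter().write(draft)              # line 2
    reference = {**comp, "D": draft}                   # line 3
    tree = PlanningAgent().plan(draft)                 # line 4
    writer, examiner = DescriptionWriter(), ExaminerAgent()
    description: List[str] = []
    for section in tree.sections:                      # lines 5-18
        for guideline in section:
            retrieved = writer.retrieve(guideline, reference)
            text = writer.write(retrieved, guideline, tree, draft)
            passed, feedback = examiner.review(text)
            rounds = 0
            # max_rounds is an assumption added here to guarantee termination.
            while not passed and rounds < max_rounds:
                text = writer.refine(text, feedback)
                passed, feedback = examiner.review(text)
                rounds += 1
            description.append(text)
    detailed = "\n".join(description)                  # line 19
    parts = [comp["T"], comp["A"], comp["B"], comp["S"], detailed, comp["C"]]
    return "\n\n".join(parts)                          # line 20


patent = auto_patent("widget")
print(patent)
```

With these stubs, each of the three subsections fails its first review and passes after one refinement, which exercises the inner loop once per subsection.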

4.1 Agents

We define each agent $\mathcal{A}$ as a sequence-to-sequence model that takes text as input and generates text as output. In AutoPatent framework, we design eight agents, categorized into three types: writer, planner, and examiner. Each agent has its own task and set of instructions.

Writer Agent.

We divide the six writer agents into two types: short component writers and the description writer. Different parts of a patent exhibit significant stylistic differences, with the abstract typically being a single short paragraph and the claims often being lengthy and structured with numbered points. As shown in Table 1, every component except the detailed description averages fewer than 2K tokens.

We define these agents as $\mathcal{A}_{i}$ , where $i\in\{T,A,B,S,C,D\}$ , representing the corresponding component writers responsible for generating specific texts, such as the title writer. The description writer $\mathcal{A}_{D}$ is responsible for drafting each subsection of the description and executes the RRAG by retrieving references and interacting with the examiner agent to refine the content.

Planner Agent.

The average length of the detailed description exceeds 14,000 tokens, as shown in Table 1. It is currently impossible for LLMs to generate high-quality descriptions that meet legal and technical standards in a single pass. Leveraging the agent’s ability to organize and structure content, we define a planning agent $\mathcal{A}_{P}$ to generate a PGTree for the detailed description, which guides the description writer during RRAG.

Examiner Agent.

We define a patent examiner agent, $\mathcal{A}_{E}$ , for two types of quality assessments. When an inventor submits a new draft, the examiner agent needs to evaluate its quality to determine whether it meets the required standards and work collaboratively with the inventor to refine the draft, as detailed in Section 3.2. When the description writer completes the generation of a subsection, the examiner agent actively intervenes to assess the quality of the content and provide feedback to the description writer for refinement until the review is successfully passed.

Table 2: Experiment results for objective metrics. The bold number indicates the best performance under the same base model conditions, while the underlined number represents the second-best. Red numbers mark IRR scores below 90 when $t=0.4$ and below 80 when $t=0.2$.

Models	BLEU	ROUGE-1	ROUGE-2	ROUGE-L	IRR (t=0.2)	IRR (t=0.4)	Avg #Tokens
LLAMA3.1-8B	10.83	27.63	10.96	11.89	86.06	96.45	6431.17
LLAMA3.1-8B + SFT	39.62	30.82	15.65	19.44	49.17	64.90	17052.23
LLAMA3.1-70B	3.27	28.80	11.76	10.95	88.41	97.38	1999.40
Qwen2.5-7B	8.51	30.61	12.51	12.77	89.49	98.16	2927.31
Qwen2.5-7B + SFT	49.10	36.82	19.12	22.60	71.18	86.58	11716.18
Qwen2.5-14B	5.70	30.08	12.01	11.41	91.33	98.48	2480.07
Qwen2.5-32B	2.65	27.49	10.95	10.06	88.22	98.06	1916.09
Qwen2.5-72B	15.10	33.09	13.54	13.86	91.14	98.58	3804.12
Mistral-7B	2.46	25.75	9.66	8.90	89.70	98.04	1703.09
Mistral-7B + SFT	46.73	33.18	16.85	21.14	62.49	81.11	15031.85
GPT-4o	0.94	23.26	7.90	6.96	90.60	99.02	1247.49
GPT-4o-mini	3.32	27.28	10.06	9.17	91.48	98.94	1935.26
AutoPatent_LLAMA3.1-8B	49.22	31.95	13.72	18.75	89.54	96.07	12496.74
AutoPatent_Qwen2.5-7B	53.03	31.68	13.76	19.08	93.61	98.79	11432.79
AutoPatent_Mistral-7B	47.56	30.40	12.64	17.72	82.50	91.55	15481.03
AutoPatent_GPT-4o-mini	50.83	30.85	11.24	16.37	96.92	99.56	13018.17

4.2 Framework Workflow

As shown in the blue section of Figure 2, we divide the workflow of the AutoPatent framework into three steps, simulating the real-world scenario of patent drafting.

Short Components Generation.

In step I, we leverage different agents to generate various short components of a patent based on draft $\mathcal{D}$ , considering differences in style. For open-source models with a parameter size of less than 14B, we fine-tune them using the D2P training set to enhance their ability to generate high-quality short components, while zero-shot prompting is used for commercial or larger models. The concrete prompt for supervised fine-tuning of short component generation is shown in Appendix B.1. We then combine the generated title, abstract, background, summary, and claims with the draft $\mathcal{D}$ to form the reference $\mathcal{R}$ , which can offer useful information for detailed description generation.

Patent Writing Guideline Tree (PGTree) Building.

In step II, given draft $\mathcal{D}$ , we generate a PGTree $\mathcal{W}$ for the detailed description, fine-tuning the planning agent for smaller models and using zero-shot prompting for the others. A PGTree consists of $m$ sections, and the $i$ -th section contains $t_{i}$ subsections, as shown in Figure 3. The generated PGTree is structured as a two-layer multi-way tree, dividing the description generation task into two levels of outlines. The first level of the outline provides an overview of the section, while the second level offers concrete instructions for the description writer to generate concise content for that part of the description. The prompt is shown in Appendix B.2.
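A two-layer, multi-way tree of this kind can be represented with a small data structure. The sketch below is an assumption about one convenient in-memory form (the class names, field names, and example guidelines are all hypothetical), not the serialization format used by the planning agent.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Section:
    overview: str                                        # first-level outline
    guidelines: List[str] = field(default_factory=list)  # second-level n_ij


@dataclass
class PGTree:
    sections: List[Section]

    def walk(self) -> Iterator[Tuple[int, int, str]]:
        """Yield (i, j, n_ij) in writing order, 1-indexed as in the text."""
        for i, section in enumerate(self.sections, 1):
            for j, guideline in enumerate(section.guidelines, 1):
                yield i, j, guideline


# Made-up example content, for illustration only.
tree = PGTree([
    Section("System overview", ["Describe the overall architecture",
                                "List the main components"]),
    Section("Operation", ["Explain the processing steps"]),
])

m = len(tree.sections)                          # number of sections
t = [len(s.guidelines) for s in tree.sections]  # t_i for each section
print(m, t)
for i, j, n_ij in tree.walk():
    print(f"n_{i}{j}: {n_ij}")
```

Walking the tree in this order gives the description writer one concrete instruction at a time while the section overview stays available as context.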

Reference-Review-Augmented Generation (RRAG).

In step III, given the guideline $n_{ij}$ , the description writer retrieves useful information $r_{ij}$ from reference $\mathcal{R}$ . This process connects the planning agent with other short component writers, enhancing consistency throughout the entire patent and reducing the difficulty of tasks like writing claims in the detailed description. The prompt for retrieval is shown in Appendix B.1.

After obtaining $r_{ij}$ , we combine it with $n_{ij}$ and the PGTree $\mathcal{W}$ to instruct the description writer in generating the corresponding part $d_{ij}$ of the description. The examiner agent then actively intervenes to review the quality of $d_{ij}$ and provides feedback for its refinement through multi-turn interactions with the description writer until the examiner agent deems the output acceptable. After traversing the PGTree for the $m$ sections, all the accepted $d_{ij}$ are concatenated to form the complete detailed description $d$ . Finally, we combine all the generated text to form a complete patent $\mathcal{P}$ . The description generation prompt is provided in Appendix B.3, while the review prompt is in Appendix B.4.

5 Experiment

5.1 Evaluation Metric

We use the n-gram-based metric BLEU (Papineni et al., 2002) and the F1 scores of ROUGE-1, ROUGE-2, and ROUGE-L (Lin, 2004) as the objective metrics. Sai et al. (2022) indicated that n-gram-based metrics exhibit a preference for repeated n-grams and short sentences. We propose a new metric, termed IRR (Inverse Repetition Rate), to measure the degree of sentence repetition within the patent $\mathcal{P}=\{s_{i}|1\leq i\leq n\}$ , which consists of $n$ sentences. The IRR is defined as:

$$\textit{IRR}(\mathcal{P},t)=\frac{C_{n}^{2}}{\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}f(s_{i},s_{j})+\varepsilon}\tag{2}$$

where $C_{n}^{2}=n(n-1)/2$ is the number of sentence pairs, $\varepsilon$ is a small value added for smoothing to prevent division by zero, and $t$ is the threshold for determining whether two sentences, $s_{i}$ and $s_{j}$ , are considered repetitions based on their Jaccard similarity $J$ , calculated after removing stop words. The function $f(s_{i},s_{j})$ is defined as:

$$f(s_{i},s_{j})=\begin{cases}1,&\text{if }J(s_{i},s_{j})\geq t,\\ 0,&\text{if }J(s_{i},s_{j})<t.\end{cases}\tag{3}$$
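Equations 2 and 3 can be sketched as follows. The tiny stop-word list, the whitespace tokenizer, and the default `eps` are assumptions made for illustration, since this section does not specify them. The function returns the raw ratio exactly as Equation 2 is written and makes no attempt to reproduce the scaling of the scores reported in Table 2.

```python
from itertools import combinations
from typing import FrozenSet, List

# Illustrative stop-word list only; the actual list is not given in the text.
STOP_WORDS = frozenset({"a", "an", "the", "of", "and", "is", "to", "in"})


def tokens(sentence: str) -> FrozenSet[str]:
    """Lower-case, strip simple punctuation, and drop stop words."""
    words = (w.strip(".,;:()").lower() for w in sentence.split())
    return frozenset(w for w in words if w and w not in STOP_WORDS)


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def irr(sentences: List[str], t: float, eps: float = 1e-6) -> float:
    """Equation 2 as written: C(n, 2) / (number of repeated pairs + eps)."""
    sets = [tokens(s) for s in sentences]
    n = len(sets)
    total_pairs = n * (n - 1) // 2
    # f(s_i, s_j) = 1 when the Jaccard similarity reaches the threshold t.
    repeated = sum(1 for a, b in combinations(sets, 2) if jaccard(a, b) >= t)
    return total_pairs / (repeated + eps)


sentences = [
    "The sensor measures temperature.",
    "The sensor measures temperature.",
    "A valve controls the flow.",
]
# Three sentence pairs, exactly one of which is a repetition.
print(irr(sentences, t=0.2))
```

A lower threshold $t$ counts more pairs as repetitions, so the score at $t=0.2$ can never exceed the score at $t=0.4$ for the same text.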

We invite three experts who are familiar with patent law and patent drafting to evaluate the quality of the generated patents using a single-blind review. We provide them with the review standards in Appendix C.2 regarding accuracy, comprehensiveness, logic, clarity, coherence, and consistency. We shuffle the two patents so that the experts do not know which one comes from our AutoPatent framework. Each expert is required to select the winner between the two patents or choose a tie, using a real patent as a reference. Due to the extensive length of patents, we opted not to use LLM-based evaluation methods, as they often produce biased and inaccurate results.

5.2 Compared Method

We compare our AutoPatent framework with two baseline methods: zero-shot prompting and supervised fine-tuning. Both of these methods take the draft as input and output a complete patent in an end-to-end manner.

Zero-shot Prompting Generation.

We use the zero-shot prompt shown in Appendix C.1 to instruct commercial models, including GPT-4o and GPT-4o-mini, and open-source models, including different sizes of LLAMA3.1, Qwen2.5 and Mistral, to generate complete patents directly. We set the output limit of each model to its maximum, such as 16,384 tokens for the GPT series models and 32,768 tokens for the LLAMA3.1 series models. The temperature is set to 0.5 and top p to 0.9 for all models to improve the stability of the output.

Supervised Fine-Tuning Generation.

We use the 1,500 draft-patent pairs from D2P’s training set for full supervised fine-tuning of LLAMA3.1-8B, Qwen2.5-7B and Mistral-7B. We set the batch size to 2 and the learning rate to 2e-5 for 5 epochs, using 8 NVIDIA A800 80G GPUs and DeepSpeed ZeRO3. These models are denoted as +SFT in Table 2.

5.3 Result

Table 3: Results of the ablation experiments removing the PGTree or RRAG module.

Models	+AutoPatent	BLEU	ROUGE-1	ROUGE-2	ROUGE-L	IRR (t=0.2)	Avg #Tokens
GPT-4o-mini		50.83	30.85	11.24	16.37	96.92	13018.17
	w/o PGTree	3.43	27.61	10.27	9.23	91.05	1914.74
	w/o RRAG	48.39	30.76	11.21	15.94	96.10	11426.72

Objective Metric Results.

We report the objective metric results on D2P’s test set and the average length of the generated patents in Table 2. Under zero-shot prompting, most models generate patents averaging fewer than 4,000 tokens (LLAMA3.1-8B is the exception at roughly 6.4K), whereas patents generated with AutoPatent average more than 10K tokens. On the n-gram-based metrics, AutoPatent outperforms zero-shot prompting with the same base model. We also observe that AutoPatent with Qwen2.5-7B as the base model surpasses the performance of GPT-4o-mini.

The average IRR score across all real patents in the test set is 91.33 when t is 0.2 and 98.57 when t is 0.4. This phenomenon is primarily attributed to the stylistic characteristics of patent language, such as the inclusion of claims within the description. The patents generated using the supervised fine-tuning method exhibit significant repetition errors, with an IRR score of 49.17 for LLAMA3.1-8B+SFT, 71.18 for Qwen2.5-7B+SFT and 62.49 for Mistral-7B+SFT when the threshold is set to 0.2. These severe repetition errors lead to over-rewarding in n-gram-based metrics, resulting in the best scores for SFT, despite its actual quality being poor.

Human Evaluation Results.

We report the human evaluation results in Figure 4, comparing Qwen2.5-7B+AutoPatent with zero-shot prompting generation (denoted as GPT-4o, GPT-4o-mini, Qwen2.5-7B, Qwen2.5-72B, LLAMA3.1-70B) and SFT generation (denoted as Qwen2.5-7B+SFT) for 50 generated patents. The three human experts all agree that the quality of the complete patents generated using the AutoPatent framework outperformed other models across six dimensions in Appendix C.2.

6 Analysis

6.1 Ablation Study

We conduct three types of ablation experiments. Two of these use GPT-4o-mini as the base model to evaluate the AutoPatent framework without PGTree or RRAG module. The third experiment uses LLAMA3.1-8B as the base model to evaluate the performance of the short components writer and planning agent without fine-tuning. The results are reported in Table 3 and Table 4.

Ablation on PGTree.

We conduct an ablation experiment on PGTree. After the different short component writers generate the corresponding parts of the patent, the description writer completes the full detailed description in a single pass without utilizing the PGTree. Observing the results in Table 3, the average length drops below 2,000 tokens, and all objective metrics decrease significantly, with BLEU experiencing an almost 15-times reduction.

Ablation on RRAG.

We conduct an ablation experiment on RRAG. When the description writer generates a subsection $d_{ij}$ using the guideline $n_{ij}$ , it simply adds it to the list of description candidates without considering advice from the examiner agent. Table 3 shows that removing RRAG results in a 4.8% reduction in the BLEU score, along with declines in all other objective metrics. Without the reference and the supervision of the examiner agent, repetition errors slightly increase, as reflected by a minor decrease in the IRR score.
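The two BLEU figures quoted in the PGTree and RRAG ablations follow directly from Table 3, as this short arithmetic check shows (variable names are illustrative).

```python
# BLEU values copied from Table 3 (GPT-4o-mini base model).
bleu_full = 50.83       # full AutoPatent
bleu_no_pgtree = 3.43   # w/o PGTree
bleu_no_rrag = 48.39    # w/o RRAG

pgtree_factor = bleu_full / bleu_no_pgtree
rrag_relative_drop = (bleu_full - bleu_no_rrag) / bleu_full

print(f"w/o PGTree: BLEU shrinks by a factor of {pgtree_factor:.1f}")
print(f"w/o RRAG: relative BLEU reduction of {rrag_relative_drop:.1%}")
```

The 4.8% figure is therefore a relative reduction, corresponding to an absolute drop of 2.44 BLEU points.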

Ablation on Fine-tuning Agent.

We conduct an ablation experiment on generating short components without fine-tuning. Due to differences among patents, the base model exhibits varying capabilities when generating different short components. As shown in Table 4, fine-tuning improves the objective metric scores on nearly every task (BLEU on D2S is the one exception), particularly in generating patent-style titles, where the BLEU and ROUGE-L scores increase by approximately 6.7 times. We also use BERTScore as a semantic metric and observe improvement across all tasks.

Table 4: Ablation results on fine-tuning agents. +SFT denotes the fine-tuning agent, while the other row represents results without fine-tuning.

Task		BLEU	ROUGE-L	BERTScore
D2T		8.64	8.65	60.46
D2T	+SFT	67.09	66.48	90.86
D2A		38.75	23.16	63.47
D2A	+SFT	62.52	38.66	72.05
D2B		27.55	11.93	58.95
D2B	+SFT	36.11	19.35	64.67
D2S		29.49	14.64	57.77
D2S	+SFT	28.30	21.88	66.99
D2C		39.41	24.24	70.69
D2C	+SFT	48.72	30.49	74.09
D2W		58.74	20.69	66.48
D2W	+SFT	72.37	26.31	73.46

6.2 Case study

We carefully review all the generated patents based on different methods and conduct a detailed analysis. The generated patent using SFT exhibits significant repetition errors, resulting in meaningless content and even leading to the failure to generate complete content. This phenomenon results in high ROUGE scores for SFT, but human evaluation highlights its shortcomings. As shown in Appendix D.1, we present a comparison between SFT and our AutoPatent framework.

The patents generated using AutoPatent exhibit greater comprehensiveness, with no missing parts, and are even capable of generating flowcharts. Other methods often fail to generate complete content, such as missing descriptions, as shown in Appendix D.2. The AutoPatent framework also improves consistency in generated patents, such as ensuring alignment between claims in the description and those outside it, which other methods fail to maintain.

7 Conclusion

In this work, we introduce a novel and practical task, Draft2Patent, and its corresponding D2P benchmark containing 1,933 draft-patent pairs, which requires LLMs to generate full patent documents with an average length of 17K tokens. Due to the specialized nature of patents, their standardized terminology, and their length, mainstream LLMs perform poorly. We propose an innovative multi-agent framework, AutoPatent, which capitalizes on the collaborative efforts of an LLM-based planning agent, six writing agents, and a review agent to produce high-caliber patent documents with the novel PGTree and RRAG methods.

Our experimental results indicate that AutoPatent markedly enhances the capacity of various LLMs to generate full patents. Moreover, we have discovered that patents generated exclusively with the AutoPatent framework, utilizing the Qwen2.5-7B model, surpass those produced by more extensive and potent LLMs, such as GPT-4o, Qwen2.5-72B, and LLAMA3.1-70B, in both objective metrics and human evaluations. Remarkably, the quality of patents generated by AutoPatent rivals that of human authorship. We hope that AutoPatent will revolutionize the way patents are generated and managed, simplifying the process and lowering the barriers to innovation.

Limitations

The patent evaluation task is highly challenging, involving intricate legal and technical standards that demand meticulous review by human experts. This results in low efficiency and high costs in human evaluation, and we will explore a concise method for automated patent evaluation in the future. Due to limitations in computing resources, we do not fully fine-tune LLMs with a parameter size of 14B or larger.

Ethics Statement

All patent data used in this paper are obtained from publicly accessible sources. The purpose of the Draft2Patent task is to improve the efficiency of patent agents in drafting applications before submission to the IP office. We do not encourage people to use our method to generate fake or meaningless patents that would burden the IP office’s examination department. We acknowledge that the patents generated by our method are not yet sufficient to be submitted directly to the IP office. They still require modification by a patent agent to ensure they meet patent law and technical standards.

Acknowledgement

This work was supported by the GuangDong Basic and Applied Basic Research Foundation (2023A1515110718 and 2024A1515012003), the China Postdoctoral Science Foundation (2024M753398), and the Postdoctoral Fellowship Program of CPSF (GZC20232873).

References

Achiam et al. (2023) Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. 2023. Gpt-4 technical report. arXiv preprint arXiv:2303.08774.
Bai et al. (2024) Yushi Bai, Jiajie Zhang, Xin Lv, Linzhi Zheng, Siqi Zhu, Lei Hou, Yuxiao Dong, Jie Tang, and Juanzi Li. 2024. Longwriter: Unleashing 10,000+ word generation from long context llms. arXiv preprint arXiv:2408.07055.
Casola et al. (2023) Silvia Casola, Alberto Lavelli, and Horacio Saggion. 2023. Creating a silver standard for patent simplification. Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval.
Chen et al. (2024a) Guhong Chen, Liyang Fan, Zihan Gong, Nan Xie, Zixuan Li, Ziqiang Liu, Chengming Li, Qiang Qu, Shiwen Ni, and Min Yang. 2024a. Agentcourt: Simulating court with adversarial evolvable lawyer agents. arXiv preprint arXiv:2408.08089.
Chen et al. (2024b) Jiangjie Chen, Xintao Wang, Rui Xu, Siyu Yuan, Yikai Zhang, Wei Shi, Jian Xie, Shuang Li, Ruihan Yang, Tinghui Zhu, Aili Chen, Nianqi Li, Lida Chen, Caiyu Hu, Siye Wu, Scott Ren, Ziquan Fu, and Yanghua Xiao. 2024b. From persona to personalization: A survey on role-playing language agents. ArXiv, abs/2404.18231.
Cheng et al. (2024) Yuheng Cheng, Ceyao Zhang, Zhengwen Zhang, Xiangrui Meng, Sirui Hong, Wenhao Li, Zihao Wang, Zekai Wang, Feng Yin, Junhua Zhao, and Xiuqiang He. 2024. Exploring large language model based intelligent agents: Definitions, methods, and prospects. ArXiv, abs/2401.03428.
Cui et al. (2023) Jiaxi Cui, Zongjia Li, Yang Yan, Bohua Chen, and Li Yuan. 2023. Chatlaw: A multi-agent collaborative legal assistant with knowledge graph enhanced mixture-of-experts large language model.
Dubey et al. (2024) Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. 2024. The llama 3 herd of models. arXiv preprint arXiv:2407.21783.
EPO (1994) EPO. 1994. Guidelines for examination in the European Patent Office.
Gao et al. (2024) Shanghua Gao, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, and Marinka Zitnik. 2024. Empowering biomedical discovery with ai agents. Cell, 187(22):6125–6151.
Guo et al. (2024) Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, and Xiangliang Zhang. 2024. Large language model based multi-agents: A survey of progress and challenges. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-24, pages 8048–8057. International Joint Conferences on Artificial Intelligence Organization. Survey Track.
Heafield et al. (2022) Kenneth Heafield, Elaine Farrow, Jelmer van der Linde, Gema Ramírez-Sánchez, and Dion Wiggins. 2022. The EuroPat corpus: A parallel corpus of European patent data. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 732–740, Marseille, France. European Language Resources Association.
Hong et al. (2024) Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. 2024. MetaGPT: Meta programming for a multi-agent collaborative framework. In The Twelfth International Conference on Learning Representations.
Huot et al. (2024) Fantine Huot, Reinald Kim Amplayo, Jennimaria Palomaki, Alice Shoshana Jakobovits, Elizabeth Clark, and Mirella Lapata. 2024. Agents’ room: Narrative generation through multi-step collaboration. arXiv preprint arXiv:2410.02603.
Jiang et al. (2023) Albert Q Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, et al. 2023. Mistral 7b. arXiv preprint arXiv:2310.06825.
Jiang and Goetz (2024) Lekang Jiang and Stephan Goetz. 2024. Artificial intelligence exploring the patent field. arXiv preprint arXiv:2403.04105.
Jiang et al. (2024) Lekang Jiang, Caiqi Zhang, Pascal A Scherz, and Stephan Goetz. 2024. Can large language models generate high-quality patent claims? arXiv preprint arXiv:2406.19465.
Kim et al. (2024) Yubin Kim, Chanwoo Park, Hyewon Jeong, Yik Siu Chan, Xuhai Xu, Daniel McDuff, Hyeonhoon Lee, Marzyeh Ghassemi, Cynthia Breazeal, and Hae Won Park. 2024. Mdagents: An adaptive collaboration of llms for medical decision making. Advances in Neural Information Processing Systems, 37.
Knappich et al. (2024) Valentin Knappich, Simon Razniewski, Anna Hätty, and Annemarie Friedrich. 2024. Pap2pat: Towards automated paper-to-patent drafting using chunk-based outline-guided generation. arXiv preprint arXiv:2410.07009.
Lee (2020) Jieh-Sheng Lee. 2020. Controlling patent text generation by structural metadata. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, CIKM ’20, page 3241–3244, New York, NY, USA. Association for Computing Machinery.
Lee and Hsiang (2020) Jieh-Sheng Lee and Jieh Hsiang. 2020. Patent claim generation by fine-tuning openai gpt-2. World Patent Information, 62:101983.
Lin (2004) Chin-Yew Lin. 2004. ROUGE: A package for automatic evaluation of summaries. In Text Summarization Branches Out, pages 74–81, Barcelona, Spain. Association for Computational Linguistics.
Ni et al. (2024) Shiwen Ni, Minghuan Tan, Yuelin Bai, Fuqiang Niu, Min Yang, Bowen Zhang, Ruifeng Xu, Xiaojun Chen, Chengming Li, and Xiping Hu. 2024. MoZIP: A multilingual benchmark to evaluate large language models in intellectual property. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 11658–11668, Torino, Italia. ELRA and ICCL.
Ni and Yang (2024) Shiwen Ni and Min Yang. 2024. Educational-psychological dialogue robot based on multi-agent collaboration. arXiv preprint arXiv:2412.03847.
Papineni et al. (2002) Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 311–318, Philadelphia, Pennsylvania, USA. Association for Computational Linguistics.
Radford et al. (2019) Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners.
Sai et al. (2022) Ananya B. Sai, Akash Kumar Mohankumar, and Mitesh M. Khapra. 2022. A survey of evaluation metrics used for nlg systems. ACM Comput. Surv., 55(2).
Shao et al. (2024) Yijia Shao, Yucheng Jiang, Theodore Kanell, Peter Xu, Omar Khattab, and Monica Lam. 2024. Assisting in writing Wikipedia-like articles from scratch with large language models. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 6252–6278, Mexico City, Mexico. Association for Computational Linguistics.
Sharma et al. (2019) Eva Sharma, Chen Li, and Lu Wang. 2019. BIGPATENT: A large-scale dataset for abstractive and coherent summarization. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 2204–2213, Florence, Italy. Association for Computational Linguistics.
Souza et al. (2019) Cinthia M. Souza, Matheus E. Santos, Magali R. G. Meireles, and Paulo E. M. Almeida. 2019. Using summarization techniques on patent database through computational intelligence. In Progress in Artificial Intelligence: 19th EPIA Conference on Artificial Intelligence, EPIA 2019, Vila Real, Portugal, September 3–6, 2019, Proceedings, Part II, page 508–519, Berlin, Heidelberg. Springer-Verlag.
Sun et al. (2024a) Jingyun Sun, Chengxiao Dai, Zhongze Luo, Yangbo Chang, and Yang Li. 2024a. Lawluo: A chinese law firm co-run by llm agents. ArXiv, abs/2407.16252.
Sun et al. (2024b) Jingyun Sun, Chengxiao Dai, Zhongze Luo, Yangbo Chang, and Yang Li. 2024b. Lawluo: A chinese law firm co-run by llm agents. arXiv preprint arXiv:2407.16252.
Suzgun et al. (2023) Mirac Suzgun, Luke Melas-Kyriazi, Suproteem Sarkar, Scott D Kominers, and Stuart Shieber. 2023. The harvard uspto patent dataset: A large-scale, well-structured, and multi-purpose corpus of patent applications. In Advances in Neural Information Processing Systems, volume 36, pages 57908–57946. Curran Associates, Inc.
Toole et al. (2020) Andrew Toole, Nicholas Pairolero, Alexander Giczy, James Forman, Christyann Pulliam, Matthew Such, and B Rifkin. 2020. Inventing ai: Tracing the diffusion of artificial intelligence with us patents. US Patent and Trademark Office, Alexandria, 5:2020.
USTPO (2020) USTPO. 2020. Manual of patent examining procedure. pages 4 v. (loose–leaf). This resource was extracted from USPTO.gov.
Wang et al. (2024) Qiyao Wang, Jianguo Huang, Shule Lu, Yuan Lin, Kan Xu, Liang Yang, and Hongfei Lin. 2024. Ipeval: A bilingual intellectual property agency consultation evaluation benchmark for large language models. arXiv preprint arXiv:2406.12386.
Wirth et al. (2023) Matthias Wirth, Volker D Hähnke, Franco Mascia, Arnaud Wéry, Konrad Vowinckel, Marco del Rey, Raúl Mohedano del Pozo, Pau Montes, and Alexander Klenner-Bajaja. 2023. Building machine translation tools for patent language: A data generation strategy at the european patent office. In Proceedings of the 24th Annual Conference of the European Association for Machine Translation, pages 471–479.
World Intellectual Property Organization (2022) World Intellectual Property Organization. 2022. WIPO Patent Drafting Manual - Second Edition. World Intellectual Property Organization, 34, chemin des Colombettes, P.O. Box 18, CH-1211 Geneva 20, Switzerland. Attribution 4.0 International (CC BY 4.0). Photo credits: Getty Images.
Yang et al. (2024) An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng Xue, Na Ni, Pei Zhang, Peng Wang, Ru Peng, Rui Men, Ruize Gao, Runji Lin, Shijie Wang, Shuai Bai, Sinan Tan, Tianhang Zhu, Tianhao Li, Tianyu Liu, Wenbin Ge, Xiaodong Deng, Xiaohuan Zhou, Xingzhang Ren, Xinyu Zhang, Xipin Wei, Xuancheng Ren, Yang Fan, Yang Yao, Yichang Zhang, Yu Wan, Yunfei Chu, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, and Zhihao Fan. 2024. Qwen2 technical report. arXiv preprint arXiv:2407.10671.
Yang (2024) Hongyang Yang. 2024. Finrobot: An open-source ai agent platform for financial applications using large language models. SSRN Electronic Journal.
Yu et al. (2023) Yangyang Yu, Haohang Li, Zhi Chen, Yuechen Jiang, Yang Li, Denghui Zhang, Rong Liu, Jordan W. Suchow, and Khaldoun Khashanah. 2023. Finmem: A performance-enhanced llm trading agent with layered memory and character design. In AAAI Spring Symposia.
Zhang et al. (2024) Wentao Zhang, Lingxuan Zhao, Haochong Xia, Shuo Sun, Jiaze Sun, Molei Qin, Xinyi Li, Yuqing Zhao, Yilei Zhao, Xinyu Cai, Longtao Zheng, Xinrun Wang, and Bo An. 2024. A multimodal foundation agent for financial trading: Tool-augmented, diversified, and generalist. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’24, page 4314–4325, New York, NY, USA. Association for Computing Machinery.
Zuo et al. (2024) You Zuo, Kim Gerdes, Éric Clergerie, and Benoît Sagot. 2024. PatentEval: Understanding errors in patent generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 2687–2710, Mexico City, Mexico. Association for Computational Linguistics.

Appendix A Prompt for Data Collection

A.1 Five Questions

As shown in Table 5, we present the five questions for draft collection. These questions arise from our discussions with professional human patent agents and contain comprehensive information for patent drafting.

Table 5: Five key questions for obtaining a patent technical draft.

Question 1 ($q_{1}$): What is the technical problem that this patent aims to solve?

Question 2 ($q_{2}$): What is the technical background of this invention, the most similar existing solutions, and its advantages over these solutions?

Question 3 ($q_{3}$): What is the detailed technical solution of the invention?

Question 4 ($q_{4}$): What are the key points of the invention, and which points are intended to be protected?

Question 5 ($q_{5}$): What is the detailed description of each figure individually?

A.2 Draft Quality Review Prompt

As shown in Table 6, we assess the draft’s quality with the LLM-based examiner agent using this prompt.

Table 6: Prompt used by the LLM-based examiner agent for quality review of patent technical drafts, where $q_{i}$ represents the $i$-th question and $a_{i}$ represents the answer to the $i$-th question.

For Question 1 ($q_{1}$):
# Draft: {$a_{1}$}
# Requirements: The text of this draft section must include the technical problem solved by the invention. If it is included, just return <Result> Pass </Result>; if it is not included, return <Result> Fail </Result>, and provide a detailed explanation in <Reason> waiting for filling </Reason>.
Please tell me if this section of the draft meets the quality standards.

For Question 2 ($q_{2}$):
# Draft: {$a_{2}$}
# Requirements: The text of this draft section must include the background of the technology, the existing technical solutions, the shortcomings of the existing technology, and the advantages of the present invention. If it is included, just return <Result> Pass </Result>; if it is not included, return <Result> Fail </Result>, and provide a detailed explanation in <Reason> waiting for filling </Reason>.
Please tell me if this section of the draft meets the quality standards.

For Question 3 ($q_{3}$):
# Draft: {$a_{3}$}
# Requirements: The text of this draft section must include a detailed technical solution, which should describe the specific technical means for implementing the invention. If it is included, just return <Result> Pass </Result>; if it is not included, return <Result> Fail </Result>, and provide a detailed explanation in <Reason> waiting for filling </Reason>.
Please tell me if this section of the draft meets the quality standards.

For Question 4 ($q_{4}$):
# Draft: {$a_{4}$}
# Requirements: The text of this draft section must include the description of the drawings for the invention, where each figure must correspond to its respective drawing description one by one. If it is included, just return <Result> Pass </Result>; if it is not included, return <Result> Fail </Result>, and provide a detailed explanation in <Reason> waiting for filling </Reason>.
Please tell me if this section of the draft meets the quality standards.

For Question 5 ($q_{5}$):
# Draft: {$a_{5}$}
# Requirements: The text of this draft section must include a detailed technical solution, which should describe the specific technical means for implementing the invention. If it is included, just return <Result> Pass </Result>; if it is not included, return <Result> Fail </Result>, and provide a detailed explanation in <Reason> waiting for filling </Reason>.
Please tell me if this section of the draft meets the quality standards.
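A reply in the format requested by this prompt can be checked mechanically. The following is a minimal sketch and not the authors' implementation: the names `Review` and `parse_review` are hypothetical, and treating a reply without a well-formed Result tag as a failure is an assumption made for illustration.

```python
import re
from typing import NamedTuple


class Review(NamedTuple):
    passed: bool
    reason: str


def parse_review(reply: str) -> Review:
    """Extract the verdict and the optional explanation from an examiner reply."""
    result = re.search(r"<Result>\s*(Pass|Fail)\s*</Result>", reply, re.IGNORECASE)
    reason = re.search(r"<Reason>(.*?)</Reason>", reply, re.DOTALL)
    # Assumption: a reply without a well-formed <Result> tag counts as a failure.
    passed = result is not None and result.group(1).lower() == "pass"
    return Review(passed, reason.group(1).strip() if reason else "")
```

Defaulting to a failure keeps malformed replies from silently admitting a low-quality draft into the benchmark; a different policy, such as re-querying the model, would be equally plausible.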

A.3 PGTree Collection Prompt

As shown in Table 7, we use this prompt to obtain the PGTrees, which are then used to fine-tune the planning agent.

Table 7: Prompt for PGTree collection, where $d$ represents the detailed description.

Description: {$d$}
Based on the provided patent description text, summarize the key parts and provide detailed guidance for drafting the content of each part. Please output in the following format, with each section described in a single paragraph:
<Section-1> Main content and drafting points for this section </Section-1>
<Section-2> Main content and drafting points for this section </Section-2>
…
<Section-n> Main content and drafting points for this section </Section-n>
Ensure that each section is specific and cohesive, covering the full content of the patent description. Please strictly adhere to the required format.
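The numbered Section tags requested by this prompt make the reply easy to split into per-section guidance. The sketch below shows one way to do so under stated assumptions: the function name `parse_sections` is hypothetical, and the source does not describe how the authors post-process the reply.

```python
import re


def parse_sections(reply: str) -> list[str]:
    """Return the text of each <Section-k> block, ordered by k."""
    # The backreference \1 requires the closing tag to carry the same number.
    pattern = re.compile(r"<Section-(\d+)>(.*?)</Section-\1>", re.DOTALL)
    found = [(int(k), body.strip()) for k, body in pattern.findall(reply)]
    return [body for _, body in sorted(found)]
```

Sections whose opening and closing numbers disagree are skipped rather than repaired, which makes format violations visible as missing entries.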

Appendix B Prompt for AutoPatent Framework

B.1 Short Component Writer Prompt

As shown in Figure 2 and Section 4.2, the writer agents generate the corresponding short components using the prompts shown in Table 8.

Table 8: Prompts for generating short components used in the AutoPatent framework, where $\mathcal{D}$ represents the patent technical draft.

Title Writer Prompt
Draft: {$\mathcal{D}$}
Based on the above patent draft, please generate a patent title that complies with legal and patent regulations, and follows the format below:
<Title>the title of patent</Title>

Abstract Writer Prompt
Draft: {$\mathcal{D}$}
Based on the provided patent draft, please generate a patent abstract that complies with legal and patent regulations, following the format below:
<Abstract>the abstract of patent</Abstract>

Background Writer Prompt
Draft: {$\mathcal{D}$}
Please generate the detailed background information for the patent based on the above patent draft. The background information should include the technical field of the patent, provide an objective introduction to the existing technology relevant to the invention, and point out any deficiencies or issues in the existing technology. Additionally, summarize the purpose or motivation of the invention without disclosing specific details. Please avoid negative comments about the existing technology or others’ patents. The content should be clear and concise, avoiding unnecessary complexity. Please output in the following format:
<Background>the background information of patent</Background>

Summary Writer Prompt
Draft: {$\mathcal{D}$}
Please generate the summary for the patent based on the above patent draft. The summary should provide a detailed overview of the invention, including the technical field, the problems in the prior art that the invention addresses, and the key technical features of the invention. The summary should explain how the invention solves the identified problems without delving into specific implementation details. Ensure the summary is clear, concise, and focused on the invention’s main objectives and advantages. Please output in the following format:
<Summary>the summary of the patent</Summary>

Claims Writer Prompt
Draft: {$\mathcal{D}$}
Based on the patent draft, please generate patent claims that comply with legal and patent regulations. The claims should be written in clear language, avoiding ambiguity or vague descriptions. The independent claims should cover the core technical features of the invention and should not rely on other claims. The dependent claims should supplement or limit the independent claims, referencing the relevant independent claims. The claims must focus on a single invention, ensuring the unity of the invention, and must be consistent with the content of the draft. Ensure that the described invention possesses novelty, inventive step, and industrial applicability. The claims should clearly define the scope of the invention’s protection through specific technical features (such as components, steps, or systems). Each claim should end with a complete sentence, be numbered sequentially, and have an appropriate scope, neither too narrow nor too broad. Please strictly adhere to these guidelines when generating the patent claims and follow the format below:
<Claims>the claims of patent</Claims>

B.2 Planner Agent Prompt

Figure 3 illustrates the detailed structure of the PGTree. The concrete prompt used by the planning agent is shown in Table 9.

B.3 Description Writer Prompt

As shown in Figure 3, the description writer is responsible for RRAG: executing the retrieval process, generating description subsections based on the PGTrees, and refining the subsections based on feedback from the examiner agent.

Retrieval Reference Prompt.

As shown in Table 10, this is the prompt for finding useful content in reference $\mathcal{R}$ based on the guideline $n_{ij}$.

Description Generation Prompt.

As shown in Table 11, this is the prompt for generating subsections of detailed descriptions based on PGTrees and useful content retrieved from references.

Refinement Content Prompt.

As shown in Table 12, this is the prompt for the description writer to refine the subsection based on feedback from the examiner agent.

B.4 Examiner Agent Prompt

As shown in Figure 2, when the description writer completes the generation of a subsection, the examiner agent actively intervenes to assess the quality of the content and provide feedback. The concrete prompt used by the examiner agent is shown in Table 13.
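The interaction between the description writer and the examiner agent described in B.3 and B.4 can be summarized as a retrieve, generate, review, and refine loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the `LLM` callable interface, the function name `write_subsection`, the abbreviated prompt strings, and the `max_rounds` cap are all hypothetical.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical interface: prompt in, text out


def write_subsection(
    guideline: str,
    reference: str,
    retrieve: LLM,
    write: LLM,
    examine: LLM,
    max_rounds: int = 3,  # assumed cap; not taken from the source
) -> str:
    # 1. Retrieval: keep only the reference content relevant to this guideline.
    useful = retrieve(f"Reference Content: {reference}\nWriting Plan: {guideline}")
    # 2. Generation: draft the subsection from the guideline and retrieved content.
    draft = write(
        f"<Reference>{useful}</Reference>\nSubsection Writing Guideline: {guideline}"
    )
    # 3. Review and refinement: revise until the examiner passes or the cap is hit.
    for _ in range(max_rounds):
        feedback = examine(
            f"<WritingGuideline>{guideline}</WritingGuideline>\n"
            f"<Content>{draft}</Content>"
        )
        if "<Result>Pass</Result>" in feedback.replace(" ", ""):
            break
        draft = write(
            f"Subsection Writing Guideline: {guideline}\n"
            f"The subsection already written: {draft}\n"
            f"Feedback from Patent Examiner: {feedback}"
        )
    return draft
```

Passing the three agents in as plain callables keeps the sketch independent of any particular model API; in practice each callable would wrap a call to the underlying LLM with the full prompt from the corresponding table.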

Appendix C More Prompts

C.1 Zero-Shot Prompt

We conduct experiments on GPT-4o, GPT-4o-mini, LLAMA3.1-8B, and LLAMA3.1-70B using a zero-shot prompt, as detailed in Table 14.

C.2 Human Expert Evaluation Standard

Since the experts we invited are all familiar with patent law and patent drafting, we only provide them with the dimensions and corresponding definitions they should focus on during the evaluation process, without providing additional knowledge. We also instruct the experts to disregard any external factors, carefully read the patent texts to be evaluated, and finally choose document 1, choose document 2, or indicate a tie. Below are the dimensions the experts need to focus on:

• Accuracy: The patent text must ensure that every technical detail is correct, avoiding vague expressions. Parameters, structures, and processes should be described specifically and clearly to ensure that the invention can be technically realized. The terms used should align with the standards of the technical field to avoid ambiguity and ensure consistent and accurate expression. When describing technical features, overly narrow or restrictive language should be avoided to ensure that the scope of patent protection is not unnecessarily limited.

• Logic: The patent text must be structured in accordance with patent law requirements, ensuring that the application includes the necessary sections, such as abstract, background, summary, detailed description, and claims. The logical progression of the text should be natural, enabling the reader to gradually understand the background, the innovation, and the specific application of the invention. Each section should be logically connected to the preceding and following technical descriptions to ensure a clear and coherent presentation of the invention.

• Comprehensiveness: The patent text should comprehensively disclose the invention so that a person skilled in the art can understand and implement it, avoiding any omissions or vague descriptions. In addition to detailing specific embodiments, the patent text should use broad terms that cover various modifications and alternatives to maximize the scope of legal protection. Given that patent texts are generally lengthy, combining full disclosure with broad protection ensures both the implementability of the invention and its legal strength.

• Clarity: The patent text should strike a good balance between technical and legal aspects to ensure that the description of the technical solution is both clear and easy to understand. While the text needs to maintain a certain level of technical rigor and legal accuracy, the language should still be concise and clear, avoiding overly complex or lengthy sentence structures. Redundant descriptions should be avoided unless necessary to convey critical technical details. This clarity ensures that the patent examiner can quickly grasp the core content of the invention.

• Coherence: The patent text must express the invention precisely to avoid ambiguity or the use of vague terms. Each technical feature should be clearly understood, avoiding terms like “almost” or “approximately” that create uncertainty. The sections and paragraphs should be organized logically, ensuring smooth progression of ideas. Each section should be clearly distinguished to allow the examiner to progressively understand the overall invention.

• Consistency: The patent text must maintain consistency with the provided Real Patent, ensuring that the description of the technical solution is accurate and coherent. References to technical features should complement rather than contradict each other. Consistency across sections and in terminology should be preserved, ensuring a clear and coherent expression of the invention’s core technology. This consistency reduces the risk of invalidation and enhances the patent’s legal stability.
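The three-way choice described above (document 1, document 2, or a tie) lends itself to a simple tally of preference shares. The sketch below is illustrative only: the label strings `"doc1"`, `"doc2"`, and `"tie"` and the function name `win_rates` are assumptions, and the source does not specify how the expert judgments are aggregated.

```python
from collections import Counter


def win_rates(judgments: list[str]) -> dict[str, float]:
    """Share of 'doc1', 'doc2', and 'tie' labels in a list of pairwise judgments."""
    allowed = ("doc1", "doc2", "tie")
    counts = Counter(judgments)
    unknown = set(counts) - set(allowed)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    total = len(judgments)
    # An empty list yields all-zero shares instead of a division error.
    return {label: counts[label] / total if total else 0.0 for label in allowed}
```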

Table 9: Prompt for PGTree generation by the Planner Agent in the AutoPatent framework, where $\mathcal{D}$ represents the patent technical draft.

Draft: {$\mathcal{D}$}
Based on the provided patent draft, I need you to help me write a detailed writing guide for the patent description. This guide should consist of multiple sections, with each section providing guidance for writing a part of the patent description and including key points to cover for that section. Please output in the following format:
<Section-1> Main content and key points for writing this section </Section-1>
<Section-2> Main content and key points for writing this section </Section-2>
…
<Section-n> Main content and key points for writing this section </Section-n>
Ensure that each part of the guide is clear, specific and cohesive, covering the entire content of the patent description. Please strictly adhere to the required format.

Table 10: Prompt for the reference retrieval process, where $\mathcal{R}$ is the reference content and $n_{ij}$ is the writing guideline.

Reference Content: {$\mathcal{R}$}
Writing Plan: {$n_{ij}$}
According to the patent text writing plan, determine which of the following contents are necessary for writing this section, and copy all the relevant content without modifying or adding anything. Just output the needed information for drafting this subsection.

Table 11: Prompt for generating subsections of the detailed description based on the PGTree and retrieved reference content, where $r_{ij}$ is the reference content and $n_{ij}$ is the subsection of the writing plan $\mathcal{W}$.

<Reference>{$r_{ij}$}</Reference>
Writing Guideline Overview: {$\mathcal{W}$}
Subsection Writing Guideline: {$n_{ij}$}
Based on the content in <Reference></Reference> and the subsection writing guideline, please draft this subsection, ensuring that the description complies with legal and patent regulations. Just output this subsection of patent description, and don’t output other content.

Table 12: Prompt for refining the already written subsection $d_{ij}$ based on the examiner agent’s feedback, where $n_{ij}$ is the subsection of the writing plan $\mathcal{W}$.

Writing Guideline Overview: {$\mathcal{W}$}
Subsection Writing Guideline: {$n_{ij}$}
The subsection already written: {$d_{ij}$}
Feedback from Patent Examiner: {Feedback}
Based on the subsection writing guideline and the feedback, revise the subsection to ensure it complies with legal and patent regulations while addressing the examiner’s concerns. Do not say anything else. Only output the revised subsection.

Table 13: Prompt for reviewing the already written subsection $d_{ij}$ using the examiner agent, where $\mathcal{D}$ is the draft and $n_{ij}$ is the subsection of the writing plan $\mathcal{W}$.

Draft: {$\mathcal{D}$}
<WritingGuideline> {$n_{ij}$} </WritingGuideline>
<Content> {$d_{ij}$} </Content>
<Requirement> Accuracy should ensure that technical details are clear and precise, aligning with law and technical standards. Logic should follow a natural progression with a clear structure. Comprehensiveness should fully disclose all necessary information required by the writing guideline. Clarity should feature concise and easily understandable language, balancing technical and legal descriptions. Coherence should ensure smooth expression, avoiding any ambiguity or uncertainty. Consistency should maintain uniform terminology, align fully with the draft, and avoid any contradictions. </Requirement>
Refer to draft and evaluate whether the content meets the requirement provided, based on the given writing guideline. If it complies with the requirement and writing guideline, return <Result>Pass</Result>; if it does not comply, return <Result>Fail</Result>. And you must provide helpful and detailed advice in <Advice>waiting for filling</Advice> regardless of whether the result is Pass or Fail.

Table 14: Prompt for the zero-shot experiment, where $\mathcal{D}$ represents the patent technical draft.

Draft: {$\mathcal{D}$}
Format requirements:
<Patent>
<Title> the title of patent </Title>
<Abstract> the abstract of patent </Abstract>
<Background> the background of patent </Background>
<Summary> the summary of the patent </Summary>
<Claims> the claims of patent </Claims>
<Full Description> the full description of patent </Full Description>
</Patent>
Please write a complete patent document based on the above patent draft, following the format requirements. The document should be professional, coherent, clear, and precise.
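Because the zero-shot prompt fixes the section tags of the output, a generated document can be checked for absent or empty parts. The sketch below is a hypothetical helper, not part of the authors' evaluation: the function name `missing_sections` and the case-insensitive tag matching are assumptions, while the section names are taken from the format requirements above.

```python
import re

# Section tags taken from the format requirements in the zero-shot prompt.
SECTIONS = ["Title", "Abstract", "Background", "Summary", "Claims", "Full Description"]


def missing_sections(output: str) -> list[str]:
    """List the required sections that are absent or empty in a generated patent."""
    missing = []
    for name in SECTIONS:
        tag = re.escape(name)
        # Assumption: tag case is ignored, so <Summary>...</summary> still counts.
        match = re.search(rf"<{tag}>(.*?)</{tag}>", output, re.DOTALL | re.IGNORECASE)
        if match is None or not match.group(1).strip():
            missing.append(name)
    return missing
```

A check of this kind only detects structurally missing parts; it says nothing about whether the content that is present is shallow or repetitive.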

Appendix D Patent Case

D.1 Repetition Error Case

Patents generated using SFT exhibit significant repetition errors, whereas our AutoPatent framework produces more coherent content, thanks to the examiner agent. Figure 5 shows an example of the repetition error in a patent generated by SFT. The claims in the figure are meaningless repetitions, whereas the corresponding real patent contains only ten claims.

D.2 Comprehensiveness Case

Patents generated using the AutoPatent framework are more comprehensive than those produced by other methods, which often fail to generate a complete patent. Figure 6 shows two examples in which zero-shot prompting and SFT fall short. The zero-shot prompting method generates all parts of the patent, but the content is shallow and oversimplifies complex ideas. The patent generated using SFT fails to generate the description and claims altogether. As shown in Figure 7, patents generated using the AutoPatent framework can even include flowcharts, a capability we attribute to the planning agent.