Game Plot Design with an LLM-powered Assistant:
An Empirical Study with Game Designers

Seyed Hossein Alavi^1,4   Weijia Xu²   Nebojsa Jojic²   Daniel Kennett³   Raymond T. Ng¹
Sudha Rao²   Haiyan Zhang³   Bill Dolan²   Vered Shwartz^1,4
¹University of British Columbia   ²Microsoft Research   ³Microsoft Gaming   ⁴ Vector Institute for AI
[email protected] This work was partially completed during the author’s internship at Microsoft.

Abstract

We introduce GamePlot, an LLM-powered assistant that supports game designers in crafting immersive narratives for turn-based games, and allows them to test these games through a collaborative game play and refine the plot throughout the process. Our user study with 14 game designers shows high levels of both satisfaction with the generated game plots and sense of ownership over the narratives, but also reconfirms that LLM are limited in their ability to generate complex and truly innovative content. We also show that diverse user populations have different expectations from AI assistants, and encourage researchers to study how tailoring assistants to diverse user groups could potentially lead to increased job satisfaction and greater creativity and innovation over time.

Seyed Hossein Alavi ${}^{\lx@make@thanks{Thisworkwaspartiallycompletedduringtheauthor\textquoteright sinternshipatMicrosoft.}1,4}$ Weijia Xu² Nebojsa Jojic² Daniel Kennett³ Raymond T. Ng¹ Sudha Rao² Haiyan Zhang³ Bill Dolan² Vered Shwartz^1,4 ¹University of British Columbia ²Microsoft Research ³Microsoft Gaming ⁴ Vector Institute for AI [email protected]

1 Introduction

The landscape of interactive entertainment and the escalating player expectations has led to increased demand for innovative tools to support the work of game designers. In particular, the process of crafting compelling narratives within games can be labor intensive, and maintaining coherence and engagement throughout the game can be challenging Montfort (2004).

Large language models (LLMs; Radford et al., 2019) hold promise as a support tool to augment and enhance the manual process of game design. Previous work used LLMs to generate dialogues between players and non-player characters (NPCs; Volum et al., 2022), to facilitate player-driven creation of new elements in the game world (Huang and Sun, 2023), and to help players uncover new narrative paths in a text-based games Peng et al. (2024), among others Sweetser (2024).

In this paper, we study the efficacy of LLMs in game narrative design. We introduce GamePlot (Fig. 1), an LLM-based web application designed to support game designers of any skill level in the process of game narrative development. Prior work showed that interactions with LLMs can lead to emergent storytelling opportunities, enhancing player engagement and creativity Peng et al. (2024). Drawing on these findings, we designed GamePlot to support game designers in generating and refining narratives. Beyond the use of LLMs in the initial game development, GamePlot also offers a game room that enables real-time narrative refinement based on player interactions and feedback, while maintaining narrative coherence and quality. Additionally, it features a Wizard of Oz (WOZ) functionality, allowing designers to discreetly assume control of NPCs and interact with players directly as part of the development phase. Overall, GamePlot was developed to empower game designers to create intricate storylines, develop dynamic and evolving NPCs, and shape game scenes and settings with ease and flexibility.

To assess the efficacy of LLMs in assisting game designers using GamePlot, we conducted a user study, inviting 14 game developers and narrative designers to engage with the tool and share their feedback. Our findings indicate a positive reception among participants. Participants were satisfied with the generated game plots, considered the AI assistant as beneficial in enhancing the game storytelling, and also reported a sense of control and ownership over the narratives. Furthermore, they found the game room and the iterative refinement and feedback gathering step to be very valuable, particularly enjoying the ability to dynamically adjust the storyline during testing sessions.

Refer to caption — Figure 1: An illustration of the GamePlot design room. The right pane shows the opening story, LLM instructions, and game and player turns as designers refine the narrative. The left pane includes buttons for generating story content, creating plots, saving and loading progress, and editable sections for plot and feedback features (located below the plot area).

By analyzing individual responses, we observed that game developers often prefer to offload the narrative writing process to the AI, whereas narrative writers would rather maintain creative control over the narrative but can benefit from the AI’s assistance in exploring narrative paths. Yet, despite the utility of LLMs in this task, they are still limited in their ability to generate nuanced and complex narratives that engage experienced writers. Finally, we encourage researchers to identify groups of users and tailor future assistants to their specific needs instead of developing a “one-size-fits-all” assistant. This can potentially lead not only to enhanced productivity but also to job satisfaction and greater creativity and innovation over time.

2 Background

LLMs in Gaming.

Language models have commonly been used to generate dialogues with NPCs (e.g., van Stegeren and Myśliwiec, 2021; Gao and Emami, 2023; Alavi et al., 2024). Kumaran et al. (2023) generated branching conversation paths with NPCs based on player choices, while Akoury et al. (2023) used LLMs to erate contextually-grounded NPCs dialogues in video games. In a similar line of work, LLMs have been used to power agents (NPCs and players) in games Hausknecht et al. (2019); Wang et al. (2023).

In terms of narrative generation, LLMs were used for generating quests Värtinen et al. (2024) as well as interactive stories Freiknecht and Effelsberg (2020). Ashby et al. (2023) used knowledge graphs and LLMs to generate personalized quests based on player-NPC dialogues and actions. Similarly, Colado et al. (2023) used LLMs to streamline the creation of serious games, automating the generation of adaptive learning content within narrative structures. In comparison to prior work, GamePlot goes beyond content generation with LLMs and facilitates a collaborative AI-designer process designers to iterate and refine the generated content.

GENEVA (Leandro et al., 2024) is another collaborative tool similar to GamePlot. It uses LLMs to generate branching narratives based on a high-level description from the designer, and visualize complex story paths through interactive graphs. In comparison, GamePlot enables designers to design game narratives through gameplay collaboratively with LLMs and refine them at any stage of the design process.

LLMs for Creative Writing.

Various LLM-powered tools have been developed to assist creative writers (e.g., Yuan et al., 2022; Mirowski et al., 2023; Chakrabarty et al., 2024). Wordcraft (Yuan et al., 2022) allows writers to have open-ended conversations with an LLM about their stories and enables custom requests, such as rewriting sections and generating new story elements. Similarly, Dramatron Mirowski et al. (2023) provides a co-writing environment for screenplays and theater scripts using LLMs. Given a short description of a movie, the tool generates a title, characters, plot outline, locations, and dialogue for each scene, allowing writers to continue the generation process, edit responses, or regenerate outputs as needed.

Our study is inspired by two studies that surveyed professional writers on their experience using creative writing AI assistants Ippolito et al. (2022); Chakrabarty et al. (2024). Both studies revealed inherent limitations with LLM-based creative writing, including the generation of repetitive, predictable, and clichéd responses, challenges in maintaining the writer’s unique style and voice, and superficial language understanding abilities. Our user study (Sec 4) reveals similar limitations when using LLMs for game narrative design.

3 GamePlot

GamePlot¹¹1The code is available at: https://github.com/salavi/GamePlot-LLM-Assistant is an AI assistant tailored for the early stages of game development, with a specific focus on story creation. GamePlot comprises two rooms: the design room (Sec 3.1) and the game room (Sec 3.2). Upon arriving at the landing page, designers can choose to enter the design room, where they develop the game story. Upon completion, the tool generates a game plot, the summary of key events and characters, which will be used as the starting point for future game sessions. Both players and designers can enter the game room where they can play the game, provide feedback (players) and refine the plot based on player feedback (designers). We detail both rooms below, and conclude with the implementation details (Sec 3.3).

Game:

Scene: You are sitting in your living room. You hear knocking on the door.

Player:

[Action] Open the door.

[Words] Hello?

Game:

Scene: You stand at the door, looking at your mysterious neighbor who seems to know more about you than she should. You feel a mix of fear and curiosity. Your heart is pounding, and you are not sure what to do next.

[ID] Neighbor:

[Backstory] She is a former spy who has gone rogue and is now trying to make amends for her past actions.

[Persona] Confident, but also a little anxious.

[Mood] Urgent and a little bit scared.

[Thought] I need to get them out of here before it’s too late.

[Action] Takes a step forward and looks directly into your eyes.

[Words] ‘‘Please, you don’t have much time. I know this might sound crazy, but you are in danger. Can you trust me?’’

Player:

Table 1: Initial turns of a template game story provided to the game designers in Sec 4.

3.1 Design Room

The design room (Fig. 1) offers the following features to aid game designers.

Game Story Development.

The design room provides a text window for designers to develop their storylines by playing the game. The designer’s initial inputs may include: (a) an opening story, which will be used by the designer to create the game and will initialize the game session for players (see Appendix A.1); (b) instructions to the LLM, such as “Continue this game […] You can introduce new characters […]” (see full example in Appendix A.2); and (c) 1-2 game and player turns, which serve as in-context examples for the LLM.

After the initial inputs, the designer proceeded to design the game through game play. They can choose whether to write the current turn or use the LLM to generate it. If the designer wrote a game turn, the LLM responds with a player turn, and vice versa. Table 1 demonstrates the initial turns in a game design. As can be seen, game turns can involve introducing NPCs along with their backstory, mood, persona, and more (see appendices A.3 for the tag inventory). The window is fully editable, allowing designers to modify both previous and current turns.

title: Shadows of Betrayal

Plot summary:

In Shadows of Betrayal, players take on the role of a New York City resident who becomes entangled in a dangerous web of espionage and deceit. When their mysterious neighbor, a former spy gone rogue, warns them of impending danger, the player must decide whether to trust her or remain cautious. As they navigate through a series of thrilling encounters and unexpected alliances, the player must uncover the truth behind their neighbor’s past and confront the dark forces that threaten their life. Will they choose to trust the neighbor and embark on a dangerous journey, or will they rely on their own instincts to survive?

Key Events:

1. The player opens the door to their neighbor, who claims to know personal details about them and insists they are in danger.

2. The neighbor reveals her backstory as a former spy seeking redemption and urges the player to trust her.

3. ...

NPCs:

[ID] Neighbor

[Backstory] She has been living a life of secrecy and mistrust, making it difficult for her to gain the trust of others.

[Persona] Determined, but also wounded.

Table 2: An example plot summarizing one of the games designed in our study.

Plot.

Once designers have fleshed out the storyline by playing through the game in the main window, they can click a button to summarize the key elements of the game into a structured plot (see Table 2 for an example). Designers can edit the generated plot, and once they finalize it, they can use it to initialize a game session in the game room.

Feedback Elicitation.

Designers can specify what they would like to receive feedback about from the players. This allows them to query players for feedback regarding specific aspects of concern such as “the response aligns / doesn’t align with the NPC character”.

3.2 Game Room

Once a game room is created, designers can share the room with players for collaborative gameplay, where the LLM generates the game turns, and players engage with the game through player turns. Figure 2 presents the game room interface. On the left, the game window is shown, where the game is played (by designers or players). The middle parts and the right part of the figure demonstrate the different actions that players and designers can perform in the room, as detailed below.

Game Window.

The game window displays the game and player turns. It is initialized with the plot (Sec 3.1), which guides the LLMs in playing the game turns. Players can play their turns and provide feedback on the game turns. Designers can participate in the game themselves or monitor the narrative as the players see it.

Designer Control.

In the game room, designers have additional controls to monitor and intervene in the game flow if necessary (Fig. 2, right). First, as the story unfolds and players interact with the game, designers may wish to modify the plot. Designers can make live changes to key events that haven’t been played yet through the game interface.

Second, on the same pane, the designer sees a list of NPCs that have appeared in the narrative and their corresponding hidden tags (e.g., [Mood], [Thought], etc.). The same information is not visible to players.

Finally, designers can assume control of an NPC. Technically, this is implemented by identifying game turns that refer to the controlled NPC and allowing designers to edit and approve them. The players are unaware that the game turns are controlled by the designer. This Wizard of Oz-style experimentation allows designers to adjust NPC responses, guide players out of narrative dead ends, or modify the game flow without breaking the player’s immersion.

3.3 Implementation Details

We used GPT-3.5-Turbo-16k as the backbone LLM for GamePlot due to its cost-efficiency, low response time, and good performance. We used the model’s default hyperparameters, setting the maximum token limit to 2,000 for summarization and plot generation, and 1,000 for generating the next turn (see prompt details in Appendix A.5). To address the LLM token limit which may be exceeded in long games, once we hit the maximum token limit (40,000 characters), we keep the last k = 10 turns and use the LLM to summarize earlier turns. Lastly, we used the stop words [‘‘Player:’’, ‘‘Game:’’] to indicate to the LLM which turn it needs to generate.

4 User Study

Figure 3: Distribution of participants by experience level. The majority of participants had more than 1 year of experience, while those with less than 1 year of game design experience had significant experience in narrative design.

To evaluate the effectiveness of GamePlot, we recruited participants with backgrounds in both narrative design and game development. Participants were first given a brief tutorial on GamePlot and its key features. Following the tutorial, they were provided with a template story, including the premise of the game, instructions, and the first few turns (see Appendix A.4). Participants were asked to use the template story to design a game in the design room, for which they were given 20-25 minutes. Once they were done, the resulting game plot was used to initialize a game room, to which they logged into as designers. Along with the designer, one of the authors of this paper joined the game room as a player to collaboratively test the game with them. Participants were given 15-20 minutes to test the game.

We detail below the recruitment process (Sec 4.1). At the end of the testing sessions, participants had to fill out a post-study questionnaire about their experience. Questions were design to understand the strengths and weaknesses of the GamePlot (Sec 4.2), as well as to more broadly query them about how open they were to the idea of designing game stories collaboratively with AI (Sec 4.3).

4.1 Participants

We recruited 14 participants through Upwork and social media advertisements. The majority of the participants (64.3%) had at least one year of professional experience in the gaming industry (Fig. 3). Of the 14 participants, 8 indicated that they had expertise in narrative writing. This diverse group ensured a comprehensive evaluation of GamePlot from the perspectives of both narrative writers and game designers. Each session lasted between 1 to 2 hours, and participants were compensated at a rate of $25 per hour.

4.2 Feedback on GamePlot Features

Feature	Favored by	Example Feedback
Ability to change the plot during the test session		“The feature of being able to undo and recreate the plot if you see the players aren’t enjoying it.”
NPC summary		-
Collaborative game play		“Multiple players being able to play (this could be good for DND)”
Feedback buttons		-
Feature	Mentioned by	Example Feedback
Content generation		“1. Generating a story from a scratch, 2. Generating multiple plots for the same story”
Level of designer control		“The option to either change small details to the following game turn or to change it completely depending on how much of it you like or don’t like.”
Automatically propagating changes after editing past content		“The ability to go back and change the mood/environment and have it adapt future responses to these changes.”
The ability to define NPC persona and have the tool generate their actions and utterances		“Being able to go through the NPC background, persona, etc. manually meant that I had a lot of control over them without specifically writing their actions/words.”
Assuming control over NPCs		“The designer being able to take control of certain NPCs actions. I find this incredibly important when I want to experiment with my story.”
Ease of use		“Ease of use. I love that the tool is not overly complex and that the UI is simple.”

Table 3: Positive feedback from the participants about various features in GamePlot. Top: How many participants indicated they liked a particular feature from the list of features we provided. Bottom: How many participants mentioned other features in their free-text feedback, as manually classified by the authors.

Valuable Features.

We asked the participants about the features they found most valuable in the game room. In particular, we asked them whether they liked the following features: the ability to change the plot during the test session, the NPC summary, the collaborative game play, and the feedback buttons. We also asked them which 1-3 features in GamePlot they found most valuable and would like to see retained, allowing them to provide free-text answers. Table 3 presents the results. The top part shows the number of participants that favored each of the specific features we asked about, while the bottom part shows the number of participants that mentioned each other feature in their free-text responses. For each feature, we also present an example feedback from the participants.

We observe that the most liked feature was the ability to change the game plot during the test session, which was favored by 12 out of 14 participants. Half of the participants also favored the NPC hidden states dropdown (NPC summary) and the multiplayer setup. When specifically asked about how useful the game room was overall, participants rated its usefulness for testing with players highly, with 13 out of 14 giving ratings of 4 or 5, resulting in an average score of 4.21.

In the free-text responses, several features of GamePlot stood out. Participants appreciated the ability to generate content (for example, generating multiple plots for the same story), as well as the level of control they retained and the ability to modify the generated content. Several participants mentioned that they liked the ability to go back and edit the game and have the tool propagate the changes to later turns. A key theme that emerged was GamePlot’s adaptability and control in story generation in the design room. Designers consistently emphasized how the ability to modify elements such as NPC moods, environments, and plot details allowed them to directly influence AI-generated responses. This dynamic adaptability proved to be a strong draw for users who sought control over the narrative direction.

Another feature that was mentioned by several of the participants was the ability to assume control over NPCs in the game room. Designers spoke highly of the “Wizard of Oz” setup, which enables game designers to control NPC interactions, adjust the plot, and exercise narrative control without disrupting the players’ experience and their immersion in the game.

Finally, ease of use and collaborative features also surfaced as important strengths. Participants appreciated the simplicity of the user interface, describing it as intuitive and easy to navigate. The ability to involve multiple users in testing and design further enhanced the tool’s appeal; Participants liked the idea of designing a plot and then testing it and iteratively improving it along with their team.

Category	Mentioned by	Examples of Feedback
UI Improvement		“Instead of having /chat inside the game, maybe there should be a chat box off to the side that players can write in so it doesn’t distract the gameplay.”
Enhancement		“[…] present a few options of how the plot might go down when I’m in design phase.” / “[…] Generating images according to the scenes.”
Ease of Use		“Clearer instructions for people using this tool on all of its functionalities and features so that game developers can more easily understand and use this tool.”
Performance		“Server time, it could be quicker.”

Table 4: Areas of improvement in GamePlot that were mentioned by participants in the study as free-text feedback (manually classified by the authors).

Category	Mentioned by	Examples of Feedback
Lack of Creativity		“It is a bit linear, so it seems unlikely that the AI will come up with something really unique and interesting without a lot of input from the dev.”
		“Can be a little straight forward in its narrative”
		“Obvious answers. No surprise or engagement effect.”
Interaction with the AI		“Flow. As I writer I want to remain in a ‘flow state’ where I can just write what comes to mind. Having to enter tags slows me down.”
		“It should generate whole stories like we can generate using other free AI tools available online like Chatgpt, Gemini or Meta AI.”
		“Learning curve, OF THE USER not the AI. Users might find it frustrating while getting started. Especially when missing simple commands.”
Quality of Generations - Other		“It does not have the best grasp on tension balance, or on having multiple conflicts at the same time.”
		“People know when dialogue is organic or not. I do have some concerns on whether or not the AI is capable of producing dialogue that sounds natural.”
Lack of Consistency		“It can sometimes forget important things and be missed in the plot.”

Table 5: Feedback on the interaction with the LLM from the participants, written as free-text responses and manually classified by the authors.

Areas for improvement.

We also asked participants to identify 1-3 areas for improvement in GamePlot and suggest any additional features that would make collaboration easier for them. Table 4 presents the various types of feedback provided by participants. Most of the feedback focused on suggestions on how to improve the user interface, for example by separating the chat (i.e., designer instructions) from the game window. These findings are in line with Ippolito et al. (2022), who stated that “Participants emphasized that the user interface of the tool matters as much as the underlying language model backing it.”

A related theme was ease of use. Three participants mentioned that the tool can be made easier to use or that further instructions can be given. One participant reported that the responses from the tool were slow.

In terms of suggestions for enhancements, participants expressed interest in features that could offer more creative input during the design phase. One participant proposed to allow the exploration of multiple plot paths, while another suggested that the tool can generate images based on AI interpretations of visual descriptions of the NPCs. This would allow designers to visualize characters and environments more vividly, supporting the creative development of game worlds.

4.3 General Feedback on AI for Game Design

We asked participants for feedback about the capabilities of the AI. Table 5 presents the participant’s feedback, manually grouped into categories. One recurrent theme was the lack of creativity in AI-generated text. Participants described the generations as straightforward and obvious, echoing the findings from previous studies about AI assistants for creative writing Ippolito et al. (2022); Chakrabarty et al. (2024).

Relatedly, one participant noted that the LLM had a difficulty in maintaining tension and handling multiple conflicts simultaneously, while another commented that the AI-generated dialogues did not sound organic and failed to capture the nuance of human conversation. The lack of narrative complexity was seen as a significant limitation, especially for those seeking unexpected or novel plot developments.

Consistency emerged as another problem, with participants reporting that the AI sometimes failed to maintain plot coherence, reassessing findings from prior work Que et al. (2024); Lin et al. (2024).

We also observed that participants had diverse perspectives about the interaction with the AI. One participant who identified as a narrative designer noted that the interactions with the AI made it challenging to maintain creative flow. In contrast, another participant, who attested to have designed more than 50 games, said that they would opt for the AI to automate the entire creative writing process. A third participant, who identified as a narrative designer with over 5 years of experience, commented on the learning curve the designer has to face when interacting with the AI.

Perhaps unsurprisingly, when we asked participants in the pre-study survey how open they were to the idea of using AI to collaboratively design a game narrative, they were overwhelmingly positive (average score 4.43/5). However, after using the tools, different perspectives emerged about the role of the AI in this collaborative creative process. With that said, participants overall thought that the AI assistant helped improve their game story, with an average rating of 4 out of 5, and all responses above 3. Satisfaction with the generated game plot also averaged 4 out of 5, with all ratings exceeding 3.

Finally, despite differing perspectives about the role of AI in the collaborations, participants rated their perceived level of control and ownership on the game story 4.5/5 on average, perhaps thanks to the ability to edit generated content in GamePlot at any stage.²²2Our setup is not comparable to that of Chakrabarty et al. (2024), but their average score to the equivalent question was 3.26, where 5 stood for “I had complete control over the final story” and 1 for “The AI system had a significant influence on the final story”. Relatedly, Ippolito et al. (2022) reported that participants complained about the randomness in the LLM generation, which hurt their sense of control.

5 Discussion and Conclusion

Our findings reveal that game designers, especially those with less narrative experience, may find generative AI most valuable for its ability to scale and accelerate the narrative design process. Developers may prioritize efficiency, aiming to integrate narrative elements into gameplay seamlessly. For them, GamePlot’s ability to generate quick, plausible plotlines and handle NPC interactions could significantly reduce the bottleneck of storytelling, allowing them to focus on other game mechanics.

For narrative writers, LLMs play a different, but equally important role. Writers generally want to maintain creative control over story elements but appreciate the AI’s assistance in creating variations and exploring new narrative possibilities.

Our findings are in line with Biermann et al. (2022), who found that writers valued maintaining their own voice and autonomy when collaborating with AI tools.

The designers in our study also emphasized the tool’s flexibility in maintaining creative control. Most reported a strong sense of ownership over the generated plot, largely due to GamePlot’s ability to allow on-the-fly adjustments to NPC behaviors, environments, and plot elements. This creates a collaborative loop between the designer’s input and the AI’s suggestions, where both contribute to shaping the story in real-time.

However, a key challenge is ensuring that the AI’s contributions are nuanced enough to engage experienced writers, while still automating the more routine tasks. LLMs currently excel at offering simple plot branches and character interactions, however, they struggle with maintaining narrative tension and generating complex conflicts, which are essential in many storytelling genres. Along work on improving LLMs’ narrative generation abilities, future research could also study how to tailor AI assistants to diverse user populations based on their experience and preferences.

While the iterative process of refining story elements may not always lead to a better narrative from the writer’s perspective, and the final product may not surpass a human-written story in terms of emotional depth or complexity, the collaborative process itself could be more engaging for the designers. Enabling real-time testing and adjustments makes the creation process more interactive and enjoyable, preventing it from becoming a mundane task.

The user engagement with the creative process leads to another dimension of value from human-AI collaboration that is often overlooked: enhancing job satisfaction Noy and Zhang (2023).

Writers and developers alike may find that integrating AI as a narrative assistant makes the design process feel less like a solitary writing job and more like a collaborative and iterative experience, similar to the game play itself. This opens up new avenues for research on whether enhancing the engagement of the design process can also improve the overall quality of game narratives. Although the final product may not always reflect a higher level of narrative complexity, the enjoyment of the process may lead to greater creativity and innovation over time.

Moreover, as narrative tools become more sophisticated, balancing AI-generated content with human creativity will be critical. We believe that LLMs should not replace writers but instead empower them by providing building blocks they can refine. One promising avenue is to explore how AI might suggest alternative paths or prompt writers with creative challenges, making the design process even more interactive and engaging.

Limitations

Scope.

Our user study focused exclusively on GPT-3.5. We hypothesize that the results would largely extend to other LLMs; however, this has not been empirically tested. Additionally, the relatively small sample size of 14 game narrative designers may not fully capture the diverse expectations of designers with varying backgrounds and focus areas. Our findings suggest that designers with different levels of narrative expertise prioritize distinct aspects of AI assistance. Therefore, future research should aim to scale up the study to investigate a broader spectrum of designers and explore these differences in greater detail.

AI Acceptance Level.

It is worth noting that our study suffers from an inherent (and inevitable) sampling bias: It is very likely that the participants were already inclined to embrace the use of AI in game narrative design. Game designers that are completely opposed to using AI may have abstained from participating in a study about collaborative human-AI game design altogether.

Ethical Statement

Access.

The code base for GamePlot is publicly available.

Participant Selection and Compensation.

Participants were compensated at an average hourly rate of $25 USD, with a minimum rate of $20 USD due to varying charges from freelancers. This compensation exceeds the U.S. minimum wage. Game designers were recruited through Upwork and social media platforms. All participants were either native or fluent in English, ensuring clear communication throughout the study.

Participant Consent and Data Usage

Participants received a consent form explaining the experiment setup, data collection, and usage. They were informed about interacting with LLM-generated text and its risks, advised not to share personal data, and assured that all feedback would remain anonymous.

Exemption from IRB Approval.

The human evaluation conducted in this research study had been exempt from IRB approval as it involved minimal risk, and no personally identifiable information was collected from participants, adhering to established ethical guidelines for exempt research.

Acknowledgments

This work was funded, in part, by Microsoft, the Vector Institute for AI, Canada CIFAR AI Chairs program, Accelerate Foundation Models Research Program Award from Microsoft, an NSERC discovery grant, and a research gift from AI2.

References

Akoury et al. (2023) Nader Akoury, Ronan Salz, and Mohit Iyyer. 2023. Towards grounded dialogue generation in video game environments. In Creative AI Across Modalities Workshop, AAAI.
Alavi et al. (2024) Seyed Hossein Alavi, Sudha Rao, Ashutosh Adhikari, Gabriel A DesGarennes, Akanksha Malhotra, Chris Brockett, Mahmoud Adada, Raymond T. Ng, Vered Shwartz, and Bill Dolan. 2024. Mcpdial: A minecraft persona-driven dialogue dataset. Preprint, arXiv:2410.21627.
Ashby et al. (2023) Trevor Ashby, Braden K Webb, Gregory Knapp, Jackson Searle, and Nancy Fulda. 2023. Personalized quest and dialogue generation in role-playing games: A knowledge graph- and language model-based approach. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, New York, NY, USA. Association for Computing Machinery.
Biermann et al. (2022) Oloff C Biermann, Ning F Ma, and Dongwook Yoon. 2022. From tool to companion: Storywriters want ai writers to respect their personal values and writing strategies. In Proceedings of the 2022 ACM Designing Interactive Systems Conference, pages 1209–1227.
Chakrabarty et al. (2024) Tuhin Chakrabarty, Vishakh Padmakumar, Faeze Brahman, and Smaranda Muresan. 2024. Creativity support in the age of large language models: An empirical study involving professional writers. In Proceedings of the 16th Conference on Creativity & Cognition, C&C ’24, page 132–155, New York, NY, USA. Association for Computing Machinery.
Colado et al. (2023) Iván J. Pérez Colado, Víctor M. Pérez Colado, Antonio Calvo Morata, Rubén Santa Cruz Píriz, and Baltasar Fernández Manjón. 2023. Using new ai-driven techniques to ease serious games authoring. In 2023 IEEE Frontiers in Education Conference (FIE), pages 1–9.
Freiknecht and Effelsberg (2020) Jonas Freiknecht and Wolfgang Effelsberg. 2020. Procedural generation of interactive stories using language models. In Proceedings of the 15th International Conference on the Foundations of Digital Games, FDG ’20, New York, NY, USA. Association for Computing Machinery.
Gao and Emami (2023) Qi Chen Gao and Ali Emami. 2023. The Turing quest: Can transformers make good NPCs? In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop), pages 93–103, Toronto, Canada. Association for Computational Linguistics.
Hausknecht et al. (2019) Matthew J. Hausknecht, Prithviraj Ammanabrolu, Marc-Alexandre Côté, and Xingdi Yuan. 2019. Interactive fiction games: A colossal adventure. In AAAI Conference on Artificial Intelligence.
Huang and Sun (2023) Lei Huang and Xing Sun. 2023. Create ice cream: Real-time creative element synthesis framework based on gpt3.0. In 2023 IEEE Conference on Games (CoG), pages 1–4.
Ippolito et al. (2022) Daphne Ippolito, Ann Yuan, Andy Coenen, and Sehmon Burnam. 2022. Creative writing with an ai-powered writing assistant: Perspectives from professional writers. Preprint, arXiv:2211.05030.
Kumaran et al. (2023) Vikram Kumaran, Jonathan Rowe, Bradford Mott, and James Lester. 2023. Scenecraft: automating interactive narrative scene generation in digital games with large language models. In Proceedings of the Nineteenth AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, AIIDE ’23. AAAI Press.
Leandro et al. (2024) Jorge Leandro, Sudha Rao, Michael Xu, Weijia Xu, Nebojsa Jojic, Chris Brockett, and Bill Dolan. 2024. Geneva: Generating and visualizing branching narratives using llms. In IEEE Conference on Games 2024.
Lin et al. (2024) Zichao Lin, Shuyan Guan, Wending Zhang, Huiyan Zhang, Yugang Li, and Huaping Zhang. 2024. Towards trustworthy llms: a review on debiasing and dehallucinating in large language models. Artificial Intelligence Review, 57(9):1–50.
Mirowski et al. (2023) Piotr Mirowski, Kory W. Mathewson, Jaylen Pittman, and Richard Evans. 2023. Co-writing screenplays and theatre scripts with language models: Evaluation by industry professionals. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, New York, NY, USA. Association for Computing Machinery.
Montfort (2004) Nick Montfort. 2004. Twisty Little Passages: An Approach to Interactive Fiction. MIT Press, Cambridge, MA, USA.
Noy and Zhang (2023) Shakked Noy and Whitney Zhang. 2023. Experimental evidence on the productivity effects of generative artificial intelligence. Science, 381(6654):187–192.
Peng et al. (2024) Xiangyu Peng, Jessica Quaye, Sudha Rao, Weijia Xu, Portia Botchway, Chris Brockett, Nebojsa Jojic, Gabriel DesGarennes, Ken Lobb, Michael Xu, Jorge J. G. Leandro, Claire Jin, and Bill Dolan. 2024. Player-driven emergence in llm-driven game narrative. In IEEE Conference on Games 2024.
Que et al. (2024) Haoran Que, Feiyu Duan, Liqun He, Yutao Mou, Wangchunshu Zhou, Jiaheng Liu, Wenge Rong, Zekun Moore Wang, Jian Yang, Ge Zhang, et al. 2024. Hellobench: Evaluating long text generation capabilities of large language models. arXiv preprint arXiv:2409.16191.
Radford et al. (2019) Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.
Sweetser (2024) Penny Sweetser. 2024. Large language models and video games: A preliminary scoping review. In Proceedings of the 6th ACM Conference on Conversational User Interfaces, CUI ’24, New York, NY, USA. Association for Computing Machinery.
van Stegeren and Myśliwiec (2021) Judith van Stegeren and Jakub Myśliwiec. 2021. Fine-tuning gpt-2 on annotated rpg quests for npc dialogue generation. In Proceedings of the 16th International Conference on the Foundations of Digital Games, FDG ’21, New York, NY, USA. Association for Computing Machinery.
Volum et al. (2022) Ryan Volum, Sudha Rao, Michael Xu, Gabriel DesGarennes, Chris Brockett, Benjamin Van Durme, Olivia Deng, Akanksha Malhotra, and Bill Dolan. 2022. Craft an iron sword: Dynamically generating interactive game characters by prompting large language models tuned on code. In Proceedings of the 3rd Wordplay: When Language Meets Games Workshop (Wordplay 2022), pages 25–43, Seattle, United States. Association for Computational Linguistics.
Värtinen et al. (2024) Susanna Värtinen, Perttu Hämäläinen, and Christian Guckelsberger. 2024. Generating role-playing game quests with gpt language models. IEEE Transactions on Games, 16(1):127–139.
Wang et al. (2023) Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2023. Voyager: An open-ended embodied agent with large language models. Preprint, arXiv:2305.16291.
Yuan et al. (2022) Ann Yuan, Andy Coenen, Emily Reif, and Daphne Ippolito. 2022. Wordcraft: Story writing with large language models. In Proceedings of the 27th International Conference on Intelligent User Interfaces, IUI ’22, page 841–852, New York, NY, USA. Association for Computing Machinery.

Appendix A Appendices

A.1 Opening Story

The opening story establishes the game’s premise in a few sentences, ranging from a simple description such as “This is a game of tic-tac-toe” to more complex narratives, similar to the example below, which we used in our experiments:

You live in an apartment in New York City. Your neighbor moved in a week ago, but you have only seen her once. You are not sure what her occupation is, but she seems to go out at night a lot. Sometimes, she comes back home in the morning looking injured. Is she a killer for hire or something? You dare not to ask. One day, you hear someone knocking at your door. You open the door, and it is your neighbor. Surprisingly, she seems to know your full name, occupation, and even your friends’ names and family situations. She claims that you are in danger and insists that you follow her to a safe place. Should you trust her?

The opening story is also what players will see when they enter the game room.

A.2 Instructions

The instructions are intended to give game designers more control over LLM behavior and are directly included in the prompt (used as the system message in GPT-3.5-turbo). See below the instructions we used in our study.

Continue this game by describing the next scene and what happens in response to what the player says. You can introduce new characters (they can be borrowed from existing books, movies, or TV series) that relate to the story and interact with the player. For each character, maintain a character setting list that includes their persona, current mood, backstory, and role in the story. Update this list dynamically as the game evolves. When describing a scene, begin with ‘Scene: ’. Before detailing a character’s actions and dialogue, output the updated character setting list. Then describe the character’s thoughts, actions, and speech to the player.’’

A.3 Tag Inventory

Each game turn can involve multiple NPCs, distinguished by the [ID] tag. For instance, a scene can feature several NPCs interacting before the player’s turn. Designers can use Game: to generate the entire turn, or they can specify [ID] NPC Name: to generate detailed sections for each NPC. This includes various narrative elements such as backstory, mood, or persona, allowing for deeper character development. Additionally, designers can write [ID] Name of the New NPC: to introduce new characters into the storyline. All generated content serves as suggestions—designers can edit or regenerate parts of the text. For instance, they can remove content after specific tags and prompt the LLM to complete the turn.

To manage these interactions, we introduced specific tags to structure both NPC and player turns. Examples include [Action], [Words], [ID], [Backstory], [Persona], [Mood], [Thought], [Facial Expression], and [Voice Emotion]. Some of these tags, like [Words] and [Action], are visible to players in the game room, while others remain hidden. The hidden tags serve critical functions, such as:

•

Enhancing the narrative by providing insights into characters’ internal states (e.g., [Persona], [Thought], [Mood]) to guide the LLM in generating more contextually appropriate [Words] and [Action].
•

Providing developers a clearer understanding of the LLM’s logic and reasoning behind the generation of player-visible traits like [Words] and [Action].
•

Activating game mechanics, such as facial expressions or vocal tones, based on characters’ emotions or actions.

In our experiments, we assigned different tags to NPCs ([ID], [Persona], [Mood], [Thought], [Action], and [Words]) and players (only [Action] and [Words]).

However, GamePlot is not limited to these predefined tags. It rather allows designers to create and introduce new tags as needed. This flexibility ensures that the tool can be adapted to a wide range of game types and narratives. For instance, the [Emotion] tag can be used to adjust the voice and facial expressions of characters through text-to-speech and image generation modules respectively.

A.4 Template Game Story

Opening story: You live in an apartment in New York City. Your neighbor moved in a week ago, but you have only seen her once. You are not sure what her occupation is, but she seems to go out at night a lot. Sometimes, she comes back home in the morning looking injured. Is she a killer for hire or something? You dare not to ask. One day, you hear someone knocking at your door. You open the door, annd it is your neighbor. Surprisingly, she seems to know your full name, occupation, and even your friends’ names and family situations. She claims that you are in danger and insists that you follow her to a safe place. Should you trust her?

Instructions: Continue this game by describing the next scene and what happens in response to what the player says. You can introduce new chracters (they can be borrowed from existing books, movies, or TV series) that can be related to the story and chat with the player.

For each character, you should maintain a character setting list that contains their persona, current mood, backstory, and role in the story. The character setting list should be updated on the fly as the game evolves.

When you describe a scene, you should start with "Scene: ". Before describing the actions and words of a character, you should first output the updated character setting list. And then describe their thoughts to themselves, actions and words to the player.

Game:

Scene: You are sitting in your living room. You hear knocking on the door.

Player:

[Action] Open the door.

[Words] Hello?

Game:

[ID] Neighbor:

[Backstory] She is a former spy who has gone rogue and is now trying to make amends for her past actions.

[Persona] Confident, but also a little anxious.

[Mood] Urgent and a little bit scared.

[Thought] I need to get them out of here before it’s too late.

[Action] Takes a step forward and looks directly into your eyes.

[Words] ‘‘Please, you don’t have much time. I know this might sound crazy, but you are in danger. Can you trust me?’’

Player:

[Words] It does sound crazy. You don’t even know me.

Game:

Scene: The neighbor looks at you with a pleading expression. She seems to be on the verge of tears.

[ID] Neighbor:

[Backstory] She was forced to do terrible things during her time as a spy and is now trying to make amends by helping others.

[Persona] Brave, but also vulnerable.

[Mood] Desperate and scared.

[Thought] I need to convince them to come with me before it’s too late.

[Action] Takes a deep breath and speaks in a calm voice.

[Words] ‘‘I know it’s hard to believe, but I’ve been watching you for a while now. I know things about you that I shouldn’t, but I promise I’m not here to hurt you. Please, just come with me. I’ll explain everything on the way.’’

Player:

Table 6: Template game story used in our user study.

Table 6 shows the template game story we gave to our participants as a starting point.

A.5 Prompts

System Message:

Opening story: You live in an apartment in ...

Instructions: Instructions: Continue this game by describing ...

User Message:

Summary of what happened before: SUMMARY of WHAT HAPPENED BEFORE (if any)

Game:

[ID] Neighbor:

[Backstory] She is a former spy who has gone rogue and is now trying to make amends for her past actions.

[Persona] Confident, but also a little anxious.

[Mood] Urgent and a little bit scared.

[Thought] I need to get them out of here before it’s too late.

[Action] Takes a step forward and looks directly into your eyes.

[Words] ‘‘Please, you don’t have much time. I know this might sound crazy, but you are in danger. Can you trust me?’’

Player:

[Words] It does sound crazy. You don’t even know me.

Game:

Scene: The neighbor looks at you with a pleading expression. She seems to be on the verge of tears.

[ID] Neighbor:

[Backstory] She was forced to do terrible things during her time as a spy and is now trying to make amends by helping others.

[Persona] Brave, but also vulnerable.

[Mood] Desperate and scared.

[Thought]

Table 7: Example of a next turn generation prompt in the design room for GPT models. System message includes opening story and instructions, while the user message includes summary of previous turns concatenated to events of the last segment.

Next Turn Generation Prompt in Design Room

Table 7 shows an example of a prompt used for narrative generation in the design room. Using this prompt and the stopwords [’Player:’, ’Game:’], LLM generates the game turn by completing the [Thought], [Action], and [Words] for the NPC. It stops when the player’s turn begins but may generate interactions from a second NPC before that, especially if it has seen similar examples in earlier turns.

User Message:

<story>

Opening story: OPENING STORY

Instructions: INSTRUCTIONS

Summary of what happened before: In this story ...

What happened next: THE New Segment

</story>

Give me a detailed summary of what happened in the story from the beginning:

Table 8: Structure of a prompt used for summarizing previous story segments iteratively.

Summarizing Long Game Stories in Design Room.

If the game story in the design room becomes too long, we break it into multiple segments and use the prompt in Table 8 to iteratively summarize the previous segments. This summary is then included in our prompt for generating the next turn of the game in the design room (see Table 7).

User Message:

Game plot from previous segments: MERGED GAMEPLOTS (if any)

User Message:

...

Game:

Scene: The neighbor looks at you with a pleading expression. She seems to be on the verge of tears.

[ID] Neighbor:

[Backstory] She was forced to do terrible things during her time as a spy and is now trying to make amends by helping others.

[Persona] Brave, but also vulnerable.

[Mood] Desperate and scared.

[Thought]I need to convince them to come with me before it’s too late.

[Action] Takes a deep breath and speaks in a calm voice.

Player:

User Message:

Given the game plot from previous segments and this segment of the game story, Give me a detailed plot of the game that can be used for future when other players play this game.Game plot must have the following sections: Title, Plot Summary, Key events in order.

generate game plot grounded to the given story:

Table 9: Plot Generation Prompt in Design Room.

Plot Generation Prompt in Design Room.

Table 9 shows the prompt structure used for game plot generation in the design room. The first user message has the merged plots from earlier segments (in case the game story is long), the second contains the new story segment that should be added to the plot, and the last gives instructions for generating the plot. After generating the plot, we add the list of NPCs extracted from the game story to the end of the plot.

System Message:

Opening story: OPENING STORY

Instructions: DESIGNER INSTRUCTIONS

Use the following plot to guide the game:

GAME PLOT

User Message:

Summary of what happened before: SUMMARY of WHAT HAPPENED BEFORE (if any)

GAME AND PLAYER TURNS

Player:

LATEST PLAYER INPUT

Game:

Table 10: Structure of a next game turn generation prompt in the game room.

Game Turn Generation Prompt in Game Room.

Table 10 displays the prompt structure used for generating game turns in the game room. If the game becomes lengthy, we use a similar summarization prompt as described in Table 8 to summarize earlier parts of the story.

Game Plot Design with an LLM-powered Assistant: An Empirical Study with Game Designers

Abstract

1 Introduction

2 Background

LLMs in Gaming.

LLMs for Creative Writing.

3 GamePlot

3.1 Design Room

Game Story Development.

Plot.

Feedback Elicitation.

3.2 Game Room

Game Window.

Designer Control.

3.3 Implementation Details

4 User Study

4.1 Participants

4.2 Feedback on GamePlot Features

Valuable Features.

Areas for improvement.

4.3 General Feedback on AI for Game Design

5 Discussion and Conclusion

Limitations

Scope.

AI Acceptance Level.

Ethical Statement

Access.

Participant Selection and Compensation.

Participant Consent and Data Usage

Exemption from IRB Approval.

Acknowledgments

References

Appendix A Appendices

A.1 Opening Story

A.2 Instructions

A.3 Tag Inventory

A.4 Template Game Story

A.5 Prompts

Next Turn Generation Prompt in Design Room

Summarizing Long Game Stories in Design Room.

Plot Generation Prompt in Design Room.

Game Turn Generation Prompt in Game Room.

Game Plot Design with an LLM-powered Assistant:
An Empirical Study with Game Designers