
What if we have Meta GPT? Content Singularity and Human-Metaverse Interaction in AIGC Era

Lik-Hang Lee Hong Kong Polytechnic University, Hong Kong SAR    Pengyuan Zhou University of Science and Technology of China, China    Chaoning Zhang Kyung Hee University, South Korea    Simo Hosio University of Oulu, Finland
Abstract

Global metaverse development is facing a “cooldown moment”, while academic and industry attention shifted drastically from the Metaverse to AI-Generated Content (AIGC) in 2023. Nonetheless, the current discussion rarely considers the connection between AIGCs and the Metaverse. We can imagine the Metaverse, i.e., immersive cyberspace, as the black void of space, which AIGCs can fill with content while facilitating diverse user needs. As such, this article argues that AIGCs can be a vital technological enabler for the Metaverse. The article first provides a retrospective of the major pitfalls of metaverse applications in 2022. Second, we discuss, from a user-centric perspective, how metaverse development will accelerate with AIGCs. Next, the article conjectures future scenarios combining the Metaverse and AIGCs. Accordingly, we advocate for an AI-Generated Metaverse (AIGM) framework to energize the creation of metaverse content in the AIGC era.

1 Retrospect: Experimental Metaverse

We have witnessed a surge of investment and rigorous discussion regarding the Metaverse since 2021. Many believe a fully realized metaverse is not far off, so tech firms, e.g., Meta, Niantic, Roblox, and Sandbox, just to name a few, have started creating their immersive cyberspaces with diversified visions and business agendas. After the metaverse heat wave of 2022, however, all of us remain vague about what the Metaverse actually is. At the same time, the hype surrounding the metaverse shows signs of cooling down, primarily due to multiple metrics reflecting constantly low numbers of daily active users, a decreasing volume of projects, and high uncertainty about return on investment.

When the tech giants dipped their toes into the experimentation pool in 2022, they brought a few playful tasks to their self-defined virtual venues, giving users something to do. The fascinating difficulty is that the metaverse is already fundamentally split among the forward-thinking firms establishing their own metaverse realms. Due to limited time and resources, these firms tried hard to resolve the technical issues that shape their immersive cyberspaces, such as developing efficient infrastructure that supports unlimited numbers of users in the same virtual venue or offering a decentralized transaction ecosystem driven by blockchain technology.

Nonetheless, content development is delegated to third parties and thus goes beyond the firms’ core concerns. Tech firms commonly leave content creation to designers and creators, harbouring an unattainable hope that these designers and creators can fill up the rest of the metaverse. As a result, one can argue the current virtual spaces have become aimless, primarily because of the lack of content and, therefore, activities, while users cannot find good reasons to spend time at such venues daily. Moreover, the experimental metaverses of 2022 often neglected usability issues, leading to user experiences far from satisfactory. A prominent example is that first-time users struggle to understand the interaction techniques with their avatars in 3D virtual environments. Even worse, after hours of practice, these unskilful users still cannot master such interaction techniques, resulting in poor usability overall. Without addressing the gaps in content and usability, the firms’ ambition exceeds what is practically feasible. Their ambition refers to the mass adoption of the Metaverse, i.e., the immersive cyberspace [1]. The core values surrounding the users are not there yet to make the Metaverse a reality.

We can briefly look back at the transition from the static web (Web 1.0) to its interactive counterpart (Web 2.0) in the 2D-UI era, characterized by the empowerment of content creation. Among the static webpages of Web 1.0, only a limited number of people with relevant skills could publish information online. Meanwhile, users could only read the information and had no way to engage in two-way interaction. Accordingly, Web 2.0, characterized by social networking services (SNS), offers participatory and dynamic methods and empowers two-way user interaction, i.e., reading and writing information in 2D UIs. The critical transition from Web 1.0 to 2.0 is that users, regardless of their technology literacy, can freely contribute content on SNS, such as text and images, and then put that content online. We must note that we are good at writing a message on a (soft) keyboard and taking photos or videos with cameras. Also, most 2D UIs follow certain design paradigms, requiring only simple yet intuitive interactions like clicks, taps, swipes, drags, etc., to accomplish new content creation.

In contrast, although the metaverse supposedly allows everyone to access many different virtual worlds, three unprecedented barriers arise. First, current users have extensive experience with 2D UIs but not their 3D counterparts. As the entire experience proceeds through 3D UIs, users in the Metaverse have to deal with unfamiliar virtual worlds of increasing complexity. More importantly, 3D objects do not explicitly show user interaction cues. As the Metaverse claims to be a digital twin of our physical environment [2], a user who encounters a virtual chair will employ analogies between the virtual and physical worlds. A simple question could be: can the user’s virtual hands lift or push the chair? As such, users, in general, may not be aware of how virtual objects can be interacted with in the Metaverse and thus rely on educated guesses and trial-and-error approaches. The above question can be generalized into sub-questions, including but not limited to: What are the available interaction techniques? When is user-object interaction activated? How does the user understand the functions mapped to the object? How can we manage user expectations after a particular click? Which visual and audio effects impact the user’s task performance?

Figure 1: AIGCs can prevent us from falling into another ‘Web 1.0’ in the metaverse era, where lay end-users suffer from a missing capability to create unique content. We are natively skilful at texting and photo-taking on social networks but not at editing 3D content in virtual 3D spaces. AIGCs may serve as a saviour that enables general users to freely express themselves, while owners of the platforms or virtual spaces can still delegate content creation tasks to their peer users.

Second, current interaction techniques allow users to manipulate a virtual object, such as selecting, rotating, and translating it. Still, the user effort required for object manipulation is a big concern. Commercial input hardware for headsets (e.g., controllers or joysticks), or even hand gestural inputs, is barely sufficient for simple point-and-select operations on 2D UIs in virtual environments [3] and largely insufficient for 3D models, especially those with irregular shapes, which cause intolerably long editing times and high dissimilarity with the intended shape [4]. Therefore, users relying on the current techniques, primarily point-and-select or drag-and-drop, can only manipulate objects with low granularity. However, content creation involves careful manipulation of a 3D object, i.e., modifying vertex positions in great detail. Even though users nowadays engage in immersive 3D environments, most can only create 2D texts and select some standard 3D objects from an asset library. The creation of metaverse content is not fully supported by the current authoring tools and the existing techniques for user interaction with the Metaverse. In the past two decades, the human-computer interaction community has attempted to improve the ease of user interaction in diversified virtual environments. Nonetheless, usability gaps still exist, resulting in low efficiency and user frustration [5]. We believe such gaps will not be overcome if we rely purely on investigating user behaviours with alternative interfaces and interaction techniques, especially as the tasks inside virtual 3D spaces grow more complicated.

Third, creating large objects, e.g., a dragon floating in mid-air, requires a relatively spacious environment. Users unavoidably encounter many distal operations between their position and the virtual creation, and they are prone to errors during such operations. A prior work [6] provides evidence that users with headsets achieve lower pointing accuracy on distal targets. Considering such complicated operations in content creation, typical metaverse users cannot readily create objects beyond those already in the asset library. In other words, metaverse users have no appropriate approaches to unleash the full potential of creating content on the endless canvas of the Metaverse. Instead, they hire professionals to draw and mould virtual instances on traditional desktops. For virtual space owners, a team of professionals, e.g., Unity developers, may spend days or weeks creating virtual environments. Further change requests (e.g., adding a new 3D model) for such environments may take additional hours or days. Without time or skills, general users can only experience the content built by virtual space owners. As shown in Figure 1, this rigid circumstance is analogous to the ‘read mode’ of Web 1.0. Creating unique metaverse content has become highly inconvenient and demanding. We will likely face a ‘Web 1.0’ circumstance in 3D virtual worlds, with some features inherited from Web 2.0, such as writing new texts and uploading photos.

To alleviate the barriers mentioned above, this article argues for using AI-generated content (AIGCs) both for content generation in the Metaverse and for AI-mediated user interaction in the Metaverse. The article envisions that GPT-like models can trigger a content singularity in the Metaverse and assist the interaction between human users and virtual objects. Before we move on to the main discussion, we provide some background information regarding the Metaverse and AIGCs, as follows.

Figure 2: Generating a vessel that fits the context of Victoria Harbour, Hong Kong. As a result, a junk boat appears: original view (left), sketching (middle), and the generated vessel on top of the physical world (right).

Metaverse: The Metaverse refers to the NEXT Internet, featuring diversified virtual spaces and immersive experiences [1]. Similar to existing cyberspace, we can regard the Metaverse as a gigantic application that simultaneously accommodates countless users of diverse types. The application comprises computer-mediated worlds under the Extended Reality (XR) spectrum and emerging derivatives like Diminished Reality (DR). Ideally, users will create content and engage in activities surrounding such content. Multitudinous underlying technologies serve as the backbone of the Metaverse, including AI, IoT, mobile networks, edge and cloud servers, etc. Among these technologies, we can view AI as the fuel that supports the automation of various tasks and content creation. Our discussion in this article goes beyond the well-known applications, such as creating avatars, virtual buildings, virtual computer characters and 3D objects, automatic digital twins, and personalized content presentation [2].

AI-Generated Content (AIGC): Apart from analytical AI focusing on traditional problems like classification, AIGC leverages high-dimensional data, such as text, images, audio, and video, to generate new content. For instance, OpenAI announced its conversational agent, ChatGPT [7], whose underlying models, such as GPT-3.5 and GPT-4, can generate text, with GPT-4 additionally accepting image inputs. Moreover, the generated content can support the generation of metaverse objects, such as speech for in-game agents, 3D objects, artistic artefacts, and background scenes in many virtual worlds. The most popular techniques, including GANs, diffusion models, and transformer architectures, support the challenging context-to-content task. It is important to note that generative AI and AIGC differ subtly [8]: AIGC focuses on content production problems, whereas generative AI refers to the underlying technological underpinnings that facilitate the development of multiple AIGC activities.

2 Content Singularity

The most widely used metaverse applications of the past two decades have appeared in industrial settings [9]. Firms have the resources to build proprietary systems and prepare content for their domains of interest. Work content drives the adoption of AR/VR applications in industrial sectors, as the following two examples show. First, workers at warehouse docks and assembly lines can obtain helpful information (e.g., the next step) through the lens of AR [10]. Second, personnel at elderly care centres can nurture compassion through perspective-taking scenarios in virtual reality (VR) [11]. Content is one of the incentives, and end-users gain enhanced abilities or knowledge, perhaps resulting in better productivity.

As discussed with the three main barriers in the Retrospect, users have limited ability and resources to create unique content in the Metaverse. General users can only draw simple yet rough sketches to indicate an object in Extended Reality. Nonetheless, such expressiveness is insufficient for daily communication or on-site discussion of specific work tasks. We may expect the content on AR devices to be no worse than what we have in Web 2.0. To alleviate the issue, AIGCs can play an indispensable role in lowering the barriers and democratizing content creation. Figure 2 illustrates a potential scenario where users can effectively create content in virtual-physical environments. For instance, a user with an AR device is situated at a tourist spot and attempts to show the iconic vessels that explain the cultural heritage of Hong Kong’s Victoria Harbour. First, the AIGC model can understand the user’s situation and context through sensors on the AR device, for instance, depth cameras. Second, the user can make a quick and rough sketch to indicate the shape and position of the generated object. In addition, a prompt containing the user’s description, e.g., ‘a vessel that fits this view’, is sent to the AIGC model through methods like speech recognition. It is important to note that our speech often involves ‘this’ or ‘that’ to indicate a particular scene or object; the AIGC model can employ the user’s situation and context to resolve such references. Finally, a junk boat appears in Victoria Harbour through the lens of AR.
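To make this scenario concrete, the minimal Python sketch below illustrates how speech, a rough sketch region, and sensed scene context might be fused into a single generation request. It is a sketch under assumptions: the class, field, and function names are our own illustrations, not an existing AR or AIGC API.

```python
# Hypothetical sketch of the multimodal prompting pipeline in Figure 2.
# All names are illustrative assumptions, not a real AR or AIGC API.
from dataclasses import dataclass

@dataclass
class SceneContext:
    location: str           # e.g., derived from GPS / place recognition
    depth_map_summary: str  # e.g., derived from the AR device's depth camera
    anchor: tuple           # 3D position indicated by the user's sketch

def build_prompt(speech: str, sketch_region: tuple, context: SceneContext) -> dict:
    """Fuse speech, sketch, and sensed context into one generation request.

    Deictic words ("this", "that") are resolved against the sensed scene,
    so 'a vessel that fits this view' becomes grounded in Victoria Harbour.
    """
    return {
        "text": speech,
        "grounding": f"scene at {context.location}; geometry: {context.depth_map_summary}",
        "target_shape_hint": sketch_region,  # rough outline drawn by the user
        "placement": context.anchor,         # where the object should appear
    }

if __name__ == "__main__":
    ctx = SceneContext("Victoria Harbour, Hong Kong", "open water, skyline behind", (2.0, 0.0, 15.0))
    request = build_prompt("a vessel that fits this view", sketch_region=(120, 80, 340, 220), context=ctx)
    print(request)  # this request would be handed to a context-conditioned 3D generator
```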

Singularity can refer to a point in time or a condition at which something undergoes a significant and irreversible change, depending on the context. It is also frequently used in technology and artificial intelligence (AI) to describe the hypothetical moment when robots or AI transcend human intellect and become self-improving or perhaps independent [12], a notion also known as the technological singularity or AI singularity. A related question becomes contentious for the Metaverse once AIGCs are widely adopted by end users, as we believe the arrival of AI-generated content might have far-reaching consequences for cyberspace. In this article, the concept of content singularity refers to the belief that we are reaching a time when so much virtual material will be available on the Internet that people will consume it as part of their daily routine. This is owing to the demand for immersive cyberspace and the related technological capability, perhaps AIGCs, to pave the path towards an exponential proliferation of virtual 3D content. It is similar to social networks, in which people both contribute and consume content.

Since the launch of ChatGPT (https://openai.com/blog/chatgpt), pioneering prototypes have shed light on the daily uses of GPT-driven intelligence on AR wearables, such as generating simple 3D content with WebAR (A-Frame) by entering prompts (https://www.youtube.com/watch?v=J6bSCVaXoDs&ab_channel=ARMRXR) and providing suggested answers for conversations during dates and job interviews (https://twitter.com/bryanhpchiang/status/1639830383616487426?cxt=HHwWhMDTtfbC7MEtAAAA). These examples go beyond industrial scenarios, implying that AIGC-driven conversational interfaces can open new opportunities for enriching virtual-physical blended environments [13]. Generative AI models can recognise the user context using the sensors on mobile devices (e.g., cameras on AR headsets or smartphones) to generate appropriate objects according to given prompts. In this decade, general users will treat generative AI models as utilities like water, electricity, and mobile networks. Meanwhile, the metaverse is an endless container for displaying AI-generated content so users can read and interact with the AI utility mid-air. Users can make speech prompts to generative AI models to create characters, objects, backdrop scenes, buildings, and even audio feedback or speeches in virtual 3D environments. These content generations should not pose any hurdles or technical difficulties to general users; creating content should be as simple as posting a new photo on Instagram, typing a tweet on Twitter, or uploading a new video on TikTok. The lowered barrier will encourage people to create content, and more content consumers will follow, eventually leading to a metaverse community. In addition, rewarding schemes should be established when the content singularity arrives to sustain the content creation ecosystem. AIs, the data owners behind them, and users become the primary enablers and principal actors. How to split the reward among them is still unknown, and ongoing debates will continue.
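As a rough illustration of how a prompt could be turned into placeable 3D primitives for a WebAR client, the toy Python sketch below maps a few keywords to a scene specification. A real system would delegate this mapping to an LLM; the catalogue, names, and output format here are purely hypothetical.

```python
# Illustrative sketch: turning a spoken prompt into a simple scene specification
# that a WebAR renderer (e.g., A-Frame) could place as primitives.
# The keyword mapping is a toy stand-in for an LLM call; names are assumptions.
import json

def prompt_to_scene(prompt):
    """Map a few keywords to primitive entities; an LLM would do this far more flexibly."""
    catalogue = {
        "tree": {"primitive": "cone", "color": "green", "height": 2.0},
        "lantern": {"primitive": "sphere", "color": "red", "radius": 0.3},
        "bench": {"primitive": "box", "color": "brown", "depth": 0.5},
    }
    return [dict(catalogue[word], label=word) for word in prompt.lower().split() if word in catalogue]

if __name__ == "__main__":
    scene = prompt_to_scene("place a tree and a lantern next to the bench")
    print(json.dumps(scene, indent=2))  # entities a WebAR client could instantiate mid-air
```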

Generative AI models are obviously drivers of content generation, but we should not neglect their potential for removing content, primarily physical counterparts, through the lens of XR devices, also known as Diminished Reality (DR). It is important to note that the naive approach of overlaying digital content on top of the physical world may hurt the user experience. A virtual instance may not match the environmental context, and it may be necessary to modify that context to achieve better perception when the metaverse application relates strongly to daily functions. We may accept a virtual Pokémon appearing on top of a physical rubbish bin. However, we feel uneasy when a virtual table overlaps a physical table that is being disposed of. Therefore, AIGCs may serve as a critical step of DR that smoothens the subsequent addition of digital overlays (AR). In this sense, the demand for AIGCs will penetrate the entire process of metaverse content generation. More importantly, the diminished items should comply with user safety and ethical considerations. Hiding a warning sign may put users in danger, and removing a person’s clothes may expose inappropriate content, i.e., a naked body. It is essential to reinforce regulation and compliance when generative AI models are widely adopted in the content-generation pipeline.

On the other hand, content singularity can also refer to the challenge of information overload in a virtual-physical blended environment, in which people are bombarded with so much information that it is impossible to digest and make sense of it all [14]. The sheer volume of online information, including text, photos, videos, and music, is already daunting and rapidly increasing. As such, the virtual-physical blended environment may cause considerable disturbance to users if we neglect such an exponential proliferation of 3D content.

Information or knowledge in the tangible world can indeed be thought of as limitless, whereas presenting augmentations within the relatively limited field of view of headsets is challenging. Consequently, we must optimise the presentation of digital content. Typically, metaverse users facing a naive approach to virtual content delivery will experience information inundation, thereby requiring additional time to consume the augmentation. Context awareness, covering the users, environments, and social dynamics, is a prominent strategy for managing the information display. AIGCs at the periphery, with the assistance of recommendation systems, can interpret user context and provide the most pertinent augmentation [14].
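The sketch below is one hedged way to picture such context-aware filtering: candidate augmentations are ranked by overlap with the sensed context and greedily packed into a field-of-view budget. It is not the system described in [14]; the data structures, tags, and budget are illustrative assumptions.

```python
# Illustrative sketch: ranking candidate augmentations by contextual relevance
# and keeping only what fits the headset's field-of-view budget.
from dataclasses import dataclass

@dataclass
class Augmentation:
    label: str
    tags: set              # e.g., {"navigation", "outdoor"}
    angular_size_deg: float

def select_augmentations(candidates, context_tags, fov_budget_deg=40.0):
    """Greedy selection: most context-relevant items first, within the FOV budget."""
    ranked = sorted(candidates, key=lambda a: len(a.tags & context_tags), reverse=True)
    shown, used = [], 0.0
    for aug in ranked:
        if aug.tags & context_tags and used + aug.angular_size_deg <= fov_budget_deg:
            shown.append(aug.label)
            used += aug.angular_size_deg
    return shown

if __name__ == "__main__":
    candidates = [
        Augmentation("bus timetable", {"navigation", "outdoor"}, 15.0),
        Augmentation("restaurant menu", {"dining", "indoor"}, 20.0),
        Augmentation("walking directions", {"navigation", "outdoor"}, 18.0),
    ]
    print(select_augmentations(candidates, context_tags={"outdoor", "navigation"}))
```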

Although we foresee a rise in content volume when AIGCs are fully engaged as a utility in the Metaverse, two significant issues should be addressed. First, content uniqueness raises concerns about the quality and relevance of the material provided. With so much material accessible, users find it increasingly difficult to identify what they seek and to discern between high-quality and low-quality content. To address the issues of content singularity, additional research should be conducted to create new tools and methodologies that assist users in filtering, prioritizing, and personalizing the material they consume. Current solutions in Web 2.0 include search engines, recommendation algorithms, and content curation tools. Yet, the issue of content singularity remains a complicated and continuing one that will undoubtedly need further innovation and adaptation as the volume and diversity of digital information increase in the Metaverse.

Second, contemporary conversational interfaces have long been criticized for lacking transparency, acting as a ‘black box’ [15]. In other words, conversational AIs do not show a complete list of their abilities, while general users usually have no clue about what the AI can achieve. Significantly, users with low AI literacy cannot quickly master the interaction with GPT-like AI agents through a conversational interface. Exploring the perfect fit between generative AI models and the XR environment is therefore necessary. For instance, the AI models can suggest potential actions to users by putting digital overlays on top of the user’s surroundings. As such, the user can understand the AI’s abilities and will not make ineffective enquiries or wasted interactions with the generative AI model. In addition, more intuitive clues should be prepared, according to the user context, to inform the user about ‘what cannot be done’ with a generative AI model.

Figure 3: An example pipeline of content creation and human-metaverse interaction supported by AIGCs: (a) brainstorming with conversational agents (collecting requirements simultaneously); (b) auto-generation of the content; (c) manual editing begins but large pointing errors exist; (d) following (c), AI-assisted pointing for selecting a vertex; (e) following (d), AI-assisted vertex editing; (f) manual editing of subtle parts; (g) AI-assigned panel and user interaction with the virtual objects; (h) user reviews of the objects while AIGCs attempt to understand the user perceptions; (i) content sharing, e.g., for educational purposes in a classroom. Photos are extracted and modified from [4] for illustration purposes.

3 Human-Metaverse Interaction

Besides generating virtual content, AIGC can be considered an assistive tool for user interaction in the metaverse. From other users’ perspectives, a user’s movements and interactions with virtual objects form part of the content in virtual worlds. The difficulty of controlling an avatar’s movements and interacting with virtual objects can negatively impact an individual’s workload and the group’s perception of a metaverse application; for example, a group may have to wait for an individual to finish a task, causing frustration.

Before discussing how prompts should be extended in the Metaverse for easier interaction between users and metaverse instances, some fundamentals of human-computer interaction (HCI) and prompt engineering [16] are worth considering. Prompts raise different concerns in HCI and NLP. From the HCI perspective, effective prompts are clear, concise, and intuitive. In an interactive system, prompts ask users to take specific actions or provide relevant input, which incurs user workload. Once the user’s needs and goals have been identified, the next step is to craft effective prompts that guide the user towards achieving those goals, and the AI-generated results then provide users with the information they need to take action in a particular context. Therefore, prompt engineering is an essential aspect of designing interactive systems that are easy to use and achieve high levels of user satisfaction.

Prompt engineering, in NLP and particularly for LLMs, refers to the methods of communicating with an LLM to steer its behaviour towards desired outcomes. Traditional chatbots (e.g., ChatGPT) primarily consider text prompts. In contrast, prompts from metaverse users can become more diverse by considering both the context discussed above and multiple user modalities, including gaze, body movements, and psychological and physiological factors. In addition, perhaps employing certain personalization techniques, prompts should be tested and refined iteratively to ensure that they effectively guide LLMs towards the desired output. As such, metaverse-centric prompt engineering requires a new understanding of the user’s needs and goals, as well as their cognitive abilities and limitations. This information can be gathered through user testing, A/B testing, user surveys, and usability testing in many virtual worlds.
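One hedged way to picture metaverse-centric prompting is to serialize non-text modalities into the text prompt an LLM ultimately receives, as in the following illustrative Python sketch. The field names and the heart-rate threshold are assumptions for demonstration only, not a validated design.

```python
# Hypothetical sketch of "metaverse-centric" prompting: gaze, gesture, and
# physiological signals are serialized into the text prompt an LLM receives.
def compose_llm_prompt(utterance: str, gaze_target: str, hand_pose: str, heart_rate_bpm: int) -> str:
    state = "calm" if heart_rate_bpm < 90 else "stressed"  # crude illustrative threshold
    return (
        "You assist a user inside a shared virtual world.\n"
        f"User said: \"{utterance}\"\n"
        f"User is looking at: {gaze_target}\n"
        f"User's hand gesture: {hand_pose}\n"
        f"Inferred physiological state: {state}\n"
        "Suggest the single most helpful next action, phrased in one sentence."
    )

if __name__ == "__main__":
    print(compose_llm_prompt("make this brighter", gaze_target="desk lamp (virtual)",
                             hand_pose="pointing", heart_rate_bpm=72))
```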

The prompt design can be extended to the subtle interaction between virtual objects and users. VR sculpting is a popular application in which users can freely mould virtual objects in virtual spaces, yet a usability issue of VR, inaccurate pointing at vertices, becomes a hurdle [4]. Due to this low efficiency, VR sculpting is still far from being a main tool of creativity. A hybrid model can be considered: generative AI models like LLMs first generate a 3D content model, and we then customize the model with manual editing in VR. In this sense, an important issue arises: we cannot get rid of manual operations on virtual instances. In the future, AIGCs should assist human users in virtual tasks that are inherently complex and clumsy under hardware constraints such as a limited field of view (FOV). AIGCs can parse user actions in virtual environments, for instance, limb movements and gazes towards a virtual object, to complete an appropriate portion of the manual editing work. As such, AIGCs can serve as assistants for metaverse users. It is important to note that AI-assisted tasks already happen on ubiquitous everyday devices, e.g., smartphones. A prevalent example in 2D UIs is typing text on soft keyboards. Users tap keys repetitively and make typos when adjacent keys are triggered. Such an error-prone task can be assisted by auto-correction: users can tap the mistyped word and select the correct spelling from the suggested words. To achieve this, an AI model learns the words in the English dictionary and then understands user habits by recording the user’s word choices.

Typing on a soft keyboard is a good example of an AI-assisted task. In virtual environments, interaction tasks such as dragging an object to a precise position or editing an object with an irregular shape can be challenging for users, and AIGCs open opportunities to help human users accomplish them. Nonetheless, typing on soft keyboards is manageable because the dictionary provides a reasonable search space, whereas AIGC-driven assistance in 3D editing faces a much larger one. In an editing task, a user may first select a vertex on a rabbit’s tail. The next action could be changing the vertex property and then moving to another vertex, which may be on the head, the bottom, and so on. With current technology, predicting the user’s next action at a highly accurate rate is very unlikely. However, if available, AIGCs may leverage prior users’ behaviours from a dataset containing user interaction footprints, and accordingly recommend several ‘next’ edits to facilitate the process. Eventually, the user can choose one of them and accomplish the task without a huge burden.
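A minimal sketch of this idea, assuming a dataset of prior editing sessions is available, is a simple bigram model over edit actions; a deployed assistant would use a far richer sequence model, and the action names below are hypothetical.

```python
# Minimal sketch: recommending the 'next' edit from prior users' interaction
# footprints using bigram counts; a real assistant would use a richer model.
from collections import Counter, defaultdict

def build_next_edit_model(sessions):
    """sessions: list of edit sequences recorded from prior users."""
    transitions = defaultdict(Counter)
    for seq in sessions:
        for current, nxt in zip(seq, seq[1:]):
            transitions[current][nxt] += 1
    return transitions

def recommend_next(transitions, current_edit, k=3):
    return [edit for edit, _ in transitions[current_edit].most_common(k)]

if __name__ == "__main__":
    sessions = [
        ["select_tail_vertex", "raise_tail_vertex", "select_ear_vertex"],
        ["select_tail_vertex", "raise_tail_vertex", "smooth_tail"],
        ["select_tail_vertex", "scale_tail", "smooth_tail"],
    ]
    model = build_next_edit_model(sessions)
    # After the user selects a vertex on the rabbit's tail, suggest a few likely follow-ups.
    print(recommend_next(model, "select_tail_vertex"))
```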

Figure 4: AIGM framework showing the relationship between human users, AIGCs and virtual-physical cyberspace (i.e., the Metaverse).

In a broader sense, diversified items exist in many virtual worlds, and a virtual item can have many possible relationships with another. As such, the user interaction with AIGCs’ predictions becomes complicated. For instance, I pick up an apple and then lift a tool to cut it. Other possible actions include putting down the apple, grabbing an orange, etc. It is also important to note that building an ontology for unlimited items in the Metaverse is nearly impossible. One potential tactic is to leverage the user’s in-situ actions. Generative AI models can read the user’s head and hand movements to predict the user’s interested regions and, thus, upcoming activities. Ideally, a user may give a rough pointing location to a particular item. Then, Generative AI models can make personalized and in-situ suggestions for the user’s subsequent interactions with virtual objects, with sufficient visualization to ensure intuitiveness. We believe that the above examples are only the tip of the iceberg but sufficient to illustrate the necessity of re-engineering the ways of making metaverse-ready prompts for Generative AI models.
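The following toy sketch illustrates this tactic under strong simplifying assumptions (2D positions, hand-coded affordances): rough head and hand pointing directions are combined to pick the most likely object of interest and surface a few plausible interactions. Everything here is illustrative rather than a concrete design.

```python
# Hypothetical sketch: inferring the user's region of interest from rough head and
# hand pointing directions, then suggesting plausible interactions for the nearest item.
import math

OBJECTS = {"apple": (1.0, 0.0), "orange": (0.0, 1.0), "knife": (0.5, -0.8)}  # toy 2D positions
AFFORDANCES = {"apple": ["pick up", "cut", "put down"],
               "orange": ["pick up", "peel"],
               "knife": ["grab", "cut with"]}

def angle_to(target, direction_deg):
    """Angular difference between a pointing direction and the bearing of a target."""
    bearing = math.degrees(math.atan2(target[1], target[0]))
    return abs((bearing - direction_deg + 180) % 360 - 180)

def suggest(head_deg, hand_deg, k=2):
    # Score each object by how closely both the gaze and pointing rays align with it.
    scored = sorted(OBJECTS, key=lambda o: angle_to(OBJECTS[o], head_deg) + angle_to(OBJECTS[o], hand_deg))
    focus = scored[0]
    return focus, AFFORDANCES[focus][:k]

if __name__ == "__main__":
    print(suggest(head_deg=5.0, hand_deg=12.0))  # likely the apple; suggests 'pick up', 'cut'
```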

Then, there is the issue of how natural people will feel in metaverse environments built, or in some cases hallucinated, with AIGCs. Urban designers and architects are now looking into which factors of our ordinary environments matter most when attempting to translate those environments into digital ones, beyond the 3D replication. Here, issues such as subjective presence (comfort, feeling, safety, senses) or active involvement (activities taking place, other people’s presence), in addition to the traditionally considered structural aspects (colour, furnishing, scale, textures), will play a pivotal role in how the metaverse experience will feel to its users (see, e.g., [17]). The questions to solve will therefore include to what degree we want generative AI to spawn experiences that feel safe, and whether the spaces should more closely reflect the world as we know it outside the metaverse, where even adjacent spaces can carry very different perceived human characteristics.

The technical capability of AIGCs only opens up a landscape of generating metaverse content, whether adding backdrops (AR) or removing objects that cause strong emotions (DR). But we know very little about the user aspects once AIGCs are deployed at scale. As the metaverse moves beyond sole digital interfaces, i.e., 2D UIs, AIGC can be embedded in physical worlds and alter the user’s situated environment to fulfil the user’s subjective presence, which can be abstract. It can vary greatly with the user’s beliefs (norms, customs, ego, and so on) and their environment. A machine may not truly interpret the meaning of ‘calm’, especially if multiple subjective presences are underlying, e.g., ‘safe and calm’. Suppose a user gives a simple prompt of ‘calm’ to an AIGC model; the results may be unsatisfactory because the user has not made an effective prompt, for example, by adding words like ‘meditation’, ‘wellness’, and ‘sleep’ when inside a bedroom. It is worth noting that users with headsets may expect quick and accurate feedback, instead of requesting the generative AI models to revise the content over multiple iterations. In addition, subjective presence is not limited to a single user. Multiple users will interact with metaverse content in a shared space, potentially causing co-perception and communication issues. Generating the right content at the right time is thus a challenging problem that goes beyond technical aspects. AIGC in the Metaverse will lead to a novel niche of understanding the dynamics among metaverse content, physical space, and users.
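One hedged mitigation is to expand a vague, subjective prompt with scene-derived vocabulary before it reaches the generative model, as in the toy sketch below; the scene-to-vocabulary mapping is purely illustrative and would in practice depend on the detected environment and the individual user.

```python
# Illustrative sketch: expanding a vague, subjective prompt ('calm') with
# scene-derived terms before it reaches the generative model.
SCENE_VOCAB = {
    "bedroom": ["meditation", "wellness", "sleep", "soft lighting"],
    "office": ["focus", "tidy desk", "quiet", "daylight"],
}

def expand_prompt(user_prompt: str, detected_scene: str) -> str:
    extras = SCENE_VOCAB.get(detected_scene, [])
    return f"{user_prompt}, {', '.join(extras)}" if extras else user_prompt

if __name__ == "__main__":
    print(expand_prompt("calm", detected_scene="bedroom"))
    # -> "calm, meditation, wellness, sleep, soft lighting"
```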

4 Towards AIGM Framework

We argue that AIGM is a must if we aim to unleash all of the latent potential in the metaverse concept. Regardless of who the leading developer is, the metaverse must be built for humans, and as humans, everything we do is embodied in the space around us [17]. The leading developers do not have the authority to dictate what content we should have on the Next Internet, as we have seen in the Metaverse of 2022, in which the virtual spaces are office-like environments. We usually spend eight working hours at the physical office, and it is insane to spend another eight hours in a virtual office. Ironically, except for the standard items given in asset libraries, we do not even have the right to decorate such office spaces with our unique creations. What becomes popular in the Metaverse should eventually be the users’ call. Google Image searches conducted since Q3 2021 show that creators have so far defined the metaverse with blue, dark, and purple colours, but we believe the trend of popular content is ever-changing. Driven by the vital role of AIGCs in democratizing content creation, everyone in the Metaverse can decide, (co-)create, and promote their unique content.

To scale up the use of AIGCs, we propose a framework for an AI-Generated Metaverse (AIGM) that depicts the relationships among AIGCs, virtual-physical blended worlds, and human users, see Figure 4. AIGC is the fuel to spark the content singularity, and metaverse content is expected to surround everyone like the atmosphere. This creates an entire creation pipeline in which AIGCs are the key actors. First, users can talk to generative AI models to obtain inspiration during human-AI conversations (Human-AI collaboration). Consequently, generative AI models provide the very first edition of the generated content (AI-Generation) and then support subtle editing during content creation (AI-Assistance). Some precise details can be done manually (Human users); if necessary, multiple users can be involved in the task (Multi-user collaboration). In addition, it is important to note that AIGCs can assign properties defining how the users and virtual instances will interact, e.g., through a tap on a panel, and accordingly, AIGC-driven evaluations will be performed to understand user performance and cognitive load [18]. Eventually, content sharing and the corresponding user interaction can be backed by AIGCs.
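To make the pipeline explicit, the following schematic Python sketch chains the stages of Figure 4 as stub functions. The stage names mirror the text; nothing here represents a concrete implementation, and the inputs and outputs are placeholders.

```python
# Schematic sketch of the AIGM pipeline in Figure 4; each stage is a stub.
def human_ai_brainstorm(idea: str) -> str:
    return f"requirements gathered for: {idea}"                         # Human-AI collaboration

def ai_generation(requirements: str) -> dict:
    return {"draft_content": f"first edition from <{requirements}>"}    # AI-Generation

def ai_assisted_editing(content: dict) -> dict:
    content["assisted_edits"] = "vertex-level refinements suggested"    # AI-Assistance
    return content

def manual_and_collaborative_editing(content: dict, users: list) -> dict:
    content["final_touches_by"] = users                                 # Human / multi-user work
    return content

def aigc_driven_evaluation_and_sharing(content: dict) -> dict:
    content["evaluation"] = "interaction properties assigned; cognitive load assessed"
    return content

if __name__ == "__main__":
    draft = ai_generation(human_ai_brainstorm("a junk boat for a history class"))
    refined = manual_and_collaborative_editing(ai_assisted_editing(draft), users=["teacher", "student_A"])
    print(aigc_driven_evaluation_and_sharing(refined))
```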

5 Concluding Remarks

Amid a deceleration of global metaverse development, the authors contend that AIGCs can be a critical facilitator for the Metaverse. This article shares some perspectives and visions for when AIGCs meet the Metaverse. Our discussion started with a look back at the key flaws of metaverse applications in 2022, highlighting the fundamental difficulties the metaverse has encountered. Accordingly, we examined how AIGCs will speed up metaverse development from a user standpoint. The article eventually speculates on future possibilities that combine the Metaverse with AIGCs. We call for a conceptual framework, AIGM, that facilitates content singularity and human-metaverse interaction in the AIGC era, and we hope to provoke a more expansive discussion within the HCI and AI communities.

References

  • [1] L.-H. Lee, P. Zhou, T. Braud, and P. Hui, “What is the metaverse? an immersive cyberspace and open challenges,” ArXiv, vol. abs/2206.03018, 2022.
  • [2] L.-H. Lee, T. Braud, P. Zhou, L. Wang, D. Xu, Z. Lin, A. Kumar, C. Bermejo, and P. Hui, “All one needs to know about metaverse: A complete survey on technological singularity, virtual ecosystem, and research agenda,” 2021.
  • [3] L. H. Lee, T. Braud, F. H. Bijarbooneh, and P. Hui, “Ubipoint: Towards non-intrusive mid-air interaction for hardware constrained smart glasses,” in Proceedings of the 11th ACM Multimedia Systems Conference, ser. MMSys ’20.   New York, NY, USA: Association for Computing Machinery, 2020, p. 190–201. [Online]. Available: https://doi.org/10.1145/3339825.3391870
  • [4] K. Y. Lam, L.-H. Lee, and P. Hui, “3deformr: Freehand 3d model editing in virtual environments considering head movements on mobile headsets,” in Proceedings of the 13th ACM Multimedia Systems Conference, ser. MMSys ’22.   New York, NY, USA: Association for Computing Machinery, 2022, p. 52–61. [Online]. Available: https://doi.org/10.1145/3524273.3528180
  • [5] L.-H. Lee, T. Braud, S. Hosio, and P. Hui, “Towards augmented reality driven human-city interaction: Current research on mobile headsets and future challenges,” ACM Comput. Surv., vol. 54, no. 8, oct 2021. [Online]. Available: https://doi.org/10.1145/3467963
  • [6] A. U. Batmaz, M. D. B. Machuca, D.-M. Pham, and W. Stuerzlinger, “Do head-mounted display stereo deficiencies affect 3d pointing tasks in ar and vr?” 2019 IEEE Conference on Virtual Reality and 3D User Interfaces (VR), pp. 585–592, 2019.
  • [7] C. Zhang, C. Zhang, C. Li, S. Zheng, Y. Qiao, S. K. Dam, M. Zhang, J. U. Kim, S. T. Kim, G.-M. Park, J. Choi, S.-H. Bae, L.-H. Lee, P. Hui, I. S. Kweon, and C. S. Hong, “One small step for generative ai, one giant leap for agi: A complete survey on chatgpt in aigc era,” researchgate DOI:10.13140/RG.2.2.24789.70883, 2023.
  • [8] C. Zhang, C. Zhang, S. Zheng, Y. Qiao, C. Li, M. Zhang, S. K. Dam, C. M. Thwal, Y. L. Tun, L. L. Huy, D. kim, S.-H. Bae, L.-H. Lee, Y. Yang, H. T. Shen, I.-S. Kweon, and C.-S. Hong, “A complete survey on generative ai (aigc): Is chatgpt from gpt-4 to gpt-5 all you need?” ArXiv, vol. abs/2303.11717, 2023.
  • [9] S. Büttner, M. Prilla, and C. Röcker, “Augmented reality training for industrial assembly work - are projection-based ar assistive systems an appropriate tool for assembly training?” in Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, ser. CHI ’20.   New York, NY, USA: Association for Computing Machinery, 2020, p. 1–12. [Online]. Available: https://doi.org/10.1145/3313831.3376720
  • [10] A. C. C. Reyes, N. P. A. Del Gallego, and J. A. P. Deja, “Mixed reality guidance system for motherboard assembly using tangible augmented reality,” in Proceedings of the 2020 4th International Conference on Virtual and Augmented Reality Simulations, ser. ICVARS 2020.   New York, NY, USA: Association for Computing Machinery, 2020, p. 1–6. [Online]. Available: https://doi.org/10.1145/3385378.3385379
  • [11] V. Paananen, M. S. Kiarostami, L.-H. Lee, T. Braud, and S. J. Hosio, “From digital media to empathic reality: A systematic review of empathy research in extended reality environments,” ArXiv, vol. abs/2203.01375, 2022.
  • [12] T. J. Prescott, “The ai singularity and runaway human intelligence,” in Living Machines, 2013.
  • [13] P. Zhou, “Unleasing chatgpt on the metaverse: Savior or destroyer?” arXiv preprint arXiv:2303.13856, 2023.
  • [14] K. Y. Lam, L. H. Lee, and P. Hui, “A2w: Context-aware recommendation system for mobile augmented reality web browser,” in Proceedings of the 29th ACM International Conference on Multimedia, ser. MM ’21.   New York, NY, USA: Association for Computing Machinery, 2021, p. 2447–2455. [Online]. Available: https://doi.org/10.1145/3474085.3475413
  • [15] A. B. Arrieta, N. D. Rodríguez, J. D. Ser, A. Bennetot, S. Tabik, A. Barbado, S. García, S. Gil-Lopez, D. Molina, R. Benjamins, R. Chatila, and F. Herrera, “Explainable artificial intelligence (xai): Concepts, taxonomies, opportunities and challenges toward responsible ai,” ArXiv, vol. abs/1910.10045, 2019.
  • [16] V. Liu and L. B. Chilton, “Design guidelines for prompt engineering text-to-image generative models,” Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems, 2022.
  • [17] V. Paananen, J. Oppenlaender, J. Goncalves, D. Hettiachchi, and S. Hosio, “Investigating human scale spatial experience,” Proceedings of the ACM on Human-Computer Interaction, vol. 5, no. ISS, pp. 1–18, 2021.
  • [18] Y. Hu, M. L. Yuan, K. Xian, D. S. Elvitigala, and A. J. Quigley, “Exploring the design space of employing ai-generated content for augmented reality display,” 2023.