A Unified Industrial Large Knowledge Model Framework in Industry 4.0 and Smart Manufacturing

Jay Lee, Hanqi Su
Center for Industrial Artificial Intelligence, Department of Mechanical Engineering,
A. James Clark School of Engineering, University of Maryland, College Park,
Maryland, United States of America
{leejay, hanqisu}@umd.edu
Corresponding author

Abstract

The recent emergence of large language models (LLMs) demonstrates the potential for artificial general intelligence, revealing new opportunities in Industry 4.0 and smart manufacturing. However, a notable gap exists in applying these LLMs in industry, primarily due to their training on general knowledge rather than domain-specific knowledge. Such specialized domain knowledge is vital for effectively addressing the complex needs of industrial applications. To bridge this gap, this paper proposes a unified industrial large knowledge model (ILKM) framework, emphasizing its potential to revolutionize future industries. In addition, ILKMs and LLMs are compared from eight perspectives. Finally, the “6S Principle” is proposed as the guideline for ILKM development, and several potential opportunities are highlighted for ILKM deployment in Industry 4.0 and smart manufacturing.

1 Introduction

In the era of Industry 4.0 [9], a paradigm shift is unfolding in the manufacturing sector, driven by the advent of smart manufacturing practices. This revolution has been fueled by advancements in industrial big data analytics [23], industrial artificial intelligence (AI) [11], machine learning (ML) and deep learning [16, 20], cyber-physical systems [10], and the industrial internet of things [18]. These technologies aim to enhance efficiency, productivity, and flexibility in manufacturing processes.

Recent advances in large language models (LLMs) [26, 2] have showcased extraordinary capabilities in natural language processing, including understanding, interpreting, and generating human language. However, a gap exists in the application of these LLMs within smart manufacturing, primarily because LLMs are predominantly trained on general knowledge, not domain-specific knowledge, which may not be entirely suitable for the specific and complex needs of industrial applications. Therefore, there is an urgent need for the development of an advanced foundation model leveraging the powers of LLMs and domain-specific knowledge to address complex challenges in Industry 4.0 and smart manufacturing.

Recognizing this gap, an industrial large knowledge model (ILKM) framework is proposed for domain-driven, data-centric industrial systems in Industry 4.0 and smart manufacturing. In addition, the “6S Principle” is proposed as a guideline for the development of ILKMs. The role of ILKMs and their comparison with LLMs are discussed in detail. Through this exploration, this paper aims to provide a comprehensive understanding of the transformative power of ILKMs in the modern manufacturing landscape, highlighting their significance and opportunities as a cornerstone of the ongoing industrial revolution.

Figure 1: General process in Industry 4.0 and smart manufacturing using industrial large knowledge model. Abbreviations: AI: Artificial intelligence; LKL: Large knowledge library; LLM: Large language model; ML: Machine learning.

2 The role of ILKMs in Industry 4.0 and smart manufacturing

In Industry 4.0 and smart manufacturing, the deployment of ILKMs emerges as a pivotal element. Figure 1 shows the general process of how an ILKM works within Industry 4.0, where the ILKM functions at the core of this advanced manufacturing paradigm. The process begins with the acquisition and management of a vast array of industrial data, derived from diverse industrial products [13, 15]. These data are categorized into two primary forms: human-interpretable data and structured machine-generated data. Leveraging technologies such as LLMs, a comprehensive large knowledge library (LKL), and various ML and AI techniques [7], ILKMs serve as artificial general intelligence [6], which plays a vital role in enabling advanced and sophisticated data analytics and problem-solving. These advanced analytical capabilities, therefore, pave the way for more insightful and informed decision-making processes. Beyond this, ILKMs can also interface with and improve supply chain management [24], leading to more efficient, resilient, and customer-focused operations. In addition, the solutions generated by ILKMs undergo evaluation by subject matter experts, who play a crucial role in validating and refining the relevant solutions, thereby aiding in the continual optimization of ILKM outputs. This iterative process of assessment and feedback is integral to ensuring the relevance and effectiveness of ILKM solutions. Overall, ILKMs underscore the transformative potential of data-driven approaches, offering detailed and comprehensive optimization and enhancement directions for industrial products in Industry 4.0 and smart manufacturing.

3 ILKM framework

The proposed ILKM framework, shown in Figure 2, provides a step-by-step guideline for developing and deploying ILKMs using industrial data to enhance manufacturing capabilities in areas such as predictive maintenance, process optimization, quality control, engineering design, question-answering (QA) platforms, and data analytics. The ILKM framework consists of four pivotal steps: (i) the construction of an LKL categorized by human-interpretable and structured machine-generated data; (ii) the preparation of domain-specific instruction data; (iii) the development of a domain-specific knowledge LLM based on the domain-specific data and domain instruction data; and (iv) the establishment of an intelligent domain expert ML system. As illustrated in Figure 2, the details of the ILKM framework are outlined in the subsequent Sections 3.1–3.4.

3.1 Large knowledge library construction

The initial step in constructing an ILKM involves the creation of an LKL. This library is pivotal for accommodating the breadth and diversity of industrial data, thus serving as a foundational resource for subsequent analytical tasks. During this phase, it is essential to systematically categorize the data into domain-specific categories. Such organization enables researchers and data scientists to streamline their efforts, allowing for efficient retrieval of domain-specific data to inform the development of ML models tailored to address distinct industry-related challenges. Within these categories, based on the usage and nature of the data, industrial data can be divided into two primary types: human-interpretable and structured machine-generated. Human-interpretable data, inherently designed for human cognition, comprise elements such as text documents, annotated images, coding scripts crafted by programmers, and multimedia content. This type of data can be seen as insightful information or knowledge and is used for the later development of domain-specific knowledge ML models. On the other hand, structured machine-generated data comprise sensor readings, machine logs, operational parameters, and more. This data type can be leveraged for analytical purposes in technical and industrial contexts.
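As an illustration of this two-level organization (by domain, then by data type), the following minimal sketch indexes library items in memory. The class names, the domain labels, and the file names are hypothetical and are not part of the proposed framework; a practical LKL would likely sit on top of a database or data lake.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class DataKind(Enum):
    # The two primary data types described in Section 3.1.
    HUMAN_INTERPRETABLE = "human-interpretable"
    MACHINE_GENERATED = "structured machine-generated"


@dataclass(frozen=True)
class LibraryItem:
    domain: str     # hypothetical domain label, e.g. "bearing-maintenance"
    kind: DataKind  # which of the two primary types the item belongs to
    source: str     # hypothetical identifier of a file or data stream


class LargeKnowledgeLibrary:
    """Toy in-memory index keyed by (domain, data kind)."""

    def __init__(self) -> None:
        self._index = defaultdict(list)

    def add(self, item: LibraryItem) -> None:
        self._index[(item.domain, item.kind)].append(item)

    def retrieve(self, domain: str, kind: DataKind) -> list:
        # Return a copy so callers cannot mutate the index.
        return list(self._index.get((domain, kind), []))


# Illustrative entries only; the names are invented for this sketch.
lkl = LargeKnowledgeLibrary()
lkl.add(LibraryItem("bearing-maintenance", DataKind.HUMAN_INTERPRETABLE, "maintenance_manual.pdf"))
lkl.add(LibraryItem("bearing-maintenance", DataKind.MACHINE_GENERATED, "vibration_sensor.csv"))
lkl.add(LibraryItem("weld-quality", DataKind.MACHINE_GENERATED, "weld_current_log.csv"))

knowledge_items = lkl.retrieve("bearing-maintenance", DataKind.HUMAN_INTERPRETABLE)
print([item.source for item in knowledge_items])
```

Keying the index on both domain and data type mirrors the text: human-interpretable items can be pulled for knowledge-oriented model development, while machine-generated items of the same domain remain available for analytics.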

3.2 Domain instruction data preparation

In the second step of the ILKM framework, the focus shifts to transforming domain-specific data (human-interpretable data from the LKL) into structured domain instruction sets. This transformation is crucial for enhancing the performance of LLMs in targeted domains by generating domain-centric knowledge and achieving multi-modal data fusion [8]. These structured instructions, vital for fine-tuning the LLM and retrieving domain knowledge, ensure that the model is proficient in addressing domain-specific challenges and enhancing problem-solving capabilities [25, 5]. As depicted in Figure 2, the domain instruction data are organized into three parts: first, the domain instruction, which identifies the problem’s domain and may include sub-tags for refined categorization; second, the input, which clearly outlines the current problem; and third, the output, which presents the corresponding solution.
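The three-part structure described above can be sketched as a small record type. The field names, the JSON-lines serialization, and the example content are assumptions made for illustration; the paper does not prescribe a concrete schema or file format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DomainInstruction:
    domain: str   # domain instruction: identifies the problem's domain
    input: str    # input: statement of the current problem
    output: str   # output: the corresponding solution
    sub_tags: list = field(default_factory=list)  # optional refined categorization

    def to_json_line(self) -> str:
        """Serialize the record to a single JSON line (an assumed storage format)."""
        return json.dumps(asdict(self), ensure_ascii=False)


# Invented example content, shown only to make the structure concrete.
record = DomainInstruction(
    domain="prognostics and health management",
    sub_tags=["rotating machinery", "bearing"],
    input="Vibration amplitude at a bearing housing has risen steadily. What should be checked?",
    output="Placeholder solution text written by a domain expert.",
)

line = record.to_json_line()
restored = DomainInstruction(**json.loads(line))
print(line)
```

Storing one record per line keeps the instruction set easy to append to and to stream, which is one reasonable choice when the same data are used both for fine-tuning and for retrieval.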

3.3 Domain knowledge LLM development

The third step of the ILKM framework entails an initial pre-training of the base LLM with domain-specific data sourced from the LKL. This pre-training imbues the LLM with rich domain-specific knowledge. Following this, the pre-trained LLM undergoes a fine-tuning process, guided by domain instructions, transforming it into a domain knowledge LLM proficient in the designated field. To refine the LLM’s expertise, several common techniques for training and enhancing LLMs can be applied: reinforcement learning from human feedback [4], instruction tuning [12, 21], mixture of experts [17], prompt engineering [1, 14], retrieval-augmented generation [5], and attention mechanisms [19]. The objective of this step is to build a robust LLM that possesses extensive domain knowledge and expertise. This LLM can then guide the development of new ML models capable of addressing complex challenges and real industrial problems.
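Of the techniques listed above, retrieval-augmented generation is perhaps the simplest to illustrate without a trained model. The sketch below covers only the retrieval and prompt-construction steps and omits generation entirely. The token-overlap scoring is a deliberately crude stand-in for the embedding-based retrievers typically used in practice, and the passages, function names, and prompt layout are all invented for this example.

```python
import re


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, passages: list, k: int = 2) -> list:
    """Return up to k passages sharing at least one token with the query.

    Passages are ranked by the number of shared tokens (toy scoring).
    """
    query_tokens = tokenize(query)
    scored = [(len(query_tokens & tokenize(p)), p) for p in passages]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [p for score, p in ranked[:k] if score > 0]


def build_prompt(query: str, passages: list, k: int = 2) -> str:
    """Prepend the retrieved passages to the question as context."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Placeholder passages standing in for human-interpretable data from the LKL.
passages = [
    "Note on spindle bearing temperature and lubrication checks.",
    "Note on weld porosity and shielding gas.",
    "Note on conveyor belt tension.",
]
query = "Why is the spindle bearing temperature rising?"
print(build_prompt(query, passages))
```

The resulting prompt would then be passed to the domain knowledge LLM, so that its answer is grounded in library content rather than in general pre-training alone.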

Table 1: Comparison between ILKMs and LLMs. Abbreviations: ILKM: Industrial large knowledge model; LLM: Large language model.

| Perspective | ILKM | LLM |
| --- | --- | --- |
| Data | Industrial domain-specific data (human-interpretable data & structured machine-generated data); private closed source | Vast, diverse, and unstructured text data; public open source |
| Purpose | Designed for specific industrial tasks; provide specialized solutions in respective domains | Designed for language-related tasks; focus on understanding and generating human language |
| Domain-specific knowledge | Specialized: in-depth domain knowledge relevant to specific industries | General: may lack deep, industry-specific insights |
| Data privacy and security | Offer greater control over data privacy and security as they can be hosted within the company’s secure environment | Potential concerns with data privacy and security as researchers often use licensed pre-trained models developed by other private companies to fine-tune LLMs |
| Integration and customization | Tailored and integrated into growing and evolving industrial ecosystems, aligning with industry-specific needs | Need additional resources for integration and customization to fit specific requirements |
| Scalability | Adapt and expand based on specific industrial requirements and environment, but need to be balanced with the cost | Scalable across platforms but also require significant computational resources |
| Real-time decision making | Better suited for real-time decision-making in industrial settings, leveraging specific industry data | Limited in handling real-time, complex industrial decisions due to generic training |
| Application | Process optimization, predictive maintenance, quality control, prognostics and health management, material and design, data analytics, decision-making, question-answering platform, etc. | Text generation, content creation, conversation, language translation, summarization, etc.; not domain-specific |

3.4 Intelligent domain expert machine learning system

Upon the successful training of the domain knowledge LLM, the fourth step involves utilizing it as a domain expert for subsequent specialized model development. In this step, domain instruction data serves as the prompt, propelling the LLM to address specific analytical problems. The domain-specific knowledge LLM, acting on these instruction inputs, proposes targeted solutions. In addition, human experts may interact and intervene, offering strategic guidance to refine the LLM’s outputs. These solutions are then transferred to a coding-focused LLM [3, 22], which incrementally develops code aligned with the domain knowledge LLM’s insights, thereby creating a new ML model for specific problems. In addition, the structured machine-generated data serves as the dataset for new ML model training and testing. Finally, this step culminates in the generation of actionable solutions, ready to be integrated into decision-making workflows.
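The workflow of this fourth step can be sketched as a chain of interchangeable stages. In the sketch below, each stage is a plain Python callable and the three stub functions merely echo their inputs; they stand in for a domain knowledge LLM, a human expert, and a coding-focused LLM, none of which is implemented here. All names are hypothetical, and the final training of the generated ML model on structured machine-generated data is left out.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class PipelineResult:
    proposal: str  # solution proposed by the domain knowledge LLM
    refined: str   # solution after optional expert intervention
    code: str      # source code produced by the coding-focused LLM


def run_expert_pipeline(
    prompt: str,
    domain_llm: Callable[[str], str],
    coding_llm: Callable[[str], str],
    expert_review: Optional[Callable[[str], str]] = None,
) -> PipelineResult:
    """Chain the stages: prompt -> proposal -> (expert review) -> code."""
    proposal = domain_llm(prompt)
    # Human experts may intervene; without a reviewer the proposal passes through.
    refined = expert_review(proposal) if expert_review else proposal
    code = coding_llm(refined)
    return PipelineResult(proposal, refined, code)


# Stub stages for illustration only; real models would replace them.
def stub_domain_llm(prompt: str) -> str:
    return f"Proposed approach for: {prompt}"


def stub_expert(proposal: str) -> str:
    return proposal + " (reviewed by expert)"


def stub_coding_llm(solution: str) -> str:
    return f"# model code implementing: {solution}"


result = run_expert_pipeline(
    "detect bearing faults from vibration data",
    stub_domain_llm,
    stub_coding_llm,
    expert_review=stub_expert,
)
print(result.code)
```

Treating each stage as a callable keeps the expert review optional and makes it straightforward to swap in different models as the framework evolves.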

4 Discussion

This section discusses the comparison between ILKMs and LLMs and introduces the “6S Principle” as a guideline for future ILKM development. It also highlights several potential opportunities for ILKM deployment in Industry 4.0 and smart manufacturing.

4.1 Comparison between ILKMs and LLMs

The main difference between ILKMs and LLMs lies in their purpose and functionality. ILKMs are designed to handle specific industrial tasks, utilizing relevant structured industrial data and domain-specific knowledge to provide precise, expert-level solutions. In contrast, LLMs are more generalized, leveraging extensive training on diverse textual data to solve language-related tasks, such as text generation, conversation, and language translation. To better illustrate the characteristics of ILKMs, a detailed comparison between ILKMs and LLMs is presented in Table 1. They are compared and explained from eight perspectives: “Data,” “Purpose,” “Domain-Specific Knowledge,” “Data Privacy and Security,” “Integration and Customization,” “Scalability,” “Real-Time Decision-Making,” and “Application.”

4.2 Foundational principles for ILKM development

As illustrated in Figure 3, the “6S Principle” is proposed as a guideline for the future development of ILKMs. The “6S Principle” encompasses six key components: “Specialized Domain Knowledge,” “Scrutability,” “Safety,” “Scalability,” “Sustainability,” and “Systematization and Standardization.” The details of the purpose, challenges, and opportunities for each principle are presented in Figure 3. All of these principles are crucial for the successful application of ILKMs in industrial settings, ensuring that ILKMs can address specific needs and challenges faced in Industry 4.0 and smart manufacturing.

4.3 Prospects and opportunities

There are several opportunities for developing ILKMs in the future of Industry 4.0 and smart manufacturing. In the field of material discovery and synthesis, ILKMs can analyze historical data from the literature to summarize design guidelines and principles as domain-specific instructions. These instructions can then guide the design of new experiments, ultimately facilitating the discovery of new materials. In the area of engineering design, ILKMs can assimilate knowledge from multiple modalities (such as text, 2D images, 3D shapes, and sound) gathered from historical products and provide possible optimization directions for designers and engineers to enhance new product performance. In the realm of prognostics and health management, ILKMs can analyze data from historical failures and maintenance strategies to aid in the diagnosis and prognosis of complex industrial machine systems, ultimately enabling predictive maintenance and lifecycle management. Furthermore, intelligent QA platforms built on ILKMs can be developed across various industrial sectors. Instead of relying on manual information searches performed by humans, ILKMs can automatically retrieve relevant information and generate responses, thus assisting employees with their queries.

5 Conclusion

This paper presents a unified ILKM framework to address the complex needs of industrial applications by integrating advanced AI, ML, and LLM technologies with specialized industrial knowledge. The “6S Principle” serves as a foundational guideline for ILKM development, aiming to create a domain-specific, interpretable, secure, scalable, sustainable, and standardized ILKM that meets the demands and challenges in Industry 4.0 and smart manufacturing. Moving forward, future research should focus on further integrating cutting-edge AI and ML technologies, continuously refining the framework and its guiding principles based on real-world applications, and leveraging this framework to develop innovative approaches for broader adoption across various industrial sectors. In summary, the ILKM framework shows significant potential for enhancing the intelligence, efficiency, and resilience of future industries.

References

[1] L. Beurer-Kellner, M. Fischer, and M. Vechev. Prompting is programming: A query language for large language models. Proceedings of the ACM on Programming Languages, 7(PLDI):1946–1969, 2023.
[2] Y. Chang, X. Wang, J. Wang, Y. Wu, L. Yang, K. Zhu, H. Chen, X. Yi, C. Wang, Y. Wang, et al. A survey on evaluation of large language models. ACM Transactions on Intelligent Systems and Technology, 15(3):1–45, 2024.
[3] M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. d. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, et al. Evaluating large language models trained on code. arXiv preprint arXiv:2107.03374, 2021.
[4] P. F. Christiano, J. Leike, T. Brown, M. Martic, S. Legg, and D. Amodei. Deep reinforcement learning from human preferences. Advances in neural information processing systems, 30, 2017.
[5] Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai, J. Sun, and H. Wang. Retrieval-augmented generation for large language models: A survey. arXiv preprint arXiv:2312.10997, 2023.
[6] B. Goertzel. Artificial general intelligence: concept, state of the art, and future prospects. Journal of Artificial General Intelligence, 5(1):1, 2014.
[7] Z. Jan, F. Ahamed, W. Mayer, N. Patel, G. Grossmann, M. Stumptner, and A. Kuusk. Artificial intelligence for industry 4.0: Systematic review of applications, challenges, and opportunities. Expert Systems with Applications, 216:119456, 2023.
[8] D. Lahat, T. Adali, and C. Jutten. Multimodal data fusion: an overview of methods, challenges, and prospects. Proceedings of the IEEE, 103(9):1449–1477, 2015.
[9] H. Lasi, P. Fettke, H.-G. Kemper, T. Feld, and M. Hoffmann. Industry 4.0. Business & information systems engineering, 6:239–242, 2014.
[10] J. Lee, B. Bagheri, and H.-A. Kao. A cyber-physical systems architecture for industry 4.0-based manufacturing systems. Manufacturing letters, 3:18–23, 2015.
[11] J. Lee, H. Davari, J. Singh, and V. Pandhare. Industrial artificial intelligence for industry 4.0-based manufacturing systems. Manufacturing letters, 18:20–23, 2018.
[12] L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022.
[13] T. P. Raptis, A. Passarella, and M. Conti. Data management in industry 4.0: State of the art and open challenges. IEEE Access, 7:97052–97093, 2019.
[14] L. Reynolds and K. McDonell. Prompt programming for large language models: Beyond the few-shot paradigm. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, pages 1–7, 2021.
[15] S. I. Shafiq, E. Szczerbicki, and C. Sanin. Proposition of the methodology for data acquisition, analysis and visualization in support of industry 4.0. Procedia computer science, 159:1976–1985, 2019.
[16] M. Sharp, R. Ak, and T. Hedberg Jr. A survey of the advancing use and development of machine learning in smart manufacturing. Journal of manufacturing systems, 48:170–179, 2018.
[17] N. Shazeer, A. Mirhoseini, K. Maziarz, A. Davis, Q. Le, G. Hinton, and J. Dean. Outrageously large neural networks: The sparsely-gated mixture-of-experts layer. arXiv preprint arXiv:1701.06538, 2017.
[18] E. Sisinni, A. Saifullah, S. Han, U. Jennehag, and M. Gidlund. Industrial internet of things: Challenges, opportunities, and directions. IEEE transactions on industrial informatics, 14(11):4724–4734, 2018.
[19] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017.
[20] J. Wang, Y. Ma, L. Zhang, R. X. Gao, and D. Wu. Deep learning for smart manufacturing: Methods and applications. Journal of manufacturing systems, 48:144–156, 2018.
[21] Z. Wang, F. Yang, P. Zhao, L. Wang, J. Zhang, M. Garg, Q. Lin, and D. Zhang. Empower large language model to perform better on industrial domain-specific question answering. arXiv preprint arXiv:2305.11541, 2023.
[22] F. F. Xu, U. Alon, G. Neubig, and V. J. Hellendoorn. A systematic evaluation of large language models of code. In Proceedings of the 6th ACM SIGPLAN International Symposium on Machine Programming, pages 1–10, 2022.
[23] J. Yan, Y. Meng, L. Lu, and L. Li. Industrial big data in an industry 4.0 environment: Challenges, schemes, and applications for predictive maintenance. IEEE Access, 5:23484–23491, 2017.
[24] V. Yandrapalli. Revolutionizing supply chains using power of generative ai. International Journal of Research Publication and Reviews, 4(12):1556–1562, 2023.
[25] S. Zhang, L. Dong, X. Li, S. Zhang, X. Sun, S. Wang, J. Li, R. Hu, T. Zhang, F. Wu, et al. Instruction tuning for large language models: A survey. arXiv preprint arXiv:2308.10792, 2023.
[26] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong, et al. A survey of large language models. arXiv preprint arXiv:2303.18223, 2023.