Generative manufacturing systems using diffusion models and ChatGPT

Xingyu Li Fei Tao Wei Ye Aydin Nassehi John W. Sutherland

Abstract

In this study, we introduce Generative Manufacturing Systems (GMS) as a novel approach to effectively manage and coordinate autonomous manufacturing assets, thereby enhancing their responsiveness and flexibility to address a wide array of production objectives and human preferences. Deviating from traditional explicit modeling, GMS employs generative AI, including diffusion models and ChatGPT, for implicit learning from envisioned futures, marking a shift from a model-optimum to a training-sampling decision-making. Through the integration of generative AI, GMS enables complex decision-making through interactive dialogue with humans, allowing manufacturing assets to generate multiple high-quality global decisions that can be iteratively refined based on human feedback. Empirical findings showcase GMS’s substantial improvement in system resilience and responsiveness to uncertainties, with decision times reduced from seconds to milliseconds. The study underscores the inherent creativity and diversity in the generated solutions, facilitating human-centric decision-making through seamless and continuous human-machine interactions.

keywords:

Manufacturing system , Machine Learning , Generative AI

^†^†journal: …

\affiliation

[inst1]organization=School of Engineering Technology,addressline=Purdue University, city=West Lafayette, postcode=47907, state=MI, country=USA \affiliation[inst2]organization=School of Automation Science and Electrical Engineering,addressline=Beihang University, postcode = 100191, city=Beijing, country=China

\affiliation

[inst3]organization=Digital Twin International Research Center, International Research Institute for Multidisciplinary Science,addressline=Beihang University, postcode = 100191, city=Beijing, country=China \affiliation[inst4]organization=School of Electrical, Electronic, and Mechanical Engineering,addressline=University of Bristol, postcode = BS8 1TH, city=Bristol, country=United Kingdom \affiliation[inst5]organization=School of Environmental and Ecological Engineering,addressline=Purdue University, city=West Lafayette, postcode=47907, state=MI, country=USA

1 Introduction

Manufacturing systems confront persistent uncertainties with varying forms, urgencies, and impacts. Firstly, the advent of mass personalization [1] coupled with regulatory and standard changes adds complexity to production requirements, mandating systems to adeptly navigate evolving demands and obligations. Secondly, production disruptions [2, 3], like natural disasters, pandemics, financial crises, and geopolitical conflicts trigger resource scarcities and shifts in consumer behavior. 20–30% of firms and businesses are compelled to close following a major disruption [4]. Lastly, emerging manufacturing initiatives driven by sustainable, societal, and ecological goals have the potential to reshape production objectives, necessitating a thorough reassessment of existing systems [5]. Future manufacturing systems require the flexibility to promptly adapt to uncertainties and maintain a nuanced balance between emerging initiatives and constraints.

The first integration of flexibility into MSs traces back to the 1960s with the inception of flexible manufacturing systems [6]. Despite efforts to improve hardware and software flexibility, the inherent NP-hard complexity of centralized control in these manufacturing systems, especially with a growing number of assets and planning horizons, impedes system responsiveness to uncertainties. Increasing autonomy of manufacturing assets, including robots, vehicles, and mobile manipulators, poses an opportunity to address this challenge by delegating decision authority to each asset. Manufacturers like Audi have shifted from rigid line production to segmented workstations with autonomous assets [7]. Assets tailored for specific manufacturing tasks, such as Little Helper, OMRON MoMa, and KMR IIWA, have demonstrated efficacy across automotive and aerospace industries [8]. Through strategic task assignment and routing, these assets have the potential to realize adaptable layouts and schedules, anticipating up to 30% increase in worker utilization and output levels [9].

Emerging manufacturing systems, exemplified by agent-based manufacturing [10], matrix production systems [11], and anarchic manufacturing [12], incorporate asset autonomy via decentralized or distributed control. However, as autonomous assets become more complex and flexible, facilitated by open interfaces and universal standards [13, 14], these control approaches encounter challenges as well. Each asset often lacks comprehensive awareness of the entire system and its constraints [15], resulting in conflicts in aligning individual plans and impeding the attainment of optimal solutions. More importantly, optimal solutions are contingent upon effectively balancing diverse objectives and stakeholders’ preferences [16, 17], which may not be completely and explicitly modeled. To fully exploit the benefits of asset autonomy, a revolutionary approach is imperative but missing—one that efficiently manages a multitude of diverse assets for various production objectives under uncertainties, all while ensuring the centrality of humans in decision-making. Generative models provide a transformative opportunity to address these challenges through their distinctive generative capabilities, probabilistic modeling, and interactive decision-making. Herein, we proposed a GMS, signifying a fundamental transition from traditional explicit model of now to implicit knowledge of future. Drawing inspiration from the envisions of the daydreaming factory [18], our approach involves exploring diverse combinations of decisions and uncertainties to generate numerous potential futures. Through utilizing the generative models, including diffusion models and ChatGPT, GMS proficiently captures underlying patterns and distributions of the decisions from the future experience, facilitating creative decision-making even for scenarios beyond the initial scope of exploration.

2 Generative Manufacturing Systems

We envisage a synergistic integration of stationary machinery, autonomous assets, and diverse human workforces in the futuristic manufacturing systems. Considering the growing asset autonomy and mobility, we propose that autonomous assets and humans can dynamically relocate and self-organize across various workstations to enhance manufacturing operations and streamline material flows. GMS is designed to skillfully adjust configuration and schedules to handle uncertainties and production objectives, all while under human supervision. As illustrated in Fig. 1, humans contribute concerns and suggestions for various objectives to the autonomous assets, fostering collaborative expertise under a unified production floor.

Refer to caption — Figure 1: Schematic representation of GMS. Left: assets receiving human inquiries; Middle: GMS models get trained from explorations of futures and process human inquiries to sample new decisions; Right: GMS responds to human inquiries by providing diverse options for configurations and schedules.

To facilitate human interaction with assets, GMS employs large language models like ChatGPT, XLNet, and Turning-NLP to convert human inquiries into machine language. Image generation models, including diffusion models, BigGAN, and DALL-E, are then adopted to generate system configurations (humans and assets at each station) in response to human inquiries. Further granularity in decisions is achieved through operational schedules and task allocations, distributing tasks across stations and between humans and robots while considering material and process constraints.

Unlike existing approaches relying on explicit models and their convergence to find optimal decisions (model-optimum), GMS employs a training-sampling approach - by extensively exploring future scenarios, GMS implicitly learns probabilistic distributions of good decisions, assembling these distributions in accordance with human desires and production objectives for decision sampling. This shift from a model-optimum to a training-sampling approach not only addresses computational challenges in existing manufacturing systems but also introduces the following benefits:

Creativity: The incorporation of noises during sampling enables a broader spectrum of potential decisions. Additionally, generative models can innovate novel decisions through purposeful combinations of learned distributions, a critical aspect of aligning with emerging human inquiries and unforeseen scenarios.

Resilience: Training-sampling boosts system resilience in two folds: firstly, sampling decisions prove substantially more efficient as compared to optimization convergence, which enhances system responsiveness amidst uncertainties; secondly, sampling provides varied solutions for a wide range of scenarios, equipping the GMS with a diverse set of potential responses to enhance resilience.

Human-centricity: The implicit knowledge of GMS seamlessly integrates with human inquiry, knowledge, and expertise, allowing humans to tap into the nuanced insights within generative models. This synergy enables a more cohesive and effective collaboration between humans and autonomous assets, where humans can harness the capabilities of GMS to augment their decision-making while instilling a sense of ownership and job satisfaction.

3 Generative Models

In this section, we present two exemplary generative models for dynamically managing assets in GMS: 1) ChatGPT to extract system requirements from human inquiries, and 2) diffusion models to create configurations to meet those requirements. A configuration is encoded as a matrix $\mathbf{x}=\{x_{ij}\}$ , defines the quantity of assets of type $i\in I$ in station $j\in J$ . For scheduling, we leverage models from [19] to establish a mapping between the configuration and its corresponding optimal scheduling. For simplicity, we consider human heterogeneity only in skill levels, and machine health and production quality are maintained at consistently high levels.

3.1 ChatGPT

Utilizing OpenAI’s ChatGPT API in Python with the gpt-3.5-turbo model variant, we create a named entity recognition task to generate key requirements from human inquiries. For example, when presented with the query ”I need a production line with a minimal capacity of 240 part/hour, using no more than 9 machines.” the response is a class $c=$ ’(240, None, 9)’, where ’None’ functions as a placeholder for human skills not explicitly mentioned.

3.2 Diffusion models

For the decision generation, we adopt diffusion models to learn the underlying patterns, features, and distributions in the training data of envisioned configurations. The diffusion model sets itself apart from other machine learning models by iteratively refining noise-corrupted data to generate new samples, which involves two processes as shown in Fig. 2: 1) forward process - adding noises $\mathbf{\epsilon}_{t}$ at each step until the data $\mathbf{x}_{0}$ is destroyed, and 2) reverse process – sampling new $\mathbf{x}_{0}$ by iteratively removing estimated noises.

We denote latent variables $\mathbf{z}_{t}$ as noisy data in the forward process, which is calculated by introducing Gaussian noise $\bm{\epsilon}\sim\mathcal{N}(0,\mathbf{I})$ to the input data $\mathbf{x}_{0}$ at each step $t\in T$ , with weights determined by the forward process variances $\beta_{t}$ , namely,

\mathbf{z}_{t}=\alpha_{t}\mathbf{x}_{0}+\sigma_{t}\bm{\epsilon}

(1)

where, $\alpha_{t}=\sqrt{\prod_{s=1}^{t}(1-\beta_{s})}$ , and $\sigma_{t}=\sqrt{1-\alpha_{t}^{2}}$ , are derived from the Markov Chain [20]. In the reverse process, a learning model $h_{\theta}$ is utilized to estimate the noise $\bar{\bm{\epsilon}}_{t}^{c}$ given $\mathbf{z}_{t}$ to restore the original data $\mathbf{x}_{0}$ . Here, we parameterize the added noise $\bar{\bm{\epsilon}}_{t}^{c}$ as a function of $\mathbf{z}_{t}$ and the current step $t$ , and the class label $c$ of $\mathbf{x}_{0}$ , namely,

\bar{\bm{\epsilon}}_{t}^{c}=h_{\theta}(\mathbf{z}_{t},t,c)

(2)

Training diffusion model involves minimizing the disparity between the estimated and true noises to maximize the likelihood of the generated samples aligning with the distribution of training data. Detailed training process is delineated in Algorithm 1. The sampling process involves using the following linear combinations to integrate estimated noises $\tilde{\bm{\epsilon}}_{t}$ from both unconditional and conditional data of target class $c$ from human inquiries, namely,

Algorithm 1 Diffusion Model Training:

1: repeat until converged

2: select

t\in[1,T]

\triangleright

Sample step value

3: random select

\mathbf{x}_{0}

\triangleright

Sample training data

4: obtain the class

c

of data

\mathbf{x}_{0}

\triangleright

Obtain the data class

c\leftarrow\emptyset

with probability

p_{u}

\triangleright

Randomly discard class info

6: sample

\mathbf{z}_{t}=\alpha_{t}\mathbf{x}_{0}+\sigma_{t}\bm{\epsilon}

\triangleright

Obtain latent variable of

t

7: take gradient descent on:

\triangledown_{\theta}J=\triangledown_{\theta}\lVert h_{\theta}(\mathbf{z}_{t},t,c)-\bm{\epsilon}\rVert^{2}_{2}

\triangleright

Gradient calculation

\tilde{\bm{\epsilon}}_{t}\leftarrow(1+w)\bar{\bm{\epsilon}}_{t}^{c}-w\bar{\bm{\epsilon}}_{t}^{\emptyset}

(3)

where $w$ denotes the guidance strengths to control the blend of these two types of noises during sampling. Provided the estimated noise $\tilde{\bm{\epsilon}}_{t}$ and a sample with noisy $\mathbf{x}_{t}$ , the sample at the preceding step $\mathbf{x}_{t-1}$ can be attained from the following distribution:

\mathbf{x}_{t-1}\sim\mathcal{N}(\frac{1}{\sqrt{1-\beta_{t}}}(\mathbf{x}_{t}-\frac{\beta_{t}}{\sigma_{t}}\tilde{\bm{\epsilon}}_{t}),\beta_{t}\mathbf{I})

(4)

By randomly initializing noise matrix $\mathbf{x}_{T}\sim\mathcal{N}(0,\mathbf{I})$ and iteratively sampling using Eqn. 4, a new configuration $\mathbf{x}_{0}$ can be attained.

3.3 Learning model

The learning model $h_{\theta}$ aims to infer the noise estimate $\bm{\epsilon}_{t}$ from the latent variable $\mathbf{z}_{t}$ , both in dimensions $I\times J$ . To obtain a natural symmetry, padding is implemented, resulting in squared matrices of size $p=\max\{I,J\}$ . Fig. 3 shows the proposed learning model in a U-Net structure, which is utilized to facilitate information flow between pooling and transposed-convolution pathways. Residual convolutional blocks are tailored to enhance hierarchical feature extraction and pattern recognition for data in matrix format.

The introduction of skip connections seamlessly integrates learned features and contextual information across diverse levels of U-Net. With identical input and output sizes, skip connections effectively facilitate the direct transfer of information across layers and preserve fine details and contextual information throughout the network. The U-Net architecture, enriched by skip connections, effectively retains spatial features in configuration and noise matrices, offering significant advantages in the generation process.

3.4 Daydreaming Process

To effectively capture implicit knowledge, this study integrates the daydreaming process [18] with meta-heuristics to explore potential decisions in anticipated future scenarios. The process initiates with the generation of random future scenarios, including structured randomness in demands, humans, and autonomous asset capabilities, and corresponding decisions in configurations and schedules. Decision quality is enhanced by integrating selection, crossover, and mutation operations inspired by the genetic algorithm, which provides two advantages: 1) guiding the generation of diverse and adequate configurations for efficient learning, and 2) accelerating data accumulation by storing populations from each generation. Termination of daydreaming occurs after a predefined number of iterations rather than model convergence, ensuring a balanced dataset. Each explored configuration is evaluated based on multiple objectives of interest.

4 Result

We implemented and simulated GMS in the industrial use case for part processing, following [19]. The system assumes 9 types of assets and operations/operation setups, distributed across 7 stations to facilitate flexible collaborations. Human skill levels were randomized as high/moderate/low (120/60/0 parts/hour) across different operations. The daydreaming process involved the randomization of worker skills over 25 generations, each including 40 potential configurations. Cplex was used to obtain the mapping between configurations and optimal schedules. The simulation spanned 120 runtime units, generating 120,000 data over 15 hours for training purposes. The diffusion process and learning model were implemented in Python using PyTorch. Based on the optimal tuning results, the process variance was set at $\beta_{0}=10^{-4}$ and $\beta_{T}=0.02$ with total steps of $T=400$ and guidance strength of $w=2$ .

Fig. 4 shows the sampling process for generating configurations with a specified target capacity. As the step decreases, sampled configurations demonstrate increased rationality, yielding distinct layouts. The rational generation relies on the adept accumulation of implicit knowledge of key features and patterns. For instance, configurations with $0$ capacity predominantly display light colors in later parts of the matrix, signifying minimal asset utilization in few types. A system in such uniform composition lacks the ability to perform all operations, resulting in $0$ capacity. With capacity escalates, more assets (darker colors) in diverse types are included to enhance parallel production and operation efficiency.

A comprehensive analysis of decision time was performed to assess the efficacy of GMS in responsiveness compared to existing methods. The comparative approaches applied widely utilized meta-heuristic algorithms for configuration optimization. Table 1 records the average decision time taken to obtain a configuration with the requisite capacity over five runs of each capacity.

Table 1: Comparison of decision time (in seconds) to other algorithms.

Algorithm	0	60	120	180	240	300
Particle Swam Optimization	7.66	8.14	7.84	8.33	7.93	8.083
Genetic Algorithm	18.51	16.07	15.66	15.32	14.95	$>$ 300
Differential Evolution	32.50	31.11	28.98	28.66	30.11	$>$ 300
Simulated Annealing	20.20	21.87	21.37	20.93	21.60	36.189
Imperial Competitive Algorithm	20.14	19.57	19.63	20.56	20.13	$>$ 300
Diffusion Models	0.009	0.016	0.011	0.016	0.013	0.009

Diffusion models maintain low decision times, ranging from 9‰ to 16‰ seconds per decision, across specified capacities. This consistent efficiency signifies a quantitative improvement compared to other algorithms, which typically exceed 10 seconds, and, at times, failing to attain the target capacity even after 300 seconds. The consistent efficiency of diffusion models underscores a pivotal advancement in algorithmic efficacy of training-sampling approach as opposed to the model-optimum approach, markedly enhancing responsiveness and resilience of GMS to uncertainties.

To comprehensively assess the quality of generated samples, we randomly sample 1000 configurations and evaluate them by three metrics: 1) precision - accuracy (Accu) and the mean squared errors (MSE) of matching the requisite capacity, 2) diversity – duplication rate (DR) of the generated configurations that exist in the training data, and 3) fidelity - Fréchet Inception Distance (FID) measures perceptual quality and fidelity of generated samples as compared to the distribution of training data. The performance of diffusion models with and without guidance is listed in Table 2.

Table 2: Model performance with (top) and without (bottom) guidance.

Metric	0	30	60	90	120	150	180	240	270	300
Accu (%)	100	72.1	56.3	75.1	80.7	39.6	54.6	66.5	91.3	98.5
MSE	0	15.8	27.8	14.4	19.5	27.9	34.2	23.3	10.5	3.6
DR (‰)	0	1	0	0	0	0	0	2	2	13
FID ( $10^{-6}$ )	6.7	11.7	23.0	32.7	26.4	27.8	12.8	5.3	1.9	2.0
Accu (%)	98.1	38.4	70.4	36.6	59.6	17.1	49.7	52.1	55.2	50.8
MSE	7.9	24.2	28.5	27.5	37.0	42.1	44.7	44.8	40.3	45.8
DCR (‰)	2	1	0	0	0	0	0	0	2	13
FID ( $10^{-6}$ )	22.7	26.3	26.6	26.3	20.7	20.8	22.5	21.6	17.7	23.5

In contrast to the unguided model, the proposed model exhibits notably enhanced accuracy (7/10 above 65%) and low MSE (9/10 below 30), highlighting its high precision in generating samples to specified requirements. Notably, both models yield a low DR of less than 13‰, emphasizing decision diversity by randomly sampling from implicit knowledge. Low FID underscores the model’s ability to closely match the patterns and distributions of the training data and reproduce key features to generate realistic decisions. Comparatively, the FID score in scenarios with guidance is much lower in the extreme capacities, easily distinguishable in patterns, but higher in capacity 90-150 due to high similarities in the corresponding configurations. Overall, these precise, high-fidelity, and diverse decisions showcase the resilience and creativity of GMS in accommodating uncertainties and diverse objectives.

Fig. 5 illustrates the dynamic interaction between humans and assets in GMS, refining system configurations (marked by shapes) and schedules (marked by colors). Using the ChatGPT API, human textual inquiries were transformed into class label $c$ to guide the diffusion model in the sampling process. Analyzing the top 5 decisions based on the best fitness, our findings underscore the remarkable capability of GMS to harmonize the system with diverse objectives, including capacity, human skills, and involved assets. This interactive process centralizes humans in decision-making, cultivating a synergistic collaboration between humans and autonomous assets for continuous exploration and refinement, ultimately shaping GMS to align with varied objectives, constraints, and human desires in real-time.

5 Conclusions

The study introduces GMS to harness the increasing autonomy in manufacturing assets to address uncertainties, human desires, and emerging production objectives. GMS signifies a paradigm shift in decision-making from model-optimization to training-sampling. In an industrial use case, our findings highlight that GMS consistently outperforms existing approaches in decision times, diversity, and quality, highlighting its resilience and creativity. GMS adeptly adjusts configuration and schedule to human inquiries and additional objectives, fostering human-centric decision-making for collaborative exploration and continuous refinement. Future studies could explore diverse scenarios, including decisions (e.g., diagnosis, quality control) and performance metrics (e.g., carbon emissions, human well-being) while incorporating more complex human inquiries through embeddings rather than fixed classes.

References

[1] S. J. Hu, Evolving paradigms of manufacturing: From mass production to mass customization and personalization, Procedia Cirp 7 (2013) 3–8.
[2] X. Li, B. Wang, C. Liu, T. Freiheit, B. I. Epureanu, Intelligent manufacturing systems in covid-19 pandemic and beyond: framework and impact assessment, Chinese Journal of Mechanical Engineering 33 (2020) 1–5.
[3] K. Katsaliaki, P. Galetsi, S. Kumar, Supply chain disruptions and resilience: A major review and future research agenda, Annals of Operations Research (2022) 1–38.
[4] D. Balla-Elliott, Z. B. Cullen, E. L. Glaeser, M. Luca, C. T. Stanton, Business reopening decisions and demand forecasts during the COVID-19 pandemic, no. w27362, National Bureau of Economic Research, 2020.
[5] S. Huang, B. Wang, X. Li, P. Zheng, D. Mourtzis, L. Wang, Industry 5.0 and society 5.0—comparison, complementation and co-evolution, Journal of manufacturing systems 64 (2022) 424–428.
[6] A. KUSTAK, Flexible manufacturing systems: a structural approach, International Journal of Production Research 23 (6) (1985) 1057–1073.
[7] Modular assembly — audi-mediacenter.com, https://www.audi-mediacenter.com/en/audi-techday-smart-factory-7076/modular-assembly-7078, [Accessed 16-03-2024].
[8] N. Ghodsian, K. Benfriha, A. Olabi, V. Gopinath, A. Arnou, Mobile manipulators in industry 4.0: A review of developments for industrial applications, Sensors 23 (19) (2023) 8026.
[9] A. Hottenrott, M. Schiffer, M. Grunow, Flexible assembly layouts in smart manufacturing: An impact assessment for the automotive industry, IISE Transactions 55 (11) (2023) 1144–1159.
[10] H.-P. Wiendahl, V. Ahrens, Agent-based control of self-organized production systems, CIRP Annals 46 (1) (1997) 365–368.
[11] M. C. May, L. Kiefer, A. Kuhnle, N. Stricker, G. Lanza, Decentralized multi-agent production control through economic model bidding for matrix production systems, Procedia Cirp 96 (2021) 3–8.
[12] A. Ma, A. Nassehi, C. Snider, Anarchic manufacturing, International Journal of Production Research 57 (8) (2019) 2514–2530.
[13] S. B. Liu, M. Althoff, Optimizing performance in automation through modular robots, in: 2020 IEEE International Conference on Robotics and Automation (ICRA), IEEE, 2020, pp. 4044–4050.
[14] A. Kusiak, Universal manufacturing: Enablers, properties, and models, International Journal of Production Research 60 (8) (2022) 2497–2513.
[15] Z. Qin, Y. Lu, Self-organizing manufacturing network: A paradigm towards smart manufacturing in mass personalization, Journal of Manufacturing Systems 60 (2021) 35–47.
[16] Y. Lu, H. Zheng, S. Chand, W. Xia, Z. Liu, X. Xu, L. Wang, Z. Qin, J. Bao, Outlook on human-centric manufacturing towards industry 5.0, Journal of Manufacturing Systems 62 (2022) 612–627.
[17] J. Leng, X. Zhu, Z. Huang, X. Li, P. Zheng, X. Zhou, D. Mourtzis, B. Wang, Q. Qi, H. Shao, et al., Unlocking the power of industrial artificial intelligence towards industry 5.0: Insights, pathways, and challenges, Journal of Manufacturing Systems 73 (2024) 349–363.
[18] A. Nassehi, M. Colledani, B. Kádár, E. Lutters, Daydreaming factories, CIRP Annals 71 (2) (2022) 671–692.
[19] X. Li, A. Nassehi, B. Wang, S. J. Hu, B. I. Epureanu, Human-centric manufacturing for human-system coevolution in industry 5.0, CIRP Annals 72 (1) (2023) 393–396.
[20] J. Ho, A. Jain, P. Abbeel, Denoising diffusion probabilistic models, Advances in neural information processing systems 33 (2020) 6840–6851.