Facilitating AI and System Operator Synergy: Active Learning-Enhanced Digital Twin Architecture for Day-Ahead Load Forecasting
Abstract
In this paper, we introduce a synergistic approach between artificial intelligence and system operators through an innovative digital twin architecture, integrated with an active learning framework, to enhance short-term load forecasting. Central to this architecture is the incorporation of sophisticated data pipelines, facilitating the real-time ingestion, processing and analysis of grid-related data. Utilizing a recurrent neural network architecture, our model generates day-ahead load forecasts together with prediction confidence intervals, strengthening system operator trust in the model’s predictive reliability and enhancing their ability to respond to evolving grid conditions effectively. The active learning framework iteratively refines the predictions by incorporating real-time feedback based on forecast uncertainty, utilizing newly available data to continuously enhance forecasting accuracy and confidence. This AI-assisted strategy is exemplified in a case study of the Greek transmission system. It demonstrates the potential to transform short-term load forecasting, thereby increasing the reliability and operational efficiency of modern power grids. This approach marks a significant step forward in the digitalization and intelligent management of power systems.
I Introduction
The rapid evolution of power systems, driven by digitalization and the shift towards renewable energy, poses significant challenges in grid management. This transformation necessitates the integration of more intelligent and responsive frameworks to ensure stable and efficient grid operations. The role of Artificial Intelligence (AI) in assisting grid operators has become increasingly crucial in navigating these complexities [1]. AI’s potential to enhance decision-making under uncertainty and its application in grid management is a growing area of interest.
Highlighting the complexities of power grid operations in the digital age, recent studies stress the need for advanced AI-driven Human-Machine Interfaces (HMI) [2], promoting new frameworks for grid management assistance, and discuss the changing roles of human operators in control rooms [3], emphasizing the cognitive challenges and decision support systems needed in highly automated power systems.
Innovations in Digital Twin (DT) technology are revolutionizing smart grids. Study [4] introduces a DT framework for electrical distribution systems, emphasizing its practical application in addressing the integration of imperfect data, which is crucial for realistic DT implementations in distribution networks. Complementing this, [5] details a DT framework for power grid online analysis, focusing on its integration with an actual power grid’s Energy Management System (EMS) and highlighting features like in-memory computing and machine learning. This framework demonstrates the potential of DTs in enhancing decision-making and operational efficiency in power grid management. Together, these studies highlight the transformative role of DT technology in advancing smart grid operations.
Probabilistic Load Forecasting (PLF) has become a vital tool for electricity market participants and system operators, particularly for anticipating grid challenges like power imbalances and congestions. [6] explores this field with a Recurrent Neural Network (RNN) designed for day-ahead forecasting of residual loads. Their approach includes both parametric and non-parametric models, ensuring reliable forecasts. Key to their methodology is the use of probabilistic evaluation metrics like the ignorance score and quantile score, enhancing the model’s accuracy and facilitating its comparison with other forecasting methods. Complementing this, [7] introduces ProbCast, a versatile tool for generating probabilistic forecasts, especially in energy forecasting. ProbCast supports advanced techniques like parametric and non-parametric density forecasting, making it instrumental in managing uncertainties in power system operations.
In the smart grid domain, Active Learning (AL) is enhancing the adaptability and the accuracy of forecasting models. [8] developed a deep ensemble learning model for short-term load forecasting (STLF), which employs an AL framework. This model integrates a Long Short-Term Memory (LSTM) network with a multi-layer perceptron to accurately capture the complex load patterns affected by various factors like weather. The AL component selectively trains the model using similar load segments, effectively addressing data imbalances and enhancing forecasting performance. [9] explored an AL strategy for building energy forecasting, efficiently generating informative training data while considering weather impacts. Their approach successfully addresses data bias problems common in building operation data, leading to improved model accuracy and extendibility, thus showcasing the potential of AL in energy management and forecasting.
This paper introduces an approach that facilitates the synergy between AI and system operators through a novel DT architecture integrated with an AL framework for enhanced STLF. Our approach aligns with the trajectory of the papers reviewed and extends their individual contributions by integrating DT, PLF, and AL concepts into a single cohesive system for addressing modern grid challenges. While this paper focuses on the integration of PLF as a key service, the proposed DT architecture is designed to support various AI-driven services, thus offering a versatile platform for intelligent grid management.
The rest of this paper is structured as follows: Section II elaborates on the DT architecture, Section III discusses the PLF, Section IV explains the AL framework, Section V presents a detailed case study with results, and Section VI concludes with a summary of our findings and future research directions.
II Digital Twin Architecture
The DT architecture presented in this paper is a combination of data management and computational modeling and simulation of power grid networks to improve the operational decision-making processes. It brings together real-time and historical data with advanced computational analytics, providing a robust digital replica of the power grid for enhanced management and decision-making. As illustrated in Figure 1, the architecture consists of several key components, each serving a distinct role within the system.

Data sources act as inputs, varying from CSV files to real-time data streams provided by APIs. These sources encompass various time-series data, including day-ahead load and generation forecasts, actual load and generation, and weather-related data, as well as static data, such as grid topologies and infrastructure specifications. Dagster [10], a modern framework that orchestrates the flow of data from these disparate sources, serves as the cornerstone for data ingestion, pre-processing and storage. Dagster’s primary role is to streamline the data lifecycle processes, namely extraction, transformation, and loading, to facilitate timely and organized data delivery to TimescaleDB [11]. This time-series database is suitable for managing large-scale data with intrinsic temporal attributes, ensuring data fidelity and query efficiency.
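The extraction, transformation, and loading steps described above can be illustrated with a minimal, self-contained sketch. It deliberately uses neither Dagster nor TimescaleDB: the in-memory SQLite table is only a stand-in for the time-series database, and the CSV content, the column names (`timestamp`, `load_mw`) and the table name `load_actual` are hypothetical.

```python
import csv
import io
import sqlite3
from datetime import datetime

# Hypothetical raw feed: hourly load readings with one missing value.
RAW_CSV = """timestamp,load_mw
2023-01-01T00:00:00,5120.5
2023-01-01T01:00:00,
2023-01-01T02:00:00,4890.0
"""


def extract(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Drop rows without a load value and convert fields to typed values."""
    clean = []
    for row in rows:
        if not row["load_mw"]:
            continue  # skip missing measurements
        ts = datetime.fromisoformat(row["timestamp"])
        clean.append((ts.isoformat(), float(row["load_mw"])))
    return clean


def load(conn, records):
    """Insert records into a time-indexed table (SQLite stand-in)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS load_actual (ts TEXT PRIMARY KEY, load_mw REAL)"
    )
    conn.executemany("INSERT OR REPLACE INTO load_actual VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
stored = conn.execute("SELECT ts, load_mw FROM load_actual ORDER BY ts").fetchall()
```

In an orchestrated pipeline, each of the three functions would typically become a separately scheduled and monitored step; the sketch only shows the data flow between them.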
At the core of the DT architecture is PyPSA [12], a comprehensive tool for power system analysis that enables network modeling and power flow solving. PyPSA is the computational engine that enables the simulation of power grid behavior under different operational scenarios.
OperatorFabric [13], a state-of-the-art HMI that presents system operators with intuitive access to real-time insights and simulation outputs, acts as the visualization engine. This interface is crucial for the operators to make informed decisions based on the simulations and analytics results provided by PyPSA.
The DT architecture is designed to support various services on top of its core structure. These services, in our case Load Forecasting and Active Learning, interact with the data repository for data storage and retrieval, providing their outputs to the DT for scenario simulation and to the HMI for visualization. The system operator supervises the results through the HMI and sets parameters for the AL based on their experience, establishing a human-in-the-loop approach.
III Probabilistic Load Forecasting
The basis of our approach towards AI-assisted decision-making to enhance grid management is the implementation of PLF utilizing RNNs. RNNs, particularly effective in forecasting time-series data due to their ability to handle variable-length sequences and share weights across time steps, form the backbone of our probabilistic model.
III-A Model Architecture
Our PLF model leverages an advanced RNN architecture [14], specifically designed to handle the intricacies of load forecasting, with the probabilistic part enforced through a loss function that takes into account both forecast accuracy and uncertainty. By adopting an encoder-decoder implementation, as can be seen in Figure 2, our approach processes time-series data sequentially, capturing the temporal dependencies essential for accurate forecasting.

The encoder sequentially processes past input data, such as historical load profiles and various meteorological conditions, extracting important patterns. At each time step, the input generates a hidden state through the RNN cell. This information is encapsulated within the RNN cell state, constituting a compact internal representation of historical data insights. The final hidden state from the encoder, known as the encoder vector, summarizes all the input information up to the current time step. Subsequently, the decoder, informed by the encoder’s state, combines this knowledge with additional inputs to predict future loads. These supplementary inputs include time-based variables, such as the hour of the day and the month, which give the model precise temporal context. The decoder’s predictive performance is enhanced by a fully-connected neural network incorporating dropout, which adds robustness to the forecasting task. The features are normalized using MinMax scaling to ensure that the model inputs have a consistent scale, which can aid the convergence and performance of the model.
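The MinMax scaling step mentioned above can be sketched as follows. This is an illustrative implementation under the assumption that the scaling bounds are fitted on the training data only and then reused for unseen data; the function names and the load values are hypothetical.

```python
def fit_minmax(values):
    """Return the (minimum, maximum) of the training values."""
    return min(values), max(values)


def scale(values, lo, hi):
    """Map values linearly so that lo -> 0 and hi -> 1."""
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant feature: nothing to scale
    return [(v - lo) / span for v in values]


def unscale(scaled, lo, hi):
    """Invert the scaling to recover values in the original units."""
    return [s * (hi - lo) + lo for s in scaled]


# Hypothetical load values; bounds come from the training split only.
train = [4000.0, 5000.0, 6000.0]
test = [6500.0]
lo, hi = fit_minmax(train)
scaled_train = scale(train, lo, hi)
scaled_test = scale(test, lo, hi)
restored = unscale(scaled_train, lo, hi)
```

Note that unseen values outside the training range map outside [0, 1], as with the test point here, which is one reason the training period should cover the expected range of load levels.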
Targeting accurate load forecasts and confidence estimation, our model employs a Gaussian Negative Log Likelihood (GNLL) loss function, defined as:
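$$\mathcal{L}_{\mathrm{GNLL}} = \frac{1}{T}\sum_{t=1}^{T}\left[\frac{1}{2}\log\sigma_t^{2} + \frac{(y_t-\mu_t)^{2}}{2\sigma_t^{2}}\right] \qquad (1)$$
where $T$ is the number of time steps, $\mu_t$ and $\sigma_t^{2}$ represent the forecasted mean and variance respectively, and $y_t$ denotes the true load at time $t$. This loss function penalizes both errors in the forecasted mean and a variance that is inconsistent with those errors, facilitating the generation of reliable and accurate prediction intervals.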
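As an illustration of how a Gaussian negative log-likelihood behaves, the sketch below evaluates it in plain Python. It assumes the common form in which each time step contributes half of the log-variance plus half of the squared error divided by the variance, averaged over the time steps and with the constant term omitted; the function name, the `eps` guard and the sample values are assumptions. PyTorch provides a comparable loss as `torch.nn.GaussianNLLLoss`, although the paper does not state whether that class was used.

```python
import math


def gaussian_nll(y_true, mu, var, eps=1e-6):
    """Mean Gaussian negative log-likelihood, constant term omitted."""
    total = 0.0
    for y, m, v in zip(y_true, mu, var):
        v = max(v, eps)  # guard against zero or negative variance
        total += 0.5 * (math.log(v) + (y - m) ** 2 / v)
    return total / len(y_true)


# Hypothetical normalized loads: same mean error, different stated variance.
y = [1.0, 2.0]
mu = [1.5, 2.5]
overconfident = gaussian_nll(y, mu, [0.01, 0.01])
calibrated = gaussian_nll(y, mu, [0.25, 0.25])
```

With identical mean forecasts, the overconfident variance yields a much larger loss than the variance that matches the squared error, which is what pushes the model towards honest uncertainty estimates.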
Moreover, the RNN model’s implementation in PyTorch [15] leverages dynamic computational graphs, allowing for flexible coding and efficient training via GPU acceleration. This adaptability is essential for iterative model refinement contributing to the development of a robust PLF model capable of addressing the challenges posed by the evolving energy landscape.
III-B Probabilistic Forecast Metrics
Probabilistic forecasting extends beyond simple point predictions by providing a comprehensive statistical distribution of future events, characterized by Probability Density Functions (PDF) or Cumulative Distribution Functions (CDF), allowing system operators to evaluate and manage risks. These methods can be categorized into non-parametric and parametric approaches. Non-parametric methods derive a set of quantile values by minimizing quantile/pinball loss without assuming a predefined distribution shape. Parametric methods, on the other hand, assume a specific distribution form, such as normal or log-normal, and optimize its parameters by minimizing losses like negative log-likelihood.
The evaluation of probabilistic forecasts involves several key metrics, each providing unique insights into the forecast performance. In our paper, the evaluation of probabilistic forecasts specifically employs Prediction Interval Coverage Probability (PICP) and Sharpness. PICP evaluates the percentage of observations that fall within the predicted intervals, serving as a crucial metric for forecast reliability and ensuring that forecasts accurately represent the uncertainty in predictions:
$$\mathrm{PICP} = \frac{100\%}{N}\sum_{i=1}^{N} c_i, \qquad c_i = \begin{cases} 1, & L_i \le y_i \le U_i \\ 0, & \text{otherwise} \end{cases} \qquad (2)$$
where $N$ is the dataset size, $y_i$ is the actual value, and $U_i$ and $L_i$ are the upper and lower bounds of the prediction interval. Sharpness measures the concentration of predictive distributions, highlighting the precision of the forecasts independent of their actual accuracy. It is quantified by the average width of the central prediction intervals, with a narrower width indicating more precise forecasts. This metric is crucial as it demonstrates the model’s capacity to provide detailed forecasts while maintaining reliability:
$$\mathrm{Sharpness} = \frac{1}{N}\sum_{i=1}^{N}\left(U_i - L_i\right) \qquad (3)$$
Incorporating these metrics into our evaluation framework ensures that our probabilistic forecasts are accurate and provide meaningful uncertainty estimates, aligning with the real-world complexities of forecasting in power grids. These metrics serve as the foundation for robust model assessment, guiding both the refinement and deployment of our predictive models.
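The two metrics can be computed with a few lines of code. The sketch below assumes that the prediction interval is a central Gaussian interval built from the forecasted mean and standard deviation (z = 1.96 for a nominal 95% level); this construction, the function names and the sample values are assumptions made for illustration.

```python
def gaussian_interval(mu, sigma, z=1.96):
    """Central prediction interval (lower, upper) for each forecast."""
    return [(m - z * s, m + z * s) for m, s in zip(mu, sigma)]


def picp(y_true, intervals):
    """Percentage of observations that fall inside their interval."""
    hits = sum(1 for y, (lo, hi) in zip(y_true, intervals) if lo <= y <= hi)
    return 100.0 * hits / len(y_true)


def sharpness(intervals):
    """Average width of the prediction intervals."""
    return sum(hi - lo for lo, hi in intervals) / len(intervals)


# Hypothetical normalized forecasts and observations.
mu = [0.50, 0.60, 0.70, 0.80]
sigma = [0.05, 0.05, 0.05, 0.05]
y = [0.52, 0.75, 0.69, 0.80]

intervals = gaussian_interval(mu, sigma)
coverage = picp(y, intervals)   # three of four observations are covered
width = sharpness(intervals)    # 2 * 1.96 * 0.05 for every interval
```

The two metrics should be read together: very wide intervals trivially achieve high coverage, so a good forecast combines a PICP close to the nominal level with a small average width.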
IV Active Learning Framework
The current study introduces an AL framework specifically designed to improve the accuracy and reliability of load forecasting models, particularly the RNN model capable of generating confidence intervals, as presented in Section III. This iterative framework enhances the model’s adaptability to the dynamic nature of power grid management, influenced by renewable energy integration and fluctuating consumption patterns. The AL framework includes a sequence of steps, as depicted in Fig. 3, beginning with the initial training phase, which establishes a benchmark for subsequent iterative improvements. The model then makes short-term load predictions on new data and quantifies their uncertainty. The resulting forecasts, together with their mean and standard deviation, are stored in the data repository.

There are several strategies for the query mechanism in AL, including uncertainty sampling, query by committee, expected model change, expected error reduction, and diversity sampling [16]. Uncertainty sampling identifies the data points on which the model is least certain. Query by committee uses a committee of models and selects data points where there is the most disagreement among the models. Expected model change selects data points that would cause the most significant change to the current model if labeled. Expected error reduction chooses data points that are expected to reduce the overall error of the model the most. Diversity sampling ensures that the selected data points are diverse and cover different regions of the data space.
In our case, we use uncertainty sampling, with the uncertainty quantified through the predicted standard deviation $\sigma$. The query mechanism selects data points for acquisition based on the function:
$$Q(x_t) = \begin{cases} 1, & \sigma_t > \tau \\ 0, & \text{otherwise} \end{cases} \qquad (4)$$
where $\sigma_t$ is the predicted standard deviation for data point $x_t$ and $\tau$ is set based on empirical analysis by the system operator. The value of $\tau$ is crucial as it determines the threshold of uncertainty above which data points are queried. Initially, $\tau$ is set based on historical analysis of the model’s performance on past data, and this is often an empirical decision made by the system operator.
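A threshold-based uncertainty query of this kind can be sketched as a simple filter: every forecast whose predicted standard deviation exceeds the operator-defined threshold is selected. The function name, the timestamps and the standard deviation values are hypothetical; the thresholds are expressed in the same units as the load series.

```python
def select_queries(timestamps, sigmas, threshold):
    """Return the timestamps whose predicted standard deviation exceeds the threshold."""
    return [ts for ts, s in zip(timestamps, sigmas) if s > threshold]


# Hypothetical day-ahead forecasts with their predicted standard deviations.
timestamps = ["2023-07-01T12:00", "2023-07-01T13:00", "2023-07-01T14:00"]
sigmas = [350.0, 1200.0, 1050.0]

moderate = select_queries(timestamps, sigmas, threshold=1000.0)
strict = select_queries(timestamps, sigmas, threshold=1100.0)
lenient = select_queries(timestamps, sigmas, threshold=300.0)
```

Raising the threshold reduces the number of queried points, while lowering it queries almost everything, which is the trade-off the system operator tunes.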
This human-in-the-loop approach ensures that the model’s learning process is guided by expert knowledge and operational priorities. The query mechanism identifies data points with high uncertainty and then the framework initiates an automated query process for the actual load values corresponding to these uncertain predictions from the data repository. This step is vital, as it supplies the model with real, observed data that was previously marked by significant predictive uncertainty. With the newly acquired and augmented dataset, the model undergoes a retraining process. This process, also described in Algorithm 1, allows the model to incorporate the new information, adjust its predictions, and improve its ability to recognize emerging patterns and trends in the data.
Through this iterative cycle of predicting, querying, and updating, the AL framework continuously improves the model’s performance and its adaptability to the dynamic nature of grid operations. After each cycle, the system operator reviews the results and adjusts the AL parameters as needed. In addition, the DT architecture allows system operators to identify, through the HMI, data points that correspond to rare events, such as extreme weather events, that conventional prediction methods might misinterpret; this information can be used to enhance the training dataset. Each refinement phase aims to reduce the forecast error and increase confidence in the forecasts.
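The predict, query, augment, and retrain cycle can be illustrated with a deliberately simplified sketch. It is not the RNN of Section III: `HourlyGaussianModel` is a hypothetical toy forecaster that stores a mean and standard deviation per hour of the day, the `repository` dictionary stands in for the data repository, and all values are invented for illustration.

```python
import statistics


class HourlyGaussianModel:
    """Toy stand-in forecaster: per-hour mean and standard deviation."""

    def __init__(self, default_sigma=2000.0):
        self.default_sigma = default_sigma  # used when data is too scarce
        self.stats = {}

    def fit(self, samples):
        by_hour = {}
        for hour, load in samples:
            by_hour.setdefault(hour, []).append(load)
        self.stats = {}
        for hour, loads in by_hour.items():
            sigma = statistics.stdev(loads) if len(loads) > 1 else self.default_sigma
            self.stats[hour] = (statistics.mean(loads), sigma)

    def predict(self, hour):
        return self.stats.get(hour, (0.0, self.default_sigma))


def active_learning_cycle(model, train, new_hours, repository, threshold):
    """Run one predict -> query -> augment -> retrain cycle."""
    queried = [h for h in new_hours if model.predict(h)[1] > threshold]
    # Fetch observed loads for the flagged points and augment the training set.
    augmented = train + [(h, load) for h in queried for load in repository.get(h, [])]
    model.fit(augmented)
    return augmented, queried


train = [(0, 4000.0), (0, 4100.0), (12, 6000.0)]
repository = {0: [4050.0], 12: [6100.0, 5900.0]}

model = HourlyGaussianModel()
model.fit(train)
train, first_queries = active_learning_cycle(model, train, [0, 12], repository, 1000.0)
train, second_queries = active_learning_cycle(model, train, [0, 12], repository, 1000.0)
```

In the first cycle only hour 12 is queried, because a single sample leaves its uncertainty at the default value; after retraining on the fetched observations its standard deviation drops below the threshold and the second cycle queries nothing.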
V Case Study - Results
This section presents a practical examination of our AI-assisted DT system, enhanced by AL, using the Greek transmission network as a benchmark. We chose this network because it reflects the complexities and challenges that modern power systems face. The case study is designed to demonstrate the effectiveness of our DT architecture in making accurate and reliable predictions.
V-A Case Study
Within the scope of this case study, the Greek transmission network was modeled to establish the core of our DT. Using the PyPSA-Eur tool [17], we were able to extract a detailed topological representation of the network, including buses—indicative of substations and generators—and transmission lines. The DT, enhanced with active learning, turns into a powerful and flexible tool for system operators to manage the power system more effectively, allowing them to use the interface to run different scenarios and supervise the forecasts.
The data required for the DT is gathered through API calls. We collect near real-time and historical data from ENTSO-E [18] and the Greek TSO IPTO [19]. ENTSO-E gives us a wide view of the European grid, while IPTO provides details specific to the Greek system. We also obtain weather data from OpenWeatherMap [20], recognizing the impact of weather on the grid’s energy demand and production.
V-B Results
In our analysis, we developed separate RNN models for each substation within the Greek transmission network, in addition to a model dedicated to forecasting the total load. It is the latter, focusing on the aggregate load across the network, that we showcase here to illustrate the effectiveness of our PLF approach.
Initially, the RNN model was trained using a comprehensive dataset encompassing historical load data and weather conditions across the Greek transmission network. The historical data used for training and testing spans from 2021-01-01 to 2023-12-31. Specifically, the first two years of data (2021-2022) were used for training the RNN model, while the last year (2023) was reserved for testing. This training set included data on temperature, wind direction and speed, and precipitation, reflecting significant correlations with load patterns as identified through rigorous time-series analysis. The training process for the RNN model took approximately 1 hour on a laptop with the Intel(R) Core(TM) i7-9850H CPU @ 2.60GHz and 16.0 GB RAM.
The feature set, coupled with the model’s architecture, which includes an LSTM layer and dropout for regularization, allowed the model to capture temporal dependencies. The model uses an LSTM as its core network, with one LSTM layer followed by a fully connected layer. The dropout rates are 0.4 for the fully connected layers and 0.3 for the LSTM layer. The activation function used is Leaky ReLU with a leak of 0.1. The training parameters include a maximum of 50 epochs and a batch size of 32. The model employed a history horizon of 168 hours to inform its predictions, with a forecast horizon set to 24 hours ahead, aligning with the operational requirements for day-ahead planning. Fig. 4 provides a visual representation of the model’s day-ahead forecasting capabilities, showcasing the precision with which the RNN model can predict total network load alongside the associated 95% confidence intervals.
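The history and forecast horizons translate into a sliding-window construction of training samples. The following sketch shows one way to build such windows from an hourly series; the function name and the stride of 24 hours (one sample per day) are assumptions, and the integer series is a placeholder for real load data.

```python
def make_windows(series, history=168, horizon=24, stride=24):
    """Split an hourly series into (past, future) pairs for day-ahead forecasting."""
    pairs = []
    last_start = len(series) - history - horizon
    for start in range(0, last_start + 1, stride):
        past = series[start:start + history]                         # encoder input
        future = series[start + history:start + history + horizon]   # forecast target
        pairs.append((past, future))
    return pairs


# Placeholder series: ten days of hourly values, indexed 0..239.
series = list(range(240))
pairs = make_windows(series)
```

Ten days of hourly data yield three samples here, each pairing one week of history with the following 24 hours.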

We benchmarked the RNN model against traditional forecasting methods such as ARIMA, SARIMA [21], and Prophet [22] over the entire test set to underline its comparative superiority. This qualitative and quantitative comparison highlights the RNN model’s enhanced accuracy, reliability, and the ability to produce actionable forecasts, as evidenced by its performance metrics including lower Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and Mean Absolute Error (MAE), alongside improved sharpness and PICP, as can be seen in Table I, where the values represent the mean of the metrics over the entire test set.
Model | MSE | RMSE | MAE | Sharpness | PICP (%)
---|---|---|---|---|---
ARIMA | 0.0234 | 0.153 | 0.1282 | N/A | N/A
SARIMA | 0.0107 | 0.1034 | 0.0862 | N/A | N/A
Prophet | 0.024 | 0.155 | 0.1357 | 0.3469 | 80.1666
RNN | 0.002 | 0.0452 | 0.034 | 0.208 | 97.6774
Notably, the traditional forecasting methods such as ARIMA and SARIMA do not typically provide probabilistic outputs, which is why ’N/A’ (not applicable) is listed under the sharpness and PICP columns for these models in Table I. This highlights their limitations in providing uncertainty estimates, which are crucial for developing the system operator’s trust in AI.
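For reference, the three point-forecast metrics can be computed as in the sketch below. The function name and the sample values are hypothetical; the values are on a normalized scale, which the magnitudes in Table I suggest but which is an assumption here.

```python
import math


def point_metrics(y_true, y_pred):
    """Return MSE, RMSE and MAE for paired observations and forecasts."""
    errors = [p - y for y, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / len(errors)
    mae = sum(abs(e) for e in errors) / len(errors)
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae}


# Hypothetical normalized loads and forecasts.
y_true = [0.50, 0.60, 0.70, 0.80]
y_pred = [0.52, 0.58, 0.73, 0.80]
metrics = point_metrics(y_true, y_pred)
```

Because RMSE squares the errors before averaging, it is never smaller than MAE, and a large gap between the two points to a few large errors rather than many small ones.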
The comparative analysis of forecasting models demonstrates the RNN’s superior performance over traditional methods like ARIMA, SARIMA, and Prophet across several key metrics. Quantitatively, the RNN model exhibits significantly higher accuracy and reliability in predicting day-ahead loads, underpinned by its effectiveness in capturing complex temporal dependencies within the data. Furthermore, the RNN model ensures a high level of forecast confidence, as indicated by its competitive sharpness and notably high PICP. This suggests that the RNN model not only predicts with greater accuracy but also provides forecasts with reliable uncertainty estimates, making it a more dependable choice for grid management and operational planning. The integration of an AL framework with the RNN model is anticipated to enhance these attributes further, leveraging real-time data and operator insights for continuous improvement in forecasting performance.
The AL process began with the system operator setting the uncertainty threshold to 1000 based on retrospective analysis, ensuring forecasts with the highest uncertainty were flagged for improvement. The framework then queried actual load values from the data repository, augmented the training set with this data, and retrained the model. After the retraining, the system operator can review the new confidence intervals and adjust the threshold as needed. Sensitivity analysis showed that a high threshold might miss improvement opportunities, since fewer data points are queried, while a low threshold could lead to unnecessary computational overhead without significant accuracy gains, since too many data points are queried.
Subsequent re-training of the RNN model with these targeted queries led to observable improvements in forecast accuracy and confidence. For instance, incorporating real-time data corresponding to high-uncertainty predictions enabled the model to adjust to emerging patterns, reducing the overall prediction error and tightening the confidence intervals around forecasts, as can be seen in Fig. 5.

To illustrate the enhancements brought about by the AL framework, we present a comparative analysis of the RNN model’s performance before and after applying AL. This analysis reveals a marked reduction in forecast error and uncertainty, substantiating the effectiveness of integrating real-time operational feedback into the forecasting process. Table II provides a detailed comparison of key forecasting metrics, presented as the mean of the metrics over the entire test set, before and after AL enhancement, highlighting the framework’s contribution to improved load forecasting accuracy and reliability.
Model | MSE | RMSE | MAE | Sharpness | PICP (%)
---|---|---|---|---|---
RNN (before AL) | 0.002 | 0.0452 | 0.034 | 0.208 | 97.6774
RNN (after AL) | 0.001 | 0.0343 | 0.0321 | 0.1823 | 98.3543
After the AL intervention, we observe a decrease in MSE and RMSE, suggesting a tighter fit of the RNN predictions to the true load values. A lower MAE indicates improved average accuracy, and a higher PICP value reflects better coverage of actual loads within the predicted confidence intervals. The decrease in the sharpness value suggests that the confidence intervals have become narrower, indicative of increased precision in the forecasts.
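The size of these changes can be made explicit with a short calculation on the reported values for the RNN before and after AL. The dictionaries copy the reported metrics; the helper name `relative_change` is an assumption.

```python
# Reported mean metrics for the RNN before and after AL.
before = {"MSE": 0.002, "RMSE": 0.0452, "MAE": 0.034, "Sharpness": 0.208, "PICP": 97.6774}
after = {"MSE": 0.001, "RMSE": 0.0343, "MAE": 0.0321, "Sharpness": 0.1823, "PICP": 98.3543}


def relative_change(old, new):
    """Relative change in percent; negative values indicate a reduction."""
    return 100.0 * (new - old) / old


changes = {name: relative_change(before[name], after[name]) for name in before}
picp_gain = after["PICP"] - before["PICP"]  # in percentage points
```

This gives a reduction of roughly 24% in RMSE, about 5.6% in MAE and about 12% in average interval width, together with a coverage gain of about 0.68 percentage points. The MSE values are reported to a single significant figure, so their ratio should be read with caution.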
VI Conclusion and Future Steps
This study has demonstrated significant improvements in the accuracy and reliability of day-ahead load forecasts by integrating AI with human system operators through an innovative AL framework and DT architecture. By focusing on the synergy between AI and system operators, we have not only enhanced decision-making processes but also begun to address operator reluctance towards AI adoption. The key to this progress lies in explainability, which builds trust and understanding in AI-generated insights.
Indicative future work includes the expansion of the DT architecture’s services to encompass broader aspects of power grid optimization, including stochastic, security-constrained, and multi-period OPF problems. Furthermore, significant potential is seen in incorporating Large Language Models (LLMs), such as ChatGPT, for power grid visualization; one example is the ChatGrid platform developed by PNNL [23]. ChatGrid represents an exciting advancement in generative AI for power grid visualization, offering intuitive, AI-driven insights into grid dynamics, operational constraints, and optimization opportunities. The inclusion of such tools in the presented DT architecture could revolutionize the visualization and interpretation of complex grid data, providing operators with unprecedented clarity and foresight in their decision-making processes.
Acknowledgment
This work was supported by the European Union-funded project HUMAINE [grant number 101120218].
References
- [1] Yousu Chen, Xiaoyuan Fan, Renke Huang, Qiuhua Huang, Ang Li, and Kishan Prudhvi Guddanti. Artificial intelligence/machine learning technology in power system applications. Technical report, Pacific Northwest National Laboratory (PNNL), Richland, WA (United States), 2024.
- [2] Antoine Marot, Alexandre Rozier, Matthieu Dussartre, Laure Crochepierre, and Benjamin Donnot. Towards an AI assistant for human grid operators. In Hybrid Human Artificial Intelligence (HHAI), Amsterdam, Netherlands, June 2022.
- [3] Sudip K Mazumder, Mohammad Shadmand, H Alan Mantooth, Chris Farnell, Salam Baniahmed, Arif I Sarwat, Mohd Tariq, Manimaran Govindrasu, Jay Johnson, and Gab-Su Seo. Power grid resilience. In Power Electronics Handbook, pages 1015–1033. Elsevier, 2024.
- [4] Matthew Deakin, Marta Vanin, Zhong Fan, and Dirk Van Hertem. Smart energy network digital twins: Findings from a UK-based demonstrator project. arXiv preprint arXiv:2311.11997, 2023.
- [5] Mike Zhou, Jianfeng Yan, and Donghao Feng. Digital twin framework and its application to power grid online analysis. CSEE Journal of Power and Energy Systems, 5(3):391–398, 2019.
- [6] Gonca Gürses-Tran, Hendrik Flamme, and Antonello Monti. Probabilistic load forecasting for day-ahead congestion mitigation. In 2020 International Conference on Probabilistic Methods Applied to Power Systems (PMAPS), pages 1–6. IEEE, 2020.
- [7] Jethro Browell and Ciaran Gilbert. ProbCast: Open-source production, evaluation and visualisation of probabilistic forecasts. In 2020 International Conference on Probabilistic Methods Applied to Power Systems (PMAPS), pages 1–6. IEEE, 2020.
- [8] Zengping Wang, Bing Zhao, Haibo Guo, Lingling Tang, and Yuexing Peng. Deep ensemble learning model for short-term load forecasting within active learning framework. Energies, 12(20):3809, 2019.
- [9] Liang Zhang and Jin Wen. Active learning strategy for high fidelity short-term data-driven building energy forecasting. Energy and Buildings, 244:111026, 2021.
- [10] Elementl. Dagster. Software available at https://github.com/dagster-io/dagster.
- [11] TimeScale, Inc. TimescaleDB. Software available at https://github.com/timescale/timescaledb.
- [12] T. Brown, J. Hörsch, and D. Schlachtberger. PyPSA: Python for Power System Analysis. Journal of Open Research Software, 6(4), 2018.
- [13] RTE France. OperatorFabric. Software available at https://github.com/opfab/operatorfabric-core.
- [14] Alex Sherstinsky. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Physica D: Nonlinear Phenomena, 404:132306, 2020.
- [15] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. PyTorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019.
- [16] Alaa Tharwat and Wolfram Schenck. A survey on active learning: state-of-the-art, practical challenges and research directions. Mathematics, 11(4):820, 2023.
- [17] Jonas Hörsch, Fabian Hofmann, David Schlachtberger, and Tom Brown. PyPSA-Eur: An open optimisation model of the European transmission system. Energy Strategy Reviews, 22:207–215, 2018.
- [18] Lion Hirth, Jonathan Mühlenpfordt, and Marisa Bulkeley. The ENTSO-E transparency platform–a review of Europe’s most ambitious electricity data platform. Applied Energy, 225:1054–1067, 2018.
- [19] IPTO. IPTO API. https://www.admie.gr/en/market/market-statistics/file-download-api.
- [20] OpenWeather. OpenWeatherMap API. https://openweathermap.org/api.
- [21] Dima Alberg and Mark Last. Short-term load forecasting in smart meters with sliding window-based ARIMA algorithms. Vietnam Journal of Computer Science, 5:241–249, 2018.
- [22] Sean J Taylor and Benjamin Letham. Forecasting at scale. The American Statistician, 72(1):37–45, 2018.
- [23] PNNL. ExaGO. Software available at https://github.com/pnnl/ExaGO.