
Harnessing Scalable Transactional Stream Processing for Managing Large Language Models [Vision]

Shuhao Zhang, Xianzhi Zeng, Yuhao Wu, Zhonghao Yang
Singapore University of Technology and Design
Abstract

Large Language Models (LLMs) have demonstrated extraordinary performance across a broad array of applications, from traditional language processing tasks to interpreting structured sequences like time-series data. Yet, their effectiveness in fast-paced, online decision-making environments requiring swift, accurate, and concurrent responses poses a significant challenge. This paper introduces TStreamLLM, a revolutionary framework integrating Transactional Stream Processing (TSP) with LLM management to achieve remarkable scalability and low latency. By harnessing the scalability, consistency, and fault tolerance inherent in TSP, TStreamLLM aims to manage continuous and concurrent LLM updates and usage efficiently. We showcase its potential through practical use cases like real-time patient monitoring and intelligent traffic management. The exploration of synergies between TSP and LLM management can stimulate groundbreaking developments in AI and database research. This paper provides a comprehensive overview of challenges and opportunities in this emerging field, setting forth a roadmap for future exploration and development.

1 Introduction

Large language models (LLMs) have become increasingly influential, propelling numerous advancements not just in natural language understanding and generation, but also in areas such as time-series analysis, structured sequence interpretation, and artificial intelligence overall [3, 5, 33]. Their unprecedented scale and complexity allow them to excel at zero-shot and few-shot learning tasks [3, 26], opening up diverse applications across a multitude of domains. However, the promising capabilities of LLMs come with their own set of challenges.

Continuous Model Updates (C1):

The success of LLMs comes at the cost of significant resource consumption and a heavy reliance on the pre-training process [36, 14]. As a result, LLMs have a knowledge cutoff: while the world continually evolves with new concepts, events, and trends [30, 25], LLMs remain static after pre-training. Keeping them updated and maintaining their relevance and accuracy therefore poses a significant challenge [19].

Concurrent Model Updates and Usage (C2):

Real-world applications demand reliable and prompt responses amid intensive concurrent model updates and usage, which adds another layer of complexity. Supporting concurrent updates and usage is not only critical but also inevitable, as potential conflicts and dependencies may arise among the multiple services sharing a model.

Optimization and Acceleration (C3):

Various techniques have been developed to accelerate model training and inference, such as mixed precision training [29], distillation [15], pruning [13], and quantization [18]. Additionally, exploiting novel hardware architectures [4] can enhance the performance of LLMs without significantly sacrificing their accuracy. However, adapting these methods for real-time operation and ensuring their compatibility with other concurrent services presents a significant challenge.

To address these issues, we introduce a visionary approach in this paper: TStreamLLM. This innovative framework aims to achieve ultra-scalability and low latency in managing concurrent LLM updates and usage. The key concept behind TStreamLLM is the integration of transactional stream processing (TSP) techniques [27] into LLM management. TSP, an emerging data stream processing paradigm, offers real-time adaptation, data consistency, fault tolerance, and fine-grained access control [16]—qualities that make it suitable for managing LLMs under intensive concurrent stream processing scenarios [37].

By leveraging TSP's scalability, fault tolerance, and streaming semantics, TStreamLLM empowers LLM management to substantially improve upon existing solutions. For instance, it reduces the best achievable long-run latency to a linear function of the single-user, single-run model manipulation overhead. These innovations could expand the potential of LLMs across a multitude of AI applications. Furthermore, a TSP-empowered LLM management system offers the database research community flexible, adaptive methods for data ingestion, manipulation, and mining.

In summary, this paper makes the following contributions: We start by illustrating two practical use cases of LLMs, highlighting the pressing need for a system that can effectively manage continuous model updates, handle concurrent model updates and usage, and optimize and accelerate model operation in a real-time, scalable, and efficient manner (Section 2). Next, we introduce our novel solution to these challenges: the TStreamLLM framework. TStreamLLM integrates TSP techniques into LLM management, offering potential improvements in efficiency, scalability, and adaptability (Section 3). Lastly, we explore the challenges and open research questions in this emerging field (Section 4). Our discussion sets a foundation for future research aimed at developing novel LLM architectures and management strategies leveraging TSP, thereby propelling advancements in AI and database research (Section 5).

2 Use Cases

In this section, we delve into two significant real-world applications of TStreamLLM, namely Real-time Patient Monitoring in Healthcare and Intelligent Traffic Management in Smart Cities, showcasing how TStreamLLM effectively tackles the three main challenges of LLM management (C1, C2, and C3).

Use Case 1: Real-time Patient Monitoring in Healthcare:

Real-time patient monitoring has gained substantial relevance in the rapidly evolving field of healthcare [21, 22]. A patient monitoring system implemented on TStreamLLM enables the processing of a wide range of data, including electrocardiogram reports for patients under observation and medical condition descriptions from remote patients. By learning and analyzing these input data using the LLM, TStreamLLM generates real-time health monitoring outputs and offers diagnostic assistance to doctors, as depicted in Figure 1.

To stay updated on the latest health condition of patients (C1), TStreamLLM continuously fine-tunes the LLM to incorporate the most recent health data. By leveraging stream processing, the system efficiently carries out noise removal, feature extraction, and identification of key health indicators on input data. It concurrently updates LLM states (model parameters and metadata) using parallel executors, effectively meeting the real-time operational requirements (C3).

However, ensuring consistency in the LLM during concurrent model updates and queries poses a notable challenge (C2) due to the intricate dependencies involved in model access requests. TStreamLLM successfully addresses this challenge by employing transactional concurrency control mechanisms. This allows for real-time querying and seamless access to the dynamically evolving LLM without impeding its ongoing training process, ensuring the efficient provision of diagnostic assistance to doctors.

Figure 1: TStreamLLM applied in real-time patient monitoring in healthcare.
Figure 2: TStreamLLM’s role in online traffic management within a smart nation framework.

Use Case 2: Intelligent Traffic Management in Smart Cities:

In the context of smart city traffic management, the optimization of city-wide traffic flow and response times necessitates an intelligent solution [31, 6]. However, there are challenges (C1 and C2) posed by the dynamic nature of traffic data. These challenges involve maintaining model consistency and facilitating continuous learning in the face of data from diverse sources, such as road sensors, traffic cameras, and user-reported incidents.

TStreamLLM excels in managing concurrent data streams, ensuring the LLM is consistently updated with real-time traffic conditions (Figure 2). Additionally, it complements manual monitoring by handling complex traffic queries and offering context-aware recommendations (C3).

During emergency situations like ambulance requests, TStreamLLM effectively demonstrates its real-time capabilities by promptly notifying the nearest ambulance, identifying the optimal route to the hospital, and simultaneously generating traffic control signals to facilitate the ambulance’s movement. In more complex scenarios involving concurrent emergency calls (C2), TStreamLLM efficiently learns and generates optimal traffic control strategies, effectively allocating resources and preventing further damage.

3 Harnessing Transactional Stream Processing for LLM Management

This section provides an overview of how TStreamLLM harnesses the power of TSP to manage LLMs effectively. As illustrated in Figure 3, TStreamLLM uniquely integrates TSP techniques into LLM management, marking a pioneering framework that opens up avenues for future research.

TStreamLLM is designed around four critical components: (1) Stream Processing that efficiently processes real-time data streams and user inference requests, (2) Real-time Adaptation and Learning that facilitates dynamic adaptation of the LLM based on incoming data, (3) Transaction Management that guarantees model consistency and efficient update propagation, and (4) LLM State Management that ensures the LLM remains up-to-date, managing the storage of LLM parameters and metadata. These components not only interlink to form the integrated TStreamLLM, but also function independently, offering TStreamLLM remarkable versatility across various scenarios.

Figure 3: Architectural overview of TStreamLLM.

3.1 Stream Processing

The Stream Processing component is at the core of TStreamLLM, designed to efficiently handle and process real-time data streams, supporting the optimization and acceleration of the LLM under concurrent services (C3). As a plethora of data from user interactions, device logs, and sensor readings continuously flows in, this component acts as a dynamic dispatcher. It preprocesses the data, filters out irrelevant content, transforms raw data into a format digestible by LLMs, and performs various aggregations to distill meaningful insights.

The Stream Processing component utilizes advanced techniques to effectively manage high-velocity, high-volume data streams. To optimize the handling of incoming data, data stream compression [40] reduces storage and computational demands, while parallel processing [41] manages multiple data streams simultaneously, allowing TStreamLLM to keep up with the constant influx of data.

Beyond handling incoming data streams for model updates, the Stream Processing component also addresses real-time user inference requests using transactional semantics. It efficiently models user requests as transactions and delivers real-time responses based on the continuously adapting LLM, facilitating seamless interaction between users and the Transaction Management component (Section 3.3).
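
To make this dispatcher role concrete, the following minimal Python sketch shows how incoming records might be preprocessed and wrapped as transactions before being handed to the Transaction Management component. All names here (`Txn`, `preprocess`, `dispatch`, and the read/write sets) are illustrative assumptions rather than a fixed TStreamLLM interface.

```python
import queue
from dataclasses import dataclass, field
from typing import Any

@dataclass
class Txn:
    """A unit of work handed to the Transaction Management component."""
    kind: str      # "update" (training signal) or "infer" (user request)
    payload: Any   # preprocessed features or a user prompt
    reads: set = field(default_factory=set)   # state partitions the txn reads
    writes: set = field(default_factory=set)  # state partitions the txn writes

def preprocess(raw: dict) -> dict:
    """Filter out irrelevant fields and keep a digestible record."""
    return {k: v for k, v in raw.items() if k in ("text", "timestamp", "source")}

txn_queue: "queue.Queue[Txn]" = queue.Queue()

def dispatch(raw: dict) -> None:
    """Turn one incoming stream record into a transaction."""
    features = preprocess(raw)
    if raw.get("type") == "inference":
        txn_queue.put(Txn("infer", features, reads={"params"}))
    else:  # training signal: reads and writes model state
        txn_queue.put(Txn("update", features, reads={"params"}, writes={"params"}))
```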

3.2 Real-time Adaptation and Learning

The Real-time Adaptation and Learning component plays a crucial role in the continuous fine-tuning of the LLM (C1). It integrates with the Transaction Management component (Section 3.3) to consistently retrieve the latest version of LLM parameters and metadata, and refines these states based on insights derived from the processed data streams. This continuous learning mechanism allows the LLM to persistently enhance its performance and accuracy, maintaining relevance in an ever-evolving data landscape.

To perform real-time adaptation and improvement efficiently, the Real-time Adaptation and Learning component draws on Online Learning (OL), a machine learning technique in which models are incrementally updated as new data arrives, without waiting for large batches [1]. OL enables the LLM to adapt to real-time changes in the continuous data stream, making it highly responsive to shifts in data stream patterns with minimal computational resources, and supports rapid deployment in real-time decision-making scenarios.
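
As a minimal illustration of the OL principle, the sketch below performs one stochastic gradient step per arriving example, with a small linear model standing in for the far larger LLM; the data generator and learning rate are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(8)   # stand-in for LLM parameters
lr = 0.01         # learning rate (assumed)

def online_update(x: np.ndarray, y: float) -> None:
    """Incremental SGD step on a single (x, y) pair as it arrives."""
    global w
    grad = 2 * (w @ x - y) * x   # squared-loss gradient
    w -= lr * grad

# Simulated stream: the model adapts continuously, never waiting for a batch.
for _ in range(1000):
    x = rng.normal(size=8)
    y = x.sum()                  # the hidden concept the stream follows
    online_update(x, y)
```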

However, ensuring the consistency of LLM states in the presence of concurrent inferences and model updates presents a significant challenge (C2). To address this, the Real-time Adaptation and Learning component models upstream state access operations as transactions, where each transaction encapsulates a series of model update operations that must be performed jointly as an atomic unit. These transactions are then handed over to the Transaction Management component (Section 3.3) for reliable execution.
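
A sketch of this encapsulation, reusing the hypothetical `Txn` structure and `txn_queue` from the Section 3.1 sketch: all parameter writes produced by one fine-tuning step are bundled so that they commit together or not at all.

```python
def build_update_txn(grads: dict) -> Txn:
    """Bundle all parameter writes of one update so they commit together."""
    ops = [("apply_grad", partition, g) for partition, g in grads.items()]
    return Txn("update", ops, reads=set(grads), writes=set(grads))

# Placeholder gradients; either every partition sees the update or none does.
txn_queue.put(build_update_txn({"layer.0": 0.1, "layer.1": -0.2}))
```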

3.3 Transaction Management

The Transaction Management component of TStreamLLM plays a crucial role in ensuring data consistency and enabling efficient update propagation within a transactional stream processing framework [27]. It is responsible for guaranteeing the correctness of LLM states in the presence of concurrent model updates and usage (C2). By incorporating transactional semantics into LLM state access management, TStreamLLM ensures isolation among concurrent transactions, enabling their execution without interference. Furthermore, it ensures the durability of state updates, making them permanent even in the face of unexpected system failures.

To manage the execution of transactional requests received from upstream components (Sections 3.1 and 3.2), the Transaction Management component employs various concurrency control techniques, aiming to allow multiple transactions to proceed without locking any shared states. It carefully analyzes and resolves dependencies among state access operations within transactions. Subsequently, it adaptively schedules state access workloads to parallel executors, which then interact with the LLM State Management component for execution.
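
The following simplified sketch conveys the intuition of conflict-aware scheduling: transactions whose hypothetical read/write sets (as in the Section 3.1 sketch) are disjoint share a parallel wave, while conflicting ones are deferred to a later wave. Real engines such as MorphStream [27] perform far more sophisticated, fine-grained dependency analysis; this is only a didactic approximation.

```python
def schedule_waves(txns):
    """Group transactions into waves; each wave can run on parallel executors."""
    waves = []
    for txn in txns:
        placed = False
        for wave in waves:
            # A txn conflicts with a wave if its writes intersect any member's
            # reads or writes, or its reads intersect any member's writes.
            conflict = any(
                txn.writes & (t.reads | t.writes) or txn.reads & t.writes
                for t in wave
            )
            if not conflict:
                wave.append(txn)
                placed = True
                break
        if not placed:
            waves.append([txn])   # defer to a new, later wave
    return waves
```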

3.4 LLM State Management

The LLM State Management component manages the storage of shared stateful objects in LLMs, including parameters (e.g., word embeddings, weights, and biases) and metadata (e.g., training history, model hyperparameters). These states are continuously updated through transactions propagated from the Transaction Management component, ensuring that the LLM remains aligned with the latest insights derived from incoming data streams.

Scalability and efficiency are prioritized by the LLM State Management component, which is crucial for handling large language models that can comprise billions of parameters. To achieve this, TStreamLLM employs a distributed storage strategy, where the LLM states are partitioned and distributed across multiple nodes. This approach harnesses the power of parallel computing, enabling the system to effectively manage and update LLM states while enhancing scalability.

Additionally, the LLM State Management component incorporates efficient indexing strategies to facilitate rapid retrieval and updates of model states. Techniques such as hashing and trie-based index structures are employed to expedite access to state objects, particularly in highly concurrent environments. These indexing techniques contribute to improved performance and efficient handling of LLM states within the system.
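
A toy sketch combining these two ideas, assuming parameters are addressed by string keys (e.g., "decoder.layer.0.weight"): a hash of the key selects the partition, which in a real deployment would live on a separate node, and a per-partition dictionary plays the role of the fast index discussed above.

```python
import hashlib

class PartitionedStateStore:
    """Hash-partitioned store for LLM parameters and metadata."""
    def __init__(self, num_partitions: int = 4):
        self.partitions = [dict() for _ in range(num_partitions)]

    def _partition_of(self, key: str) -> dict:
        h = int(hashlib.md5(key.encode()).hexdigest(), 16)
        return self.partitions[h % len(self.partitions)]

    def put(self, key: str, value) -> None:
        self._partition_of(key)[key] = value

    def get(self, key: str):
        return self._partition_of(key)[key]

store = PartitionedStateStore()
store.put("decoder.layer.0.weight", [0.0] * 16)  # placeholder tensor
```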

4 Open Challenges and Opportunities

While TStreamLLM demonstrates promising potential, several open challenges and research opportunities remain.

4.1 Scalable Stream Processing

To effectively handle high-velocity data streams and update LLMs with minimal latency under high levels of parallelism and heavy workloads, it is crucial to enhance the scalability of TStreamLLM. This challenge opens several avenues for future research:

Data Partitioning and Load Balancing:

Effective data partitioning strategies can evenly distribute language model training data across parallel processing units, resulting in efficient resource utilization and minimized processing bottlenecks [34]. Moreover, designing custom accelerators, GPUs, and multicore processors optimized for parallel processing and stream management can substantially enhance the scalability of stream processing. Future research should also investigate dynamic load balancing mechanisms that can adapt resource allocation in real-time according to fluctuating data rates and computational demands of the language models.
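
As one example of the kind of policy such research might start from, the sketch below greedily assigns each data partition to the currently least-loaded worker; the partition costs and worker count are assumed inputs, and a real dynamic balancer would re-run this as load estimates change.

```python
import heapq

def assign(partition_costs: dict, num_workers: int) -> dict:
    """Greedily assign each data partition to the currently lightest worker."""
    heap = [(0.0, w) for w in range(num_workers)]   # (load, worker id)
    heapq.heapify(heap)
    assignment = {}
    # Place the costliest partitions first for a tighter balance.
    for pid, cost in sorted(partition_costs.items(), key=lambda kv: -kv[1]):
        load, worker = heapq.heappop(heap)
        assignment[pid] = worker
        heapq.heappush(heap, (load + cost, worker))
    return assignment

# e.g. assign({"p0": 3.0, "p1": 1.0, "p2": 2.0}, num_workers=2)
#      -> {"p0": 0, "p2": 1, "p1": 1}, giving both workers a load of 3.0
```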

Domain-Specific Stream Processing:

Integrating domain-specific knowledge [9, 11] into the stream processing pipeline can enhance the efficiency of LLM management. Research could target bespoke stream processing operators and algorithms tailored to particular applications. Machine learning approaches could inform adaptive query optimization techniques that adjust execution plans based on incoming data stream characteristics, language model requirements, and resource availability. A critical challenge lies in the sheer volume of training data processed, stored, and transmitted. Domain-specific data stream compression techniques [40] and approximation algorithms could reduce resource consumption by trading a controlled amount of accuracy for lower processing time.

Fault Tolerance and System Reliability:

Maintaining system robustness is vital for TStreamLLM, given the complexity and high volume of data processed by LLMs. Efficient recovery techniques like checkpointing, logging, and rollback mechanisms are essential to minimize disruptions, ensure system availability, and handle transaction failures. Approximate Fault Tolerance (AFT) [17, 35] offers a promising approach, balancing error bounds and backup overheads while bolstering system stability. Future research should explore the potential of emerging hardware technologies and domain-specific fault tolerance strategies to improve system performance and ensure the scalability and reliability of TStreamLLM in managing concurrent LLM updates.
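
The essence of AFT [17] can be captured in a few lines: rather than persisting every state change, a checkpoint is taken only once the accumulated un-backed-up change exceeds an error bound. The drift metric and threshold below are illustrative assumptions, not the mechanism of any specific system.

```python
import copy

class AFTCheckpointer:
    """Checkpoint model state only when accumulated drift exceeds a bound."""
    def __init__(self, error_bound: float):
        self.error_bound = error_bound
        self.drift = 0.0        # accumulated un-checkpointed change
        self.checkpoint = None

    def on_update(self, state: dict, delta_magnitude: float) -> None:
        self.drift += delta_magnitude
        if self.checkpoint is None or self.drift > self.error_bound:
            self.checkpoint = copy.deepcopy(state)  # back up state
            self.drift = 0.0    # drift relative to the backup is now zero
```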

4.2 Real-time Adaptation and Learning

Ensuring the relevance and accuracy of LLMs in the face of dynamic data streams and rapidly changing application requirements necessitates real-time adaptation and learning capabilities in TStreamLLM. Future research can address this challenge by focusing on the following aspects:

Stream Data Selection:

In an environment with large-scale, high-velocity data streams, LLMs face the challenge of selecting the most pertinent data for training [43, 24]. Traditional learning scenarios provide pre-defined datasets for incremental or transfer learning [20, 32], but this approach becomes infeasible in a dynamic data stream environment [28]. Instead, the model must make informed decisions about data selection based on its existing knowledge. This challenge becomes evident during a newsworthy event, when the model is inundated with redundant information from various media outlets and online comments. In such cases, the model must adopt data selection techniques that balance the training volume against its understanding of the event, all while maintaining neutrality and objectivity.
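
One simple starting point for such selection is a novelty filter: an example is admitted for training only if its embedding is sufficiently dissimilar from recently selected ones, damping the flood of near-duplicate reports around a single event. The sketch below is a toy illustration; the buffer size and similarity threshold are arbitrary assumptions.

```python
from collections import deque
import numpy as np

recent = deque(maxlen=256)   # embeddings of recently selected examples

def select(embedding: np.ndarray, threshold: float = 0.9) -> bool:
    """Admit an example only if it is novel relative to recent selections."""
    e = embedding / np.linalg.norm(embedding)
    for r in recent:
        if float(e @ r) > threshold:   # high cosine similarity: near-duplicate
            return False
    recent.append(e)
    return True
```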

Continual and Transfer Learning:

In the realm of continual learning, catastrophic forgetting presents a significant challenge [39, 8], whereas, in transfer or stream learning, the model’s adaptability is of greater concern. In TStreamLLM, both these issues co-exist, implying a need for models to possess both forward and backward transfer capabilities. This duality presents new challenges for existing methods. Given the continuous data stream, storing new knowledge becomes difficult for the model. When the model undergoes a training cycle, it can struggle to retain current factual knowledge, a problem exacerbated by random masking mechanisms employed during the training of models like BERT [12]. In an online setting, with continuous data streams and no distinct task or domain boundaries, most offline continual learning methods fall short. Moreover, implementing a core set for replay through methods like gradient computation is particularly challenging for LLMs, leading to potentially high costs.
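
As a cheap, task-boundary-free baseline against forgetting, a reservoir-sampled replay buffer mixes a bounded sample of past data into ongoing training. The sketch below illustrates the idea; whether such a buffer suffices at LLM scale is precisely the open question raised above.

```python
import random

class ReservoirBuffer:
    """Uniform sample over an unbounded stream at fixed memory cost."""
    def __init__(self, capacity: int = 10_000):
        self.capacity, self.seen, self.items = capacity, 0, []

    def add(self, example) -> None:
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            j = random.randrange(self.seen)
            if j < self.capacity:   # keep each item with prob capacity/seen
                self.items[j] = example

    def sample(self, k: int):
        """Draw replay examples to mix into the next training step."""
        return random.sample(self.items, min(k, len(self.items)))
```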

Adaptive and Efficient Model Training:

The traditionally static nature of LLMs after pre-training presents challenges in dynamic real-world scenarios. The TStreamLLM framework emphasizes the need for frequent, accurate model updates with reduced latency. Classic model updating, involving steps like forward propagation, loss computation, backpropagation, and parameter updates, can introduce system latency due to its sequential nature. To address this, we suggest predictive and concurrent model training methods. These would include predicting upcoming loss values from previous ones, enabling a new update to begin even before prior ones complete. Another promising direction is the preemptive identification and updating of parameters likely to require significant changes after loss computation, aiming to avoid potential conflicts.
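
To make the loss-prediction idea concrete, a minimal sketch: an exponential moving average forecasts the next loss so the subsequent update can be staged before the current one completes. This merely illustrates the proposal; any practical scheme would need a far more faithful predictor and careful conflict handling.

```python
class LossPredictor:
    """EMA forecast of the next loss value, for staging updates early."""
    def __init__(self, alpha: float = 0.3):   # smoothing factor (assumed)
        self.alpha, self.ema = alpha, None

    def observe(self, loss: float) -> None:
        self.ema = loss if self.ema is None else (
            self.alpha * loss + (1 - self.alpha) * self.ema)

    def predict(self) -> float:
        return self.ema   # forecast used to prepare the next update step
```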

4.3 Streaming Transaction Processing

The TStreamLLM framework hinges on the effective transactional management of LLMs in response to high-velocity data streams. The dynamic nature of data streams and the necessity for real-time response pose exciting research challenges in the realm of streaming transaction processing:

Transactional Model Updates:

Incorporating concurrent and predictive model training methodologies introduces several challenges, including maintaining ACID properties during concurrent updates, especially with high-velocity data streams. Concurrent updates can also create potential conflicts and dependencies among services, adding complexity. Therefore, future research should develop efficient conflict detection and resolution strategies specific to LLMs. Despite the challenges, these strategies, transactional guarantees, and conflict resolution mechanisms could significantly enhance model training efficiency and concurrent update management in the TStreamLLM framework, improving its effectiveness and reliability.
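
One plausible building block for such strategies is optimistic validation: each update records the version of the parameter partition it read, and a version mismatch at commit time signals a conflicting concurrent update. The sketch below is a generic illustration, not a TStreamLLM-specific mechanism.

```python
class VersionedPartition:
    """Optimistic concurrency control over one parameter partition."""
    def __init__(self):
        self.version = 0
        self.data = {}

    def read(self):
        return self.version, dict(self.data)

    def try_commit(self, read_version: int, writes: dict) -> bool:
        if self.version != read_version:
            return False   # conflict: another update committed in between
        self.data.update(writes)
        self.version += 1
        return True

part = VersionedPartition()
v, snapshot = part.read()
assert part.try_commit(v, {"w0": 0.1})       # first committer wins
assert not part.try_commit(v, {"w0": 0.2})   # stale version is rejected
```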

Scalability and Performance Trade-offs:

As the demand for LLMs in real-time applications grows, the TStreamLLM framework must be capable of processing transactions efficiently under high loads. Future research could investigate strategies for scaling streaming transaction processing capabilities [27] to accommodate growing volumes of data streams. This could involve exploring innovative parallel processing techniques, distributed computing solutions, or the use of emerging hardware technologies to accelerate transaction processing. Furthermore, there may be trade-offs between transaction processing speed, system consistency, and model accuracy. Understanding these trade-offs, and developing strategies to balance these conflicting demands could be another crucial area of exploration.

4.4 LLM State Management

The ability to manage the state of LLMs effectively within the TStreamLLM framework forms a critical component of maintaining updated and consistent LLMs. This state management plays a pivotal role in ensuring the framework’s real-time response capabilities. The areas of investigation worth delving into within this domain include:

State Storage and Version Control:

Storage efficiency in LLM state management demands the exploration of innovative methods for compression and storage optimization. Techniques such as delta encoding [7] and sparse representations [42] could minimize storage requirements, thus enhancing the scalability of the TStreamLLM framework. Moreover, contemplating the future integration of vector data management systems [38, 10] into the LLM State Management could further optimize storage and retrieval operations, despite the inherent challenges in handling high-dimensional data. In addition, managing different versions of LLM states efficiently in TStreamLLM, while minimizing the overhead of maintaining these versions, is of paramount importance. This calls for efficient versioning and snapshotting techniques enabling access to and querying of previous LLM states, which in turn contributes to the robustness and reliability of the system in various use cases.
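
A minimal sketch of delta encoding [7] applied to model versions: store one full base snapshot, then only the parameters that changed in each subsequent version. The toy states below are placeholders for real parameter tensors.

```python
def make_delta(base: dict, new: dict) -> dict:
    """Record only the entries that differ from the base state."""
    return {k: v for k, v in new.items() if base.get(k) != v}

def apply_deltas(base: dict, deltas: list) -> dict:
    """Reconstruct a later version from the base plus its deltas."""
    state = dict(base)
    for d in deltas:
        state.update(d)
    return state

v0 = {"w0": 0.0, "w1": 1.0}
v1 = {"w0": 0.05, "w1": 1.0}      # only w0 changed
deltas = [make_delta(v0, v1)]     # stores {"w0": 0.05}, not the full state
assert apply_deltas(v0, deltas) == v1
```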

Optimization and Security Assurance:

LLM state management significantly impacts system performance and resource utilization. Optimizing elements such as memory hierarchies, storage systems, and processing resources can significantly enhance LLM performance and the overall scalability of the TStreamLLM. The balance between resource utilization and system performance should remain a priority. Security and privacy constitute another critical facet of LLM state management, as they prevent unauthorized access and protect the model from potential damage. Future research should concentrate on devising privacy-preserving techniques for data processing and LLM adaptation, such as federated learning [23] and differential privacy [2]. These methods could protect sensitive data while enabling LLMs to learn from various data sources, thereby ensuring the TStreamLLM framework complies with privacy standards without compromising its learning capabilities.

5 Conclusion

In this paper, we introduce a novel perspective on merging transactional stream processing with LLM management, setting forth a promising research trajectory. We outline key challenges and potential solutions in areas such as scalable stream processing, real-time adaptation and learning, streaming transaction processing, and LLM state management. This integration aims to solve challenges related to data selection, continual learning, and efficient model training in a high-velocity data stream environment. We propose new strategies for transactional model updates, emphasizing concurrent and predictive model training to mitigate system latency and conflict resolution issues. We emphasize the necessity to respect ACID properties and tackle potential service conflicts in high-load applications. We also spotlight the importance of fault tolerance and system reliability for the TStreamLLM framework to handle high-volume data processed by LLMs effectively. Our vision presents the possibility of revolutionizing the management of LLMs. By addressing the open challenges we have outlined, we hope to inspire further innovation, leading to the development of robust, efficient, and scalable solutions in this rapidly evolving field.

References

  • [1] R. Aljundi, M. Lin, B. Goujaud, and Y. Bengio. Gradient based sample selection for online continual learning. Advances in neural information processing systems, 32, 2019.
  • [2] E. Bao, Y. Yang, X. Xiao, and B. Ding. Cgm: An enhanced mechanism for streaming data collection with local differential privacy. Proc. VLDB Endow., 14(11):2258–2270, July 2021.
  • [3] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. arXiv preprint arXiv:2005.14165, 2020.
  • [4] M. Davies, N. Srinivasa, T.-H. Lin, G. Chinya, Y. Cao, S. H. Choday, G. Dimou, P. Joshi, N. Imam, S. Jain, et al. Loihi: A neuromorphic manycore processor with on-chip learning. In 2018 IEEE International Solid - State Circuits Conference (ISSCC), pages 50–51. IEEE, 2018.
  • [5] J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2019.
  • [6] S. Djahel, R. Doolan, G.-M. Muntean, and J. Murphy. A communications-oriented perspective on traffic management systems for smart cities: Challenges and innovative approaches. IEEE Communications Surveys & Tutorials, 17(1):125–151, 2014.
  • [7] F. Douglis and A. Iyengar. Application-specific delta-encoding via resemblance detection. In USENIX annual technical conference, general track, pages 113–126. San Antonio, TX, USA, 2003.
  • [8] J. Gama, R. Sebastiao, and P. P. Rodrigues. On evaluating stream learning algorithms. Machine learning, 90:317–346, 2013.
  • [9] Y. Gu, R. Tinn, H. Cheng, M. Lucas, N. Usuyama, X. Liu, T. Naumann, J. Gao, and H. Poon. Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare (HEALTH), 3(1):1–23, 2021.
  • [10] R. Guo, X. Luan, L. Xiang, X. Yan, X. Yi, J. Luo, Q. Cheng, W. Xu, J. Luo, F. Liu, et al. Manu: a cloud native vector database management system. arXiv preprint arXiv:2206.13843, 2022.
  • [11] S. Gururangan, M. Lewis, A. Holtzman, N. A. Smith, and L. Zettlemoyer. DEMix layers: Disentangling domains for modular language modeling. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5557–5576, Seattle, United States, July 2022. Association for Computational Linguistics.
  • [12] K. Guu, K. Lee, Z. Tung, P. Pasupat, and M.-W. Chang. Realm: Retrieval-augmented language model pre-training, 2020.
  • [13] S. Han, J. Pool, J. Tran, and W. Dally. Learning both weights and connections for efficient neural network. In Advances in neural information processing systems, pages 1135–1143, 2015.
  • [14] P. Henderson, J. Hu, J. Romoff, E. Brunskill, D. Jurafsky, and M. Mitchell. Towards the systematic reporting of the energy and carbon footprints of machine learning. arXiv preprint arXiv:2002.05651, 2020.
  • [15] G. Hinton, O. Vinyals, and J. Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015.
  • [16] M. Hirzel and R. Soule. A catalog of stream processing optimizations. In ACM Computing Surveys (CSUR), volume 46, pages 1–34. ACM, 2014.
  • [17] Q. Huang and P. P. C. Lee. Toward high-performance distributed stream processing via approximate fault tolerance. Proc. VLDB Endow., 10(3):73–84, November 2016.
  • [18] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio. Quantized neural networks: Training neural networks with low precision weights and activations. The Journal of Machine Learning Research, 18(1):6869–6898, 2017.
  • [19] J. Jang, S. Ye, S. Yang, J. Shin, J. Han, G. Kim, S. J. Choi, and M. Seo. Towards continual knowledge learning of language models. In ICLR, 2022.
  • [20] X. Jin, D. Zhang, H. Zhu, W. Xiao, S.-W. Li, X. Wei, A. Arnold, and X. Ren. Lifelong pretraining: Continually adapting language models to emerging corpora. arXiv preprint arXiv:2110.08534, 2021.
  • [21] P. Kakria, N. Tripathi, and P. Kitipawang. A real-time health monitoring system for remote cardiac patients using smartphone and wearable sensors. International journal of telemedicine and applications, 2015:8–8, 2015.
  • [22] M. Kang, E. Park, B. H. Cho, and K.-S. Lee. Recent patient health monitoring platforms incorporating internet of things-enabled smart devices. International neurourology journal, 22(Suppl 2):S76, 2018.
  • [23] Q. Li, Z. Wen, Z. Wu, S. Hu, N. Wang, Y. Li, X. Liu, and B. He. A survey on federated learning systems: vision, hype and reality for data privacy and protection. IEEE Transactions on Knowledge and Data Engineering, 2021.
  • [24] Y. Li, Y. Shen, and L. Chen. Camel: Managing data for efficient stream learning. In Proceedings of the 2022 International Conference on Management of Data, SIGMOD ’22, pages 1271–1285, New York, NY, USA, 2022. Association for Computing Machinery.
  • [25] K. Luu, D. Khashabi, S. Gururangan, K. Mandyam, and N. A. Smith. Time waits for no one! analysis and challenges of temporal misalignment. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5944–5958, Seattle, United States, July 2022. Association for Computational Linguistics.
  • [26] B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, et al. Language models are few-shot learners, 2020.
  • [27] Y. Mao, J. Zhao, S. Zhang, H. Liu, and V. Markl. Morphstream: Adaptive scheduling for scalable transactional stream processing on multicores. In Proceedings of the 2023 International Conference on Management of Data (SIGMOD), SIGMOD ’23, New York, NY, USA, 2023. Association for Computing Machinery.
  • [28] K. Margatina, S. Wang, Y. Vyas, N. A. John, Y. Benajiba, and M. Ballesteros. Dynamic benchmarking of masked language models on temporal concept drift with multiple views, 2023.
  • [29] P. Micikevicius, S. Narang, J. Alben, G. Diamos, E. Elsen, D. Garcia, B. Ginsburg, M. Houston, O. Kuchaiev, G. Venkatesh, and H. Wu. Mixed precision training, 2018.
  • [30] OpenAI. Gpt-4 technical report, 2023.
  • [31] A. S. Putra and H. L. H. S. Warnars. Intelligent traffic monitoring system (itms) for smart city based on iot monitoring. In 2018 Indonesian Association for Pattern Recognition International Conference (INAPR), pages 161–165. IEEE, 2018.
  • [32] Y. Qin, J. Zhang, Y. Lin, Z. Liu, P. Li, M. Sun, and J. Zhou. Elle: Efficient lifelong pre-training for emerging data. arXiv preprint arXiv:2203.06311, 2022.
  • [33] C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, and P. J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv preprint arXiv:1910.10683, 2020.
  • [34] S. Rajbhandari, J. Rasley, O. Ruwase, and Y. He. Zero: Memory optimizations toward training trillion parameter models, 2020.
  • [35] S. Schelter, S. Ewen, K. Tzoumas, and V. Markl. ”All roads lead to rome”: Optimistic recovery for distributed iterative data processing. In Proceedings of the 22nd ACM International Conference on Information & Knowledge Management, CIKM ’13, pages 1919–1928, New York, NY, USA, 2013. Association for Computing Machinery.
  • [36] E. Strubell, A. Ganesh, and A. McCallum. Energy and policy considerations for deep learning in nlp. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 3645–3650, 2019.
  • [37] N. Tatbul, T. J. Lee, S. Zdonik, and M. Alam. Streaming data integration: Challenges and opportunities. IEEE Data Eng. Bull., 40(3):45–56, 2017.
  • [38] J. Wang, X. Yi, R. Guo, H. Jin, P. Xu, S. Li, X. Wang, X. Guo, C. Li, X. Xu, et al. Milvus: A purpose-built vector data management system. In Proceedings of the 2021 International Conference on Management of Data, pages 2614–2627, 2021.
  • [39] L. Wang, X. Zhang, H. Su, and J. Zhu. A comprehensive survey of continual learning: Theory, method and application, 2023.
  • [40] X. Zeng and S. Zhang. Parallelizing stream compression for iot applications on asymmetric multicores. In 2023 IEEE 39th International Conference on Data Engineering (ICDE), 2023.
  • [41] S. Zhang, J. He, A. C. Zhou, and B. He. Briskstream: Scaling data stream processing on shared-memory multicore architectures. In Proceedings of the 2019 International Conference on Management of Data, SIGMOD ’19, pages 705–722, New York, NY, USA, 2019. Association for Computing Machinery.
  • [42] Z. Zhang, Y. Xu, J. Yang, X. Li, and D. Zhang. A survey of sparse representation: algorithms and applications. IEEE access, 3:490–530, 2015.
  • [43] Z.-H. Zhou. Stream efficient learning, 2023.