Personalized Wireless Federated Learning for Large Language Models

Feibo Jiang, Member, IEEE, Li Dong, Siwei Tu, Yubo Peng, Kezhi Wang, Senior Member, IEEE, Kun Yang, Fellow, IEEE, Cunhua Pan, Senior Member, IEEE, Dusit Niyato, Fellow, IEEE
Abstract

Large Language Models (LLMs) have revolutionized natural language processing tasks. However, their deployment in wireless networks still faces challenges, i.e., a lack of privacy and security protection mechanisms. Federated Learning (FL) has emerged as a promising approach to address these challenges. Yet, it suffers from issues including inefficient handling of large and heterogeneous data, resource-intensive training, and high communication overhead. To tackle these issues, we first compare the different learning stages of LLMs in wireless networks and their features. Next, we introduce two personalized wireless federated fine-tuning methods with low communication overhead, i.e., (1) Personalized Federated Instruction Tuning (PFIT), which employs reinforcement learning to fine-tune local LLMs with diverse reward models to achieve personalization; and (2) Personalized Federated Task Tuning (PFTT), which leverages global adapters and local Low-Rank Adaptations (LoRA) to collaboratively fine-tune local LLMs, where the local LoRAs can be applied to achieve personalization without aggregation. Finally, we perform simulations to demonstrate the effectiveness of the two proposed methods and comprehensively discuss open issues.

Index Terms:
Large Language Model, Personalized Federated Learning, Pre-training, Fine-tuning.

I Introduction

With the development of 6G communications, the application of Artificial Intelligence (AI) in wireless networks is becoming increasingly important. One of the key features of 6G is the deep integration of AI with wireless networks, which can support more intelligent services and applications. Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP) tasks by demonstrating impressive language understanding and generation capabilities, pushing the boundaries of AI research. LLMs can also provide a more accurate understanding of user semantics and intentions, thereby offering personalized services to 6G users [1].

However, as LLM scales continue to expand, reaching hundreds of billions or even trillions of parameters, traditional publicly available datasets face challenges in meeting the demands of training future LLMs. In 6G networks, there could be a vast array of mobile devices, potentially accumulating significant volumes of user data. However, concerns regarding data security and information privacy may still prevent users from sharing their personal data for the training of LLMs in wireless networks.

To leverage the vast amount of distributed and private data for training LLMs, Federated Learning (FL) has been adopted to offer a collaborative learning approach. This approach enables future LLMs to learn from a broader range of data sources while maintaining data security and privacy. However, there are major challenges in training LLMs with wireless FL.

I-1 Big and Heterogeneous Data

LLMs require massive amounts of data from large and diverse data sources to train the model effectively. In wireless FL, the distributed structure of data is a critical challenge, as the data on each mobile device may be highly unbalanced, depending on the users' backgrounds, preferences, or behaviours. This can lead to slower convergence and poorer performance when training LLMs. Furthermore, the diversity of data across mobile devices may introduce complex data distributions in wireless networks.

I-2 Resource-intensive Training

Training an LLM is a resource-intensive task that demands high computational power and large memory. In wireless FL, the computation is decentralized, and individual devices need sufficient resources to participate in the training process. However, not all contributing devices (such as smartphones or tablets) have the necessary computational and storage resources, which can lead to slow training or even failure to train the LLM successfully.

I-3 High Communication Overhead

Wireless FL requires frequent communication between a central server and devices to update an LLM, which can incur high bandwidth and latency costs. For LLMs that have billions or even trillions of parameters, this could result in a huge amount of data to be transferred, leading to high communication overhead. Moreover, the communication cost increases with the number of communication rounds, the number of participating devices, and the model size. As such, reducing communication overhead without sacrificing model performance is a significant challenge.

Moreover, the demand for user-centric personalized LLMs has significantly increased. These LLMs have the ability to learn individual preferences and provide tailored results. Personalized Federated Learning (PFL) is an extension of FL that recognizes potential data and resource heterogeneity among different clients and aims to learn a personalized model for each client [2]. Parameter-Efficient Fine-Tuning (PEFT) enables efficient adaptation to client-specific data and tasks by adjusting a minimal number of parameters for the pre-trained LLM, thereby reducing computational resource and communication overhead. Hence, PFL combined with PEFT can overcome the aforementioned challenges associated with training LLMs by FL.

Unlike FL for generative models such as generative adversarial networks and diffusion models, LLMs require consideration of alignment with human values and interaction with external knowledge during learning. Hence, in this paper, we first introduce and compare the different learning stages of LLMs in wireless networks and their features. We then summarize potential FL solutions for various fine-tuning methods. Finally, we highlight the advantages of PFL in fine-tuning LLMs and propose two approaches for personalized wireless federated fine-tuning with low communication overhead as follows:

  • Personalized Federated Instruction Tuning (PFIT): We focus on Reinforcement Learning from Human Feedback (RLHF) and design two reward models to represent helpfulness and safety based on human feedback. By linearly combining the two reward models across different clients, personalized local models are obtained. Sparse self-attention is also employed to reduce communication overhead and accelerate the training speed of the federated instruction tuning.

  • Personalized Federated Task Tuning (PFTT): We combine two PEFT approaches, namely adapter and Low-Rank Adaptation (LoRA), to reduce communication overhead and accelerate the federated task fine-tuning. The adapter parameters of clients are sent to the server for global aggregation, while LoRA’s parameters are retained at the clients to maintain the personalization of local models.

The remainder of this paper is organized as follows. Section II describes the harmonizing of LLM and FL in wireless networks. Section III provides potential solutions of federated fine-tuning for LLMs. Section IV details the proposed personalized wireless federated fine-tuning schemes. Section V shows the simulation results. Section VI presents open issues and Section VII concludes this paper.

II Harmonizing LLM and FL in Wireless Networks

Unlike traditional deep learning, LLMs have a three-stage learning process: pre-training, fine-tuning and retrieval-augmented generation (RAG). As shown in Fig. 1, the pre-training stage provides a foundation of general language understanding, while the fine-tuning stage adapts this understanding to specific tasks or goals, and the RAG stage enhances this understanding and improves the accuracy of answers by retrieving information from external data sources. All learning stages of LLMs are also summarized and compared in Table I.

Figure 1: Illustration of the three-stage learning of LLMs in wireless networks.

II-A Pre-training Stage of LLM

In the pre-training stage, an LLM is deployed on a server and undergoes self-supervised learning using a substantial volume of unlabelled data. Numerous client devices transmit their unlabelled data that does not contain sensitive information to the server for learning purposes over the wireless network. The goal of pre-training is to teach patterns, grammar, semantics, and world knowledge to LLMs by predicting missing or masked words in sentences. For instance, in the BERT model, it is done using a technique called Masked Language Modelling (MLM), where a certain percentage of words in a sentence are randomly replaced with a special mask token, and the model has to predict the original word [3]. During the pre-training stage, the LLM learns to capture contextual relationships and build a general understanding of language. A large number of parameters in the model enables it to encode a vast amount of information from the training corpus, resulting in a rich language representation.
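
For illustration only, the following is a minimal PyTorch sketch of the MLM corruption step described above; the masking probability, the mask token id (103) and the vocabulary size (30522), which correspond to BERT's tokenizer, are stated here as assumptions for the toy example rather than details taken from this paper.

```python
import torch

def mask_tokens(input_ids, mask_token_id, mlm_prob=0.15):
    """Randomly mask tokens for Masked Language Modelling (MLM).
    Returns the corrupted inputs and the labels the model must predict;
    positions set to -100 are ignored by the usual cross-entropy loss."""
    labels = input_ids.clone()
    mask = torch.rand(input_ids.shape) < mlm_prob      # choose roughly 15% of positions
    labels[~mask] = -100                               # predict only the masked positions
    corrupted = input_ids.clone()
    corrupted[mask] = mask_token_id                    # replace chosen tokens with [MASK]
    return corrupted, labels

# Toy batch of token ids; 103 is BERT's [MASK] id and 30522 its vocabulary size.
batch = torch.randint(0, 30522, (2, 16))
inputs, labels = mask_tokens(batch, mask_token_id=103)
```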

However, FL may not be necessary for the pre-training stage of LLMs due to the following reasons:

  • High Resource Requirements: Pre-training of LLMs involves adjusting all their parameters, which requires significant computational resources. In FL, the model is trained across many devices or servers, with each device computing updates for the model based on its computation and storage resources [4]. Additionally, this distributed computation increases the communication overhead, as the update from each device needs to be aggregated to obtain the global model. Centralized training, on the other hand, can leverage powerful servers and optimized infrastructure to train LLMs more efficiently.

  • Privacy Concerns: Pre-training of LLMs typically involves using a large, publicly available corpus of text, such as books, websites, and other online resources. Since this data is already publicly available and does not contain sensitive personal information, there are no data privacy concerns that necessitate the use of FL. FL is more applicable in scenarios where data is sensitive and decentralized, such as in healthcare or personal mobile devices, where privacy regulations or concerns prevent the data from being shared directly.

II-B Fine-tuning Stage of LLM

Once the LLM completes pre-training on the server side, it undergoes further fine-tuning on a smaller, more specific dataset. This dataset typically consists of local private data. The clients download the pre-trained parameters of the LLM from the server and perform fine-tuning on an extremely small subset of parameters to enhance the performance of the LLM. To ensure the security and privacy of local data, the fine-tuning process is conducted locally on the client side, and the updated subset of parameters is transmitted back to the server through wireless networks. This local dataset is often labelled, meaning that it comes with correct answers that the LLM should learn to predict. Fine-tuning requires fewer computational resources and a smaller amount of data than pre-training. Fine-tuning of LLMs can be categorized into the following types:

II-B1 Instruction Tuning

Instruction tuning is a strategy for fine-tuning LLMs on a combination of task instructions so that the LLM can generate the correct output based on the instructions. Instruction tuning uses natural language instructions as inputs to query the LLM and guide its output. The instructions consist of sequences that pair instructions with corresponding examples, which can provide explicit and precise guidance for the LLM to generate texts that are consistent with the user’s intent and the data source. The goal of instruction tuning is to improve the comprehension and generalization of LLMs on unseen tasks, as well as their helpfulness and safety.

II-B2 Task Tuning

Once the LLM has been pre-trained on the unlabelled data, it is further fine-tuned on specific downstream tasks using labelled data. Task tuning involves training the pre-trained model on a smaller task-specific dataset that is annotated with labels or target outputs. During the task tuning, the pre-trained model is adapted to the specific task by updating its parameters based on the labelled task data. The objective is to fine-tune the model’s representations and weights to better align with the target task’s objectives and improve its performance. The task tuning process helps the model generalize its pre-learned knowledge to the specific task, making it more task-specific and accurate.

Pre-training requires massive resources and utilizes publicly available data, and therefore it may not be suitable for FL. However, FL could still be valuable in the fine-tuning stage where privacy is a concern or when the tuning data is inherently decentralized. The reasons are listed as follows:

  • Low Resource Requirements: Fine-tuning is relatively efficient as it requires less data and computational resources for adjusting a small subset of parameters of LLMs compared to the pre-training, making it feasible to train LLMs even on resource-constrained devices.

  • Data Privacy Protection: Fine-tuning often involves specific tuning data that could be sensitive. For example, the LLM might be fine-tuned on users’ interactions with a digital assistant, which could contain personal information. FL allows this fine-tuning to happen on the users’ own devices, ensuring that personal information remains secure and confidential.

II-C Retrieval-Augmented Generation Stage of LLM

RAG is an approach that combines LLMs with information retrieval techniques to enhance the performance of LLMs [5].

Due to the enormous training cost of LLMs, the learned knowledge has a temporal lag. For instance, the training data for GPT-3.5 is current up until January 2022, implying that it lacks knowledge of any events transpiring after January 2022. During the generation process, instead of relying solely on the pre-trained or fine-tuned model’s knowledge, retrieval mechanisms are employed to retrieve the latest relevant information from external sources such as the Internet or local knowledge bases. These knowledge bases are often deployed at the edge, and clients retrieve the latest local information by querying them. The retrieved local information, along with the client’s request, is then sent to the server-side LLM over wireless networks. Moreover, LLMs can leverage the Internet to retrieve the latest relevant public information. This retrieved information is then incorporated into the generated output of the LLM, ensuring that the generated content is contextually relevant and factually accurate.
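
As a minimal sketch of this retrieve-then-augment flow (not the paper's implementation), the snippet below retrieves the most similar entries from a local knowledge base by cosine similarity and prepends them to the query before it is sent to the server-side LLM; the function name, the random vectors standing in for a real text encoder, and the prompt template are illustrative assumptions.

```python
import numpy as np

def retrieve_and_augment(query, query_vec, kb_texts, kb_vecs, top_k=2):
    """Retrieve the most similar local documents by cosine similarity and
    prepend them to the user query before sending it to the server-side LLM."""
    sims = kb_vecs @ query_vec / (
        np.linalg.norm(kb_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    top = np.argsort(sims)[::-1][:top_k]               # indices of the best matches
    context = "\n".join(kb_texts[i] for i in top)
    return f"Context:\n{context}\n\nQuestion: {query}"

# Toy example with random vectors standing in for a real text encoder.
kb_texts = ["Local doc on 6G scheduling.", "Local doc on LoRA.", "Local doc on RAG."]
kb_vecs = np.random.randn(3, 8)
prompt = retrieve_and_augment("What is RAG?", np.random.randn(8), kb_texts, kb_vecs)
```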

However, RAG may also not be suitable for FL due to the following reasons:

  • Additional Data Exposure: Sharing sensitive retrieval queries or accessing external resources across multiple clients may compromise the privacy and security of the distributed data.

  • No Weight Update: RAG enhances the performance of LLMs by incorporating the retrieved data into the prompt for in-context learning. It does not require updating parameters of LLMs, thus eliminating the need for local gradient descent in FL.

LLMs not only need to align with human values during the learning process, but also interact with external knowledge, which represents a more complex learning paradigm. Due to the massive amount of insensitive data and extensive model parameters involved in the pre-training stage, it is more suitable for centralized cloud-based learning. In contrast, during the RAG stage, only local data embedding and querying are required, making it more suitable for local execution. Therefore, we primarily focus on designing PFLs for the fine-tuning stage.

TABLE I: Comparison of the three-stage learning of LLMs.
Pre-training — Objective: learning general language representation; Data requirement: abundant and diverse datasets; Learning approach: unsupervised learning; Adjusted parameters: all parameters; Privacy: requires a public dataset; Resource requirements: high computational and storage resources.
Instruction tuning — Objective: generalization for many tasks; Data requirement: instruction data; Learning approach: supervised learning or reinforcement learning; Adjusted parameters: partial parameters (5%-10%); Privacy: may require user instructions; Resource requirements: medium computational and storage resources.
Task tuning — Objective: personalization for specific tasks; Data requirement: task-specific data; Learning approach: supervised learning; Adjusted parameters: few parameters (1%-2%); Privacy: may require task data; Resource requirements: low computational and storage resources.
Retrieval-augmented generation — Objective: enhance output with relevant information; Data requirement: latest external data; Learning approach: retrieval; Adjusted parameters: no parameters; Privacy: requires up-to-date public or user data; Resource requirements: low computational and high storage resources.

III Potential Solutions of Wireless Federated Fine-tuning for LLMs

In this section, we present key techniques and potential solutions of federated fine-tuning for LLMs.

III-A Federated Instruction Tuning

Federated instruction tuning involves FL for instruction tuning of LLMs [4], which requires each client to possess sufficient computational resources to fine-tune a larger number of parameters than task fine-tuning methods. Additionally, it requires transmitting more model parameters over the network, which can be bandwidth-intensive and time-consuming. Furthermore, it is challenging to define a clear, algorithmic loss to measure the quality of the LLM’s output for local instructions. The following techniques hold the potential to address these challenges.

III-A1 Sparse Attention

In traditional attention mechanisms, every token (word) attends to every other token, resulting in a quadratic complexity with respect to the input sequence length. However, in scenarios where the input sequence is long, this approach becomes computationally expensive and memory-intensive. Sparse attention addresses this issue by introducing sparsity patterns, allowing tokens to attend only to a subset of other tokens rather than attending to all tokens [6]. This reduces the computational and memory requirements as well as the communication overhead in FL, while preserving the LLM’s ability to capture relevant dependencies between tokens. The sparsity pattern can be learned during the instruction tuning, enabling the LLM to dynamically determine the subset of attention parameters based on the wireless environment.
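
To make the idea concrete, the following is a small PyTorch sketch of one common sparsity pattern, a fixed local window, assumed here purely for illustration (the paper does not specify the pattern); the mask-based computation keeps the example short, whereas an efficient implementation would compute only the allowed positions.

```python
import torch
import torch.nn.functional as F

def local_window_attention(q, k, v, window=4):
    """Sparse attention sketch: each token attends only to tokens within a
    fixed local window instead of the full sequence."""
    seq_len = q.size(-2)
    idx = torch.arange(seq_len)
    allowed = (idx[None, :] - idx[:, None]).abs() <= window   # |i - j| <= window
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~allowed, float("-inf"))      # forbid out-of-window positions
    return F.softmax(scores, dim=-1) @ v

# Toy input: batch of 1, sequence of 16 tokens, head dimension 8 (assumed sizes).
q = k = v = torch.randn(1, 16, 8)
out = local_window_attention(q, k, v, window=4)
```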

III-A2 RLHF

RLHF is a Reinforcement Learning (RL)-based fine-tuning technique that improves the performance of LLMs by incorporating human-generated feedback [7] for difficult-to-assess objectives. This method involves a three-step process. Initially, an LLM is fine-tuned using supervised instruction learning, drawing from human-generated instructions or demonstrations. Next, a question and several model outputs are sampled. Human evaluators then rank these outputs from best to worst, and the ranked data is used to train a new reward model. Finally, the reward model calculates the reward for the LLM’s output, and the estimated reward is utilized to update the policy through RL. In FL, we can only adjust the parameters that are used to compute the generation probability distribution. These parameters control the probabilities of selecting different words or tokens during the generation process, which can influence the output of the LLM to better align with the expectations conveyed by human feedback (e.g., to fine-tune only the last few layers of LLMs).
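
The second step above, training a reward model from human rankings, can be sketched with a pairwise ranking loss as below; this is an illustrative PyTorch example, with random embeddings standing in for encoded question–response pairs and the network sizes chosen arbitrarily.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy reward model: maps a pooled response embedding to a scalar score."""
    def __init__(self, dim=128):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, response_emb):
        return self.score(response_emb).squeeze(-1)

def ranking_loss(reward_model, better_emb, worse_emb):
    """Pairwise loss built from human rankings: the response ranked higher by
    the evaluator should receive the larger reward."""
    return -F.logsigmoid(reward_model(better_emb) - reward_model(worse_emb)).mean()

# One toy update step on random embeddings standing in for encoded responses.
rm = RewardModel()
opt = torch.optim.Adam(rm.parameters(), lr=1e-3)
loss = ranking_loss(rm, torch.randn(4, 128), torch.randn(4, 128))
loss.backward()
opt.step()
```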

III-B Federated Task Tuning

Federated task tuning employs FL for fine-tuning downstream tasks of LLMs [8], which is relatively efficient as it requires less data and fewer computational resources from clients than instruction tuning. It allows for task-specific adaptation but also risks overfitting or forgetting the original knowledge of local LLMs. PEFT is a set of techniques designed to adapt LLMs to specific downstream tasks with minimal changes to the original parameters. The following PEFT methods can help mitigate overfitting and catastrophic forgetting in LLMs.

III-B1 Adapter

Adapter tuning is introduced to add small, task-specific adapter modules to a pre-trained LLM [9]. These adapter modules are inserted between the layers of the pre-trained LLM. They typically consist of a bottleneck structure: a down-projection, a non-linearity, and an up-projection. During fine-tuning, only the parameters of these adapter modules are updated, while the parameters of the pre-trained LLM are kept fixed. In FL, this approach significantly reduces the number of parameters that need to be updated in each client, while still allowing the LLM to adapt to the specific task with fewer resources. Moreover, when adapting to wireless channel quality, the dimensions of the adapters can be defined adaptively, thereby dynamically adjusting the communication overhead for transmission over wireless channels.
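
A minimal PyTorch sketch of such a bottleneck adapter is given below; the hidden and bottleneck dimensions are assumptions for the example, and the bottleneck size is exactly the knob that can be shrunk when the wireless channel quality is poor.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-projection, non-linearity, up-projection, plus a
    residual connection. Only these parameters are trained and uploaded, while
    the pre-trained LLM weights stay frozen."""
    def __init__(self, hidden_dim, bottleneck_dim=16):
        super().__init__()
        self.down = nn.Linear(hidden_dim, bottleneck_dim)
        self.up = nn.Linear(bottleneck_dim, hidden_dim)
        self.act = nn.ReLU()

    def forward(self, x):
        return x + self.up(self.act(self.down(x)))  # residual keeps the base representation

# A smaller bottleneck can be chosen when the channel quality is poor, shrinking
# the payload that must be transmitted in each round (dimensions are assumed).
adapter = Adapter(hidden_dim=768, bottleneck_dim=8)
out = adapter(torch.randn(2, 10, 768))
```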

III-B2 LoRA

LoRA adapts a pre-trained LLM to a new task by applying low-rank matrix decomposition to the model parameters [8]. LoRA decomposes the original weight matrices into the product of two smaller matrices, and only updates the smaller matrices during fine-tuning. In FL, LoRA can reduce the number of parameters that need to be updated, thereby decreasing the communication and computational cost on the client. By preserving historical low-rank matrices, LoRA can prevent catastrophic forgetting of local models, while preserving its generation and generalization abilities.
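
The corresponding LoRA idea can be sketched as a linear layer whose frozen weight is augmented by a trainable low-rank update; the rank, scaling factor, and layer sizes below are illustrative assumptions rather than values used in this paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """LoRA sketch: the frozen pre-trained weight is augmented with a trainable
    low-rank update B @ A, so only rank * (d_in + d_out) parameters are tuned
    and, in a federated setting, could be transmitted."""
    def __init__(self, in_dim, out_dim, rank=4, alpha=8.0):
        super().__init__()
        self.base = nn.Linear(in_dim, out_dim, bias=False)
        self.base.weight.requires_grad_(False)             # freeze the pre-trained weight
        self.A = nn.Parameter(torch.randn(rank, in_dim) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_dim, rank))  # zero init: no change at the start
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(768, 768, rank=4)
out = layer(torch.randn(2, 10, 768))
```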

Federated instruction tuning deepens the understanding of queries by learning from human instructions, while federated task tuning enhances the performance of downstream tasks by learning from different task data. Both approaches strengthen the distributed learning performance of LLMs from different perspectives.

IV Personalized Federated Fine-tuning

IV-A Current Research Progress

There have been several joint research efforts on wireless FL and LLMs. The authors in [3] combined split learning and FL to pre-train the BERT model. Reference [4] proposed a federated instruction tuning framework called Shepherd, which utilizes LoRA for fine-tuning LLMs on instruction data. Similarly, the authors in [8] presented a low-parameter federated learning method based on LoRA for task fine-tuning of LLMs. Reference [10] addressed co-tuning and offsite-tuning of LLMs through a comprehensive open-source FL framework called Fate-LLM. However, all of these studies aim to train a unified LLM in a distributed manner, overlooking device variations, user preferences, and characteristics. As a result, they fail to achieve device-adaptive and user-centric LLMs.

IV-B Advantages of Personalized Federated Fine-tuning

PFL allows for designing personalized LLMs that can adapt to the data of individual clients over wireless networks, which may improve user satisfaction and engagement levels. Advantages of applying PFL to LLMs include:

IV-B1 Personalized User Data

PFL allows personalized learning from various user data, which can be beneficial in fine-tuning LLMs on non-independent and identically distributed (non-IID) data. PFL can learn a personalized LLM for each client that is tailored to its own data distribution, enabling the LLM to learn from a wider range of personalized contexts, features and patterns. This can lead to improved understanding and generation capabilities of LLMs for user private data.

IV-B2 Customized Local Model

By allowing each client to have its own personalized LLM, PFL enables customization of the local tuning process to each device’s preference and constraints. This can lead to better suitability to personal computational, storage, and communication resources and improved model performance for all clients.

IV-B3 Specific Communication Process

With PFL, the global aggregation of LLMs can be tailored to each client’s preference and requirement, avoiding unnecessary updates that may not be relevant to them. This reduces communication costs, making the fine-tuning process more efficient.

Therefore, PFL provides flexibility in balancing the trade-off between learning shared knowledge and personalized knowledge for LLMs.

IV-C Personalized Federated Instruction Tuning

We propose a PFIT method based on RL, in which each client has personalized requirements for the helpfulness and safety of the LLM. Helpfulness emphasizes the quality and accuracy of generated content, such as grammatical correctness, logical coherence, and relevance of the responses. Safety, on the other hand, emphasizes the legality and ethicality of the generated content, such as the absence of sensitive or harmful information. For example, in Fig. 2, an LLM with high helpfulness would provide answers to user questions regarding employee information, including sensitive details. In contrast, an LLM with a high safety level would refrain from revealing sensitive employee information, prioritizing user privacy and data protection. To achieve PFIT, we introduce three key innovations:

  • Double Reward Model: We define two reward models to evaluate the helpfulness and safety of local instruction responses. The quality reward for different clients’ outputs is obtained by linearly weighting these two reward models, enabling personalized instruction fine-tuning of the LLM for different clients’ requirements.

  • Personalized Reward Function: We design a personalized reward function that includes both the quality reward from two reward models and a negative regularization reward for global knowledge. The regularization reward is based on the Euclidean distance between local model parameters and global model parameters, promoting the knowledge sharing among clients in the PFL system.

  • Sparse Attention Update: To encourage lightweight devices to participate in PFL, we only train the last two layers of the LLM. Additionally, we adopt a sparse attention mechanism to further reduce the computation complexity and communication overhead in self-attention layers during the fine-tuning phase of the LLM.

The specific workflow of the PFIT is as follows:

Step 1: Initialize the pre-trained LLM as the global model on the server and freeze the earlier parts of the LLM.

Step 2: Each client uses the global LLM as the initial local LLM, sets a personalized reward function (in the red dashed box of Fig. 2), and selects its own instruction data for local learning.

Step 3: Based on the current local LLM, the client calculates the regularization reward related to the global LLM, evaluates the helpfulness and safety of responses, computes the quality reward for instructions, and then updates the unfrozen part of the local LLM using the PPO algorithm [11] according to the personalized reward function (sketched after this workflow), which includes both the quality reward and the regularization reward.

Step 4: The server aggregates the sparse tunable layers (the unfrozen parts of LLMs) from all clients, obtains the updated global LLM, and sends the global LLM (the unfrozen part) to all clients.

Step 5: Steps 3-4 are repeated until the convergence criteria of the PFL system are met.
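
To make Step 3 concrete, the following is a minimal sketch of the personalized reward; the client weights w_help and w_safe, the regularization coefficient, and the function names are hypothetical, help_rm and safe_rm stand for the helpfulness and safety reward models, and the PPO update itself is not shown.

```python
import torch

def personalized_reward(resp_emb, help_rm, safe_rm, w_help, w_safe,
                        local_params, global_params, reg_coef=0.1):
    """Client-specific reward: a linear combination of the helpfulness and
    safety reward models minus a regularization term based on the Euclidean
    distance between the local and global unfrozen parameters
    (weights and coefficient are hypothetical)."""
    quality = w_help * help_rm(resp_emb) + w_safe * safe_rm(resp_emb)
    drift = sum(((lp - gp) ** 2).sum()
                for lp, gp in zip(local_params, global_params))
    # The resulting scalar reward is then fed to the PPO update of the unfrozen layers.
    return quality.mean() - reg_coef * torch.sqrt(drift)
```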

Figure 2: The illustration of the proposed PFIT based on RL.

IV-D Personalized Federated Task Tuning

We propose a PFTT method based on PEFT, in which a set of clients have different task goals, and each client possesses a set of non-IID task data. For instance, in Fig. 3, different clients are tasked with film classification, but the distribution of labelled data varies across different clients. Some clients may have a higher proportion of science fiction and realistic film labels, while others may have a larger number of comedy and tragedy film labels. As a result, the LLM will excel at providing personalized movie classifications based on the local distribution of labelled data available to it. To achieve PFTT, we introduce three key innovations:

  • Universal Adapter: We incorporate adapters into the global pre-trained LLM to share task knowledge among different clients.

  • Local LoRA: We introduce LoRA in the local LLMs to enable personalized local knowledge learning. The size of LoRA can be adjusted based on the data volume or computational resources available on each client.

  • Partial Aggregation: During global aggregation, only the adapter parameters are aggregated for global knowledge sharing, while the LoRA parameters are not aggregated for maintaining personalization. This approach enables personalized task tuning in heterogeneous devices.

The specific workflow of the PFTT is as follows:

Step 1: Initialize the pre-trained LLM as the global LLM on the server and insert adapters into the LLM.

Step 2: Each client uses the global LLM as the initial local LLM and designs local LoRA parameters based on the data volume or computational resource of the local LLM.

Step 3: Based on the current global LLM and the local LoRA parameters, the client fine-tunes the LLM using local task data, updates the adapter and LoRA parameters, and uploads the adapter parameters over the wireless network.

Step 4: The server aggregates the adapter parameters from all clients, obtains the updated global adapter parameters, and sends them back to the clients (a sketch of this partial aggregation follows the workflow).

Step 5: Steps 3-4 are repeated until the convergence criteria of the system are met.
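
The partial aggregation in Step 4 can be sketched as below; the adapter-key naming convention and the toy state dictionaries are assumptions for illustration, and the point is simply that only adapter tensors are averaged while LoRA tensors never leave the clients.

```python
import copy
import torch

def partial_aggregate(client_states, adapter_prefix="adapter."):
    """Average only the adapter parameters across clients (Step 4); LoRA
    parameters stay on the clients, preserving personalization. The
    `adapter_prefix` naming convention is an assumption for this sketch."""
    adapter_keys = [k for k in client_states[0] if k.startswith(adapter_prefix)]
    return {k: torch.stack([s[k] for s in client_states]).mean(dim=0)
            for k in adapter_keys}

def apply_global_adapter(local_state, global_adapter):
    """Each client overwrites its adapter weights with the aggregated ones and
    keeps its own LoRA weights untouched."""
    new_state = copy.deepcopy(local_state)
    new_state.update(global_adapter)
    return new_state

# Toy example: two clients sharing one adapter weight and holding private LoRA weights.
clients = [{"adapter.down.weight": torch.ones(4, 8) * s,
            "lora.A": torch.randn(2, 8)} for s in (1.0, 2.0)]
global_adapter = partial_aggregate(clients)
```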

Through PFL, both fine-tuning methods can achieve better performance for personalized LLMs.

Figure 3: The illustration of the proposed PFTT based on PEFT.

V Simulation Results

This section presents two experiments to demonstrate the effectiveness of the proposed PFIT and PFTT.

V-A Problem Formulation

We consider a PFL system for LLMs. Suppose the system consists of four clients and one server, where each client possesses different local data and model preferences, and the server has sufficient resources. The PFL system aims to achieve personalized fine-tuning of the local LLMs for all clients over a wireless network with Rayleigh fading, where the number of communication rounds is set to 40 and the SNR is set to 5 dB.

V-B Simulation Settings

V-B1 Settings for PFIT

We first employ the Alpaca dataset [12] to evaluate the validity of the proposed PFIT scheme. Next, we use GPT-2 [13] as the local LLM with 40% sparse attention and adopt PPO as the local RL algorithm. During the fine-tuning process, we sample instructions from the dataset to generate the corresponding responses from GPT-2. Then, we feed the “instruction+response” pairs into two reward models, with one model assessing the helpfulness score of the response and the other evaluating its safety score. We then utilize the reward score (i.e., the helpfulness score plus the safety score) and the communication cost (i.e., the size of the parameters to be aggregated) per round as evaluation metrics.

V-B2 Settings for PFTT

We first employ the AG’s news corpus [14] as the evaluation dataset for PFTT. Additionally, we adopt a Dirichlet distribution to facilitate a non-IID data partition among clients. Next, we utilize RoBERTa [15] as the local LLM, which is an improved BERT model pre-trained on a substantial corpus of English data in a self-supervised manner. We use 12 universal adapters for each client to exchange information among different clients. Subsequently, each client incorporates 10-12 local LoRAs, based on its local resources, to achieve local model personalization. Lastly, since the local LLM is responsible for sentence classification, we use the classification accuracy and the communication delay per round (i.e., communication cost divided by transmission rate) as evaluation metrics.
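
As a concrete illustration of the Dirichlet-based non-IID partition mentioned above, the sketch below splits sample indices among clients per class; the concentration parameter alpha=0.5 and the toy labels are assumptions (AG News has four classes), not the exact settings of this experiment.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=4, alpha=0.5, seed=0):
    """Split sample indices among clients using a Dirichlet prior over label
    proportions; a smaller alpha yields a more skewed (more non-IID) partition."""
    rng = np.random.default_rng(seed)
    client_idx = [[] for _ in range(num_clients)]
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        rng.shuffle(idx)
        props = rng.dirichlet(alpha * np.ones(num_clients))      # per-client share of class c
        cuts = (np.cumsum(props)[:-1] * len(idx)).astype(int)
        for cid, part in enumerate(np.split(idx, cuts)):
            client_idx[cid].extend(part.tolist())
    return client_idx

# Toy example: 1000 samples over 4 AG News-style classes, split across 4 clients.
parts = dirichlet_partition(np.random.randint(0, 4, size=1000), num_clients=4, alpha=0.5)
```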

V-C Evaluation Results

Fig. 4 presents the evaluation results for PFIT and its contenders, where SFL refers to a fine-tuning method that uses only a single reward model (the helpfulness metric) and incorporates 20% sparse attention, and PFL represents personalized fine-tuning without sparse attention. Shepherd is an FL method that employs LoRA for instruction fine-tuning [4]. We can see that PFIT enables the local model to obtain higher rewards than SFL, which uses a single reward model, and Shepherd. Moreover, compared to PFL, which does not use sparse attention, PFIT reduces the communication overhead by 20%. Shepherd, utilizing LoRA for instruction fine-tuning, incurs the lowest communication overhead; however, this also affects the LLM’s performance, resulting in lower rewards compared to PFIT.

Figure 4: Reward and communication cost for PFIT and its contenders.

Fig. 5 presents the evaluation results for PFTT and its benchmarks. In vanilla FL [1], the parameters of both the adapters and the LoRAs need to be uploaded. FedBert [3] is a federated split learning method, and FedLora [8] is a federated fine-tuning method that exclusively incorporates LoRA. The results indicate that PFTT achieves the highest accuracy, which highlights the effectiveness of the LoRA-based personalized structure. Moreover, since PFTT only requires the transmission of a part of the fine-tuned parameters (the universal adapters), it incurs the minimum communication overhead compared to the other methods.

Figure 5: Accuracy and communication cost for PFTT and its contenders.

VI Open Issues

VI-1 Wireless Aggregation and Divergence

In PFL, multiple participants collaborate to train a shared LLM. However, due to the possibility of signal quality fluctuations in wireless networks, mobile devices may experience communication interruptions and data loss. In the aggregation, these instabilities can lead to model divergence, where participants have different contributions to model updates. Addressing the challenges of model aggregation and divergence in wireless networks requires asynchronous model aggregation strategies and fair client selection mechanisms to ensure the model effectively incorporates contributions from all participants while balancing the differences to ensure the reliability of model updates.

VI-2 Personalization and Overfitting

Personalization is one of the core objectives in PFL, aiming to customize the shared LLM to meet specific needs of each client. However, the introduction of personalization can potentially lead to overfitting problems. If the personalized requirements are overly detailed, the LLM may overfit to the data of specific clients, resulting in degraded performance on other clients or tasks. Resolving the challenges of personalization and overfitting necessitates appropriate regularization and control during the fine-tuning process to strike a balance between the degree of personalization and the LLM’s generalization.

VI-3 Communication Efficiency and Model Accuracy

PFL involves communication and collaboration among multiple participants. Communication overhead can be a significant challenge, particularly in scenarios with LLMs and numerous participants. Frequent communication can increase communication latency and resource consumption. Moreover, the unreliability or instability of communication can result in the loss or delay of model updates, which can have a direct impact on the accuracy and performance of the LLM. Addressing the issue requires designing efficient communication protocols and strategies to reduce communication overhead while ensuring reliable transmission of data and model parameters.

VII Conclusion

In this paper, we have first summarized the three learning stages of LLMs and discussed potential solutions for combining FL with LLMs. Next, we have proposed two PFL schemes for different fine-tuning methods. Specifically, we introduced PFIT for instruction fine-tuning of LLMs based on client preferences. We then designed two reward models based on helpfulness and safety, and used RLHF to fine-tune the LLM based on diverse combinations of the reward models across clients. Furthermore, we proposed PFTT to fine-tune downstream classification tasks based on locally non-IID data. In PFTT, we used global adapters to enable information exchange between devices and incorporated local LoRAs to customize the local LLM. Finally, we carried out simulations to validate the effectiveness of these proposed methods.

References

  • [1] Q. Duan et al., “Combining federated learning and edge computing toward ubiquitous intelligence in 6g network: Challenges, recent advances, and future directions,” IEEE Communications Surveys and Tutorials, vol. 25, no. 4, pp. 2892–2950, 2023.
  • [2] A. Z. Tan et al., “Towards personalized federated learning,” IEEE Transactions on Neural Networks and Learning Systems, vol. 34, no. 12, pp. 9587–9603, 2023.
  • [3] Y. Tian et al., “Fedbert: When federated learning meets pre-training,” ACM Transactions on Intelligent Systems and Technology (TIST), vol. 13, no. 4, pp. 1–26, 2022.
  • [4] J. Zhang et al., “Towards building the federated gpt: Federated instruction tuning,” arXiv preprint arXiv:2305.05644, 2023.
  • [5] F. Jiang et al., “Large language model enhanced multi-agent systems for 6g communications,” arXiv preprint arXiv:2312.07850, 2023.
  • [6] A. Roy et al., “Efficient content-based sparse attention with routing transformers,” Transactions of the Association for Computational Linguistics, vol. 9, pp. 53–68, 2021.
  • [7] T. R. McIntosh et al., “The inadequacy of reinforcement learning from human feedback-radicalizing large language models via semantic vulnerabilities,” IEEE Transactions on Cognitive and Developmental Systems, 2024.
  • [8] J. Jiang, X. Liu, and C. Fan, “Low-parameter federated learning with large language models,” arXiv preprint arXiv:2307.13896, 2023.
  • [9] N. Ding et al., “Parameter-efficient fine-tuning of large-scale pre-trained language models,” Nature Machine Intelligence, vol. 5, no. 3, pp. 220–235, 2023.
  • [10] T. Fan et al., “Fate-llm: A industrial grade federated learning framework for large language models,” arXiv preprint arXiv:2310.10049, 2023.
  • [11] Y. Gu et al., “Proximal policy optimization with policy feedback,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 7, pp. 4600–4610, 2021.
  • [12] Y. Wang et al., “Self-instruct: Aligning language model with self generated instructions,” arXiv preprint arXiv:2212.10560, 2022.
  • [13] A. Radford et al., “Language models are unsupervised multitask learners,” OpenAI blog, vol. 1, no. 8, p. 9, 2019.
  • [14] X. Zhang, J. Zhao, and Y. LeCun, “Character-level convolutional networks for text classification,” in Advances in Neural Information Processing Systems, vol. 28.   Curran Associates, Inc., 2015.
  • [15] Y. Liu et al., “Roberta: A robustly optimized bert pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.

Biographies

Feibo Jiang ([email protected]) received Ph.D. degree from the Central South University, China. He is currently an Associate Professor at Hunan Normal University, China.

Li Dong ([email protected]) received Ph.D. degree from the Central South University, China. She is currently an Associate Professor at Hunan University of Technology and Business, China.

Siwei Tu ([email protected]) is currently pursuing the master’s degree with Hunan Normal University, China.

Yubo Peng ([email protected]) is currently pursuing the master’s degree with Hunan Normal University, China.

Kezhi Wang ([email protected]) received Ph.D. degree from University of Warwick, U.K. in 2015. Currently he is a Senior Lecturer with the Department of Computer Science, Brunel University London, U.K.

Kun Yang ([email protected]) received his PhD from the Department of Electronic & Electrical Engineering of University College London (UCL), UK. He is currently a Chair Professor in the School of Computer Science & Electronic Engineering, University of Essex, UK. He is also an affiliated professor at UESTC, China.

Cunhua Pan ([email protected]) received Ph.D. degrees from Southeast University, China, in 2015. He held a post-doctoral position at Queen Mary University of London, U.K., from 2016 and 2019. From 2019 to 2021, he was a Lecturer in the same university. From 2021, he is a full professor in Southeast University, China.

Dusit Niyato ([email protected]) received the Ph.D. degree in electrical and computer engineering from the University of Manitoba in Canada in 2008. He is a professor in the College of Computing and Data Science, Nanyang Technological University, 639798 Singapore.