ReFormer: Generating Radio Fakes for Data Augmentation
Abstract
We present ReFormer, a generative AI (GAI) model that can efficiently generate synthetic radio-frequency (RF) data, or RF fakes, statistically similar to the data it was trained on, or with modified statistics, in order to augment datasets collected in real-world experiments. For applications like this, adaptability and scalability are important, which is why ReFormer leverages transformer-based autoregressive generation trained on learned discrete representations of RF signals. Through prompting, such a GAI model can be made to generate data that complies with specific constraints or conditions, which is particularly useful for training channel estimation and modeling. It may also leverage data from a source system to generate training data for a target system. We show how different transformer architectures and other design choices affect the quality of the generated RF fakes, evaluated using metrics such as precision and recall, classification accuracy, and signal constellation diagrams.
Index Terms:
ReFormer, VQ-VAE, neural radio synthesis, RF decoder-only transformer
I Introduction
Generative AI (GAI) models are a key technology for advancing modern wireless communication systems. They offer novel solutions for tasks such as signal generation, channel modeling, and system optimization. The most popular generative models for wireless communications are diffusion models, which have demonstrated superior performance in various applications. Diffusion models are a class of probabilistic generative models that generate data by reversing a gradual noising process. Learning the iterative reverse process, called "denoising", allows signals to be generated starting from pure noise. Researchers have proposed using denoising diffusion probabilistic models (DDPMs) to enhance communication systems with non-ideal transceivers and complex channels [1, 2]. These models have shown significant improvements in bit error rates, especially in low signal-to-noise ratio conditions, by effectively denoising received signals. DDPMs have also been used to probabilistically shape constellation symbols [3], optimizing the probability of symbol occurrence to enhance information rates and communication performance. This approach allows for adaptive signal design, aligning transmitted signals more closely with varying channel conditions. As real-world radio-frequency (RF) data is scarce, GAI can be used to create large amounts of additional data for training machine learning models, including the ones mentioned above. Augmented data improves model performance in diverse scenarios. However, DDPMs are highly computationally inefficient for generative synthesis of large RF datasets, as the denoising process goes through iterations in both the training and the generation phase.
Generative Adversarial Networks (GANs) have also been utilized for synthesizing realistic radio signals [4] and for channel estimation [5, 6]. For instance, the "Radio GAN" framework [4] employs an unrolled generator design combined with a previously estimated pure-signal distribution as a prior. To synthesize high-quality radio signals, this method learns transmitter characteristics and various channel effects by modeling the underlying sampling distribution. The drawback of this approach is that GANs are extremely difficult to train, which complicates any ablation analysis and adaptation. Given the complexity and diversity of multi-antenna systems and other Next-Generation network (NextG) scenarios, replacing diffusion- and GAN-based GAI models with transformer-based architectures may be a good solution for scalability in generating RF fakes. Adaptation through prompting is another attractive feature. In this paper we present a simple and tractable generative model named ReFormer, which creates RF signal fakes sampled from the learned distribution of the training signals captured by a decoder-only transformer. Section II introduces the deep learning architectures, methods, and data utilized to train the ReFormer GAI. Section III describes the methods of evaluation, and Section IV presents the evaluation results. We show how different transformer architectures and other design choices affect the quality of the generated RF fakes, evaluated using metrics such as precision and recall, classification accuracy, and signal constellation diagrams.
II System Model
We aim to train a generative model to produce RF signals whose probability distribution closely matches the distribution of the RF datapoints in the training dataset. In the big picture, this will later allow us to include prompts that characterize desired variations according to a conditional distribution. Such an ability will be essential for generating massive amounts of diverse datapoints for the augmentation of RF datasets, which is very important in the field of RF machine learning given the difficulties in collecting this type of data by sensing the RF spectrum.
II-A Data
Our dataset comprises datapoints representing sampled RF signals of the types used in modern wireless communications. Each datapoint arises from a sequence of random information bits modulated into a baseband RF signal by one of a finite set of available digital modulation schemes. In this paper, we consider six digital modulation classes.
The modulation function of each class encodes the random sequence of bits into a sequence of complex-valued numbers, where each complex sample encodes the modulation phase and amplitude. We create datapoints as fixed-length sub-sequences of these complex samples. We prepared the training dataset using the open-source library torchsig featured in [7]. The dataset contains RF samples of high SNR, for both simplicity and future utility in creating various signal augmentations controlled by a prompt. The torchsig library transform ComplexTo2D is used to convert vectors of complex-valued numbers into 2-channel datapoints, traditionally referred to as I and Q components. Each channel consists of real numbers, previously normalized: channel 1 contains the real components and channel 2 the imaginary ones. Depending on the modulation, a datapoint contains more or fewer mappings of the original random sequence of bits, since the dataset contains diverse modulation orders (i.e., how many original bits are represented by a single complex value). Exploiting this diversity more fully may allow us to better train the proposed generative model; however, this would require a more complex neural network architecture and longer training. Future work will explore its effect on the quality of the generated RF fakes.
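To make the datapoint format concrete, the sketch below mimics the effect of torchsig's ComplexTo2D transform on a complex baseband sequence; the normalization and the example symbol sequence are illustrative assumptions, not the library's exact implementation.

```python
import numpy as np

def complex_to_2d(iq: np.ndarray) -> np.ndarray:
    """Map a complex baseband sequence to a normalized 2-channel (I, Q) array.

    Illustrative stand-in for torchsig's ComplexTo2D; the exact normalization
    used in the paper may differ.
    """
    x = np.stack([iq.real, iq.imag], axis=0)      # shape: (2, L) -> channel 0 = I, channel 1 = Q
    x = x / (np.max(np.abs(x)) + 1e-12)           # simple peak normalization
    return x.astype(np.float32)

# Example: a QPSK-like random symbol sequence of length 1024 (illustrative)
rng = np.random.default_rng(0)
symbols = (rng.choice([-1.0, 1.0], 1024) + 1j * rng.choice([-1.0, 1.0], 1024)) / np.sqrt(2)
datapoint = complex_to_2d(symbols)
```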
II-B Method
Prior work [8, 9] used a hierarchical VQVAE model to learn several posterior distributions, including, in its first level of hierarchy, the posterior of the vector-quantized version of the latent mapped from the RF datapoint by the trained encoder. Here we want to learn the prior, which will allow us to sample and generate similar RF fakes. However, to decrease the training complexity of the model which parameterizes this distribution, we choose to leverage the VQVAE for mapping the datapoint to a discrete space and then learn the prior of the discrete latent. Generating the fakes in the discrete latent domain is easier, and we can directly leverage the power of the transformer architecture [10]. Once we obtain a fake latent, we use the VQVAE decoder to map it back to the original space, resulting in a fake RF datapoint.
II-C Learning Discrete Posterior
Using a Vector Quantized Variational Autoencoder (VQVAE) model [11] to learn an efficient, discrete, low-dimensional representation of every RF datapoint allows us to create a discrete mapping of the dataset. The discrete dataset is obtained from the training dataset using the mapping $Q \circ E$, where $E$ is the trained VQVAE encoder (with its trained parameters) and $Q$ is the vector quantizer based on the trained codebook $\mathcal{C}$. Equations (1) represent the trained VQVAE model utilized for the above mapping.
$$\mathbf{z}_e = E(\mathbf{x}), \qquad \mathbf{z}_q = Q(\mathbf{z}_e), \qquad \hat{\mathbf{x}} = D(\mathbf{z}_q) \qquad (1)$$
Subsection II-D will show how we learn the prior of the discrete latent sequences generated from the training data: we learn the probability model of a token sequence by training the transformer to learn its autoregressive (next-token) factorization. Our VQVAE model has three components: (1) an encoder block $E$, which down-samples the input by a factor of 2 in each convolutional channel dimension and produces an output whose number of channels equals the codebook dimension $d$; we refer to each column-partition of size $d$ as a slice, resulting in a sequence of slices; (2) a decoder block $D$; and (3) a vector quantizer $Q$, which is applied to the output of $E$ and provides an output to be decoded by $D$.
Vector quantization (VQ) is a process which discretizes the latent space (the output of $E$). Each latent slice is discretized by applying the quantization function $Q$, which maps it to an element of the codebook $\mathcal{C}$ consisting of $N$ codewords, the learnable embedding vectors of the same length $d$. VQ effectively maps each slice to an index of the ordered set of codewords, referred to as the codebook. The VQ-VAE model aims to learn the optimal codebook vectors through the training process. The decoder block, denoted $D$, is tasked with reconstructing the original input from the mapped codebook vectors. While the most intuitive approach for $Q$ is to simply perform a nearest-neighbor lookup, more advanced techniques such as stochastic codeword selection [12] or heuristics such as exponential moving averages (EMA) [13] can be employed instead. We here present the results obtained using stochastic mapping. In equations (1), $\mathbf{z}_e$ is the latent representation of $\mathbf{x}$ at the output of $E$, while $\mathbf{z}_q$ is its quantized version.
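A minimal sketch of the nearest-neighbor variant of this quantization is shown below, with illustrative codebook size and codeword dimension; the stochastic variant used in this paper replaces the argmin by sampling a codeword index from the learned posterior.

```python
import torch

def nearest_codeword_indices(z_e: torch.Tensor, codebook: torch.Tensor) -> torch.Tensor:
    """Map each latent slice to the index of its nearest codeword.

    z_e:      (num_slices, d)  encoder output, one row per slice
    codebook: (N, d)           learnable codeword embeddings
    returns:  (num_slices,)    token indices into the codebook
    """
    dists = torch.cdist(z_e, codebook, p=2) ** 2   # squared distance slice-to-codeword
    return dists.argmin(dim=1)

# Deterministic quantization: replace each slice by its nearest codeword (sizes illustrative)
codebook = torch.randn(128, 64)        # N=128 codewords of dimension d=64
z_e = torch.randn(32, 64)              # 32 slices from the encoder
tokens = nearest_codeword_indices(z_e, codebook)
z_q = codebook[tokens]                 # quantized latent fed to the decoder
```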
We are interested in another form of the quantized latent (shown in Fig. 1), which replaces the codewords quantizing the slices with their indices in the codebook. We refer to those indices as tokens, one per slice.
II-C1 Training VQVAE
The ability of the VQVAE to faithfully reconstruct an input will suffer to a great degree if the codebook is not trained properly [14], leading to mode collapse. Our objective for this model is for the reconstructed output $\hat{\mathbf{x}}$ to be as close as possible to the input (the original) $\mathbf{x}$. Thus, we use a reconstruction loss $\mathcal{L}_{rec}$ between $\mathbf{x}$ and $\hat{\mathbf{x}}$. We also need to incentivize proper training of the codebook. Therefore, we introduce two other terms in our loss function, known as the quantization loss $\mathcal{L}_{q}$ and the commit loss $\mathcal{L}_{c}$. $\mathcal{L}_{q}$ measures the degree to which the codebook should be trained with respect to the output of the encoder, while $\mathcal{L}_{c}$ measures the degree to which the encoder should be trained with respect to the codewords (Eqn. (2)).
$$\mathcal{L}_{rec} = \|\mathbf{x} - \hat{\mathbf{x}}\|_2^2, \qquad \mathcal{L}_{q} = \|\mathrm{sg}[\mathbf{z}_e] - \mathbf{z}_q\|_2^2, \qquad \mathcal{L}_{c} = \|\mathbf{z}_e - \mathrm{sg}[\mathbf{z}_q]\|_2^2 \qquad (2)$$
where $\mathrm{sg}[\cdot]$ denotes the stop-gradient function. It acts as the identity function during the forward pass, while in the backward pass it produces a zero-valued partial derivative with respect to all trainable parameters.
As the codebook is more difficult to train than the autoencoder, we weight the commit loss $\mathcal{L}_{c}$ by a factor $\beta$ such that the codewords are "more trainable" than the encoder. As we apply stochastic quantization by sampling the codebook [15] according to the learned discrete posterior
$$P\big(\mathbf{z}_q = \mathbf{c}_k \mid \mathbf{z}_e\big) = \frac{\exp\!\big(-\|\mathbf{z}_e - \mathbf{c}_k\|_2^2\big)}{\sum_{j=1}^{N}\exp\!\big(-\|\mathbf{z}_e - \mathbf{c}_j\|_2^2\big)}, \qquad (3)$$
the loss must also include the KL divergence between the posterior in (3), defined over the codewords $\mathbf{c}_k \in \mathcal{C}$, and the discrete uniform prior over the $N$ codewords. Hence, the complete loss function is:
$$\mathcal{L} = \mathcal{L}_{rec} + \mathcal{L}_{q} + \beta\,\mathcal{L}_{c} + \gamma\, D_{KL}\big(P(\mathbf{z}_q \mid \mathbf{z}_e)\,\|\,\mathcal{U}\{1,\dots,N\}\big) \qquad (4)$$
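A compact sketch of the combined loss in (2)-(4) is given below, using detach() as the stop-gradient operator; the weights and the exact form of the KL term are assumptions consistent with the equations above, not the paper's exact values.

```python
import torch
import torch.nn.functional as F

def vqvae_loss(x, x_hat, z_e, z_q, codebook, beta=0.25, gamma=1.0):
    """Sketch of the combined VQ-VAE loss of Eqns. (2)-(4).

    detach() plays the role of the stop-gradient operator sg[.]; beta and
    gamma are illustrative weights, not the paper's values.
    """
    l_rec = F.mse_loss(x_hat, x)                         # reconstruction loss
    l_quant = F.mse_loss(z_q, z_e.detach())              # trains the codebook
    l_commit = F.mse_loss(z_e, z_q.detach())             # trains the encoder
    # Discrete posterior over codewords: softmax of negative squared distances (Eqn. (3))
    dists = torch.cdist(z_e, codebook) ** 2
    log_post = F.log_softmax(-dists, dim=1)
    # KL divergence to the uniform prior over the N codewords
    num_codewords = codebook.shape[0]
    kl = (log_post.exp() * (log_post + torch.log(torch.tensor(float(num_codewords))))).sum(dim=1).mean()
    return l_rec + l_quant + beta * l_commit + gamma * kl
```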
II-D Learning Discrete Prior
To learn the prior in the discrete space, we train a decoder-only transformer (DoT) on the dataset of token sequences obtained by mapping the training dataset through the trained encoder and quantizer.
The DoT model [10], also known as the autoregressive transformer, is a popular generative model derived from the original encoder-decoder transformer model [16]. The encoder stack is eliminated from this transformer, leaving the decoder stack to learn the causal attention structure in the token sequences. Once trained, the model can perform autoregressive sequence generation. The self-attention mechanism enables the model to predict the next token given all previous tokens. A DoT can also include cross-attention, allowing for different creative methods of obtaining the cross-attending context. The training relies on the cross-entropy (CE) loss between the masked indexed output of the VQ-VAE quantizer and its estimate at the output of the transformer (see Fig. 1). The latent fake generation is an autoregressive inference: it starts from a random token and continues generating subsequent tokens of a fake autoregressively. We trained the transformer using a simple form of cross-attending context by preceding each datapoint with its class token. Hence, during inference, to indicate what class of latent fake we aim to generate, we start the autoregressive generation with that class token. Once the latent fake is generated, we map it into its non-indexed version and transform it by the VQ-VAE decoder into an RF fake.
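The generation loop can be sketched as follows; the model interface, temperature parameter, and sampling details are illustrative assumptions rather than the paper's implementation.

```python
import torch

@torch.no_grad()
def generate_latent_fake(model, class_token: int, seq_len: int, temperature: float = 1.0):
    """Autoregressively sample a fake token sequence, prompted by a class token.

    `model` is assumed to map a (1, t) token tensor to logits of shape
    (1, t, num_codewords); names and shapes are illustrative.
    """
    tokens = torch.tensor([[class_token]], dtype=torch.long)
    for _ in range(seq_len):
        logits = model(tokens)[:, -1, :] / temperature   # distribution over the next token
        probs = torch.softmax(logits, dim=-1)
        next_token = torch.multinomial(probs, num_samples=1)
        tokens = torch.cat([tokens, next_token], dim=1)
    return tokens[:, 1:]                                  # drop the class-token prompt
```

The returned indices are then replaced by their codewords and passed through the VQ-VAE decoder to obtain the RF fake.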
II-E Training the Transformer Models
To train the transformer model, we used two different architectures: the nano-GPT architecture [17] and the MONAI transformer [18]. Both models were trained to learn the autoregressive representation of the discrete latent space generated by the VQ-VAE quantization process.
Each discrete latent sequence is a sequence of tokens, where each token represents the index of the nearest codeword from the trained VQ-VAE codebook. To prepare the training data for the transformer, we constructed the input and target sequences as follows:
• The input sequence consists of the tokens of a datapoint, where the first token is a class label representing the modulation scheme and the remaining tokens are the indices derived from the quantized sequence.
• The target sequence consists of the indices from the second token of the input sequence onward, i.e., the input shifted left by one position (a construction sketch follows this list).
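As referenced above, a minimal sketch of this input/target construction is given below, assuming PyTorch tensors of codebook indices; the function name and the exact off-by-one convention are illustrative, not the paper's code.

```python
import torch

def make_transformer_pair(token_seq: torch.Tensor, class_label: int):
    """Build the (input, target) pair for next-token prediction.

    token_seq holds the codebook indices of one datapoint. The input prepends
    the class label; the target is the input shifted left by one position, so
    the prediction at each input position is the following token.
    """
    x = torch.cat([torch.tensor([class_label], dtype=token_seq.dtype), token_seq])
    y = token_seq.clone()
    return x, y
```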
II-E1 Training Process
The training process for both the GPT and MONAI transformer models is designed to minimize the CE loss between the predicted indices and the ground-truth indices in the target sequence. The steps involved in the training process are described below:
1. Input Preparation: For each training example, the input sequence is fed into the transformer model. The transformer generates a sequence of logits, where each logit vector corresponds to a prediction (out of $N$ possible indices) for the next token in the sequence.
2. Autoregressive Training: The transformer is trained in an autoregressive manner. For a given sequence:
• At step $i$, the model uses the first $i$ tokens of the input as context to predict the $(i+1)$-th token.
• This continues until the end of the sequence is reached, at which point prediction stops.
3. Loss Function: The loss function is the CE loss between the predicted logits of the indices and the ground-truth tokens in the target sequence. Formally, the loss for a single example is computed as
$$\mathcal{L}_{CE} = -\sum_{i} \log p\,(t_i \mid t_{<i}),$$
where $p\,(t_i \mid t_{<i})$ is the predicted probability of the $i$-th token given the preceding tokens (a minimal training-step sketch follows this list).
4. Inference and Loss Calculation: During inference, the model generates tokens one by one in an autoregressive manner, starting with the class label and the random first token.
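The following is a minimal PyTorch sketch of one such training step, consistent with the input/target pairing above; the model interface and tensor shapes are assumptions rather than the paper's implementation.

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, x_batch, y_batch):
    """One autoregressive training step (sketch).

    x_batch: (B, T+1) class token followed by the codebook indices
    y_batch: (B, T)   the same indices, i.e., x_batch shifted left by one
    `model` is assumed to return logits of shape (B, T+1, N).
    """
    logits = model(x_batch)[:, :-1, :]            # position i predicts y_batch[:, i]
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), y_batch.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```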
II-E2 Implementation and Training Details
We implemented the nano-GPT and MONAI transformer architectures using PyTorch. The transformers consist of multiple layers of self-attention and feedforward networks, with each layer designed to capture the dependencies between tokens in the sequence. The training was performed using a learning rate scheduler, with the Adam optimizer applied to minimize the cross-entropy loss.
Both models were trained for 100 epochs with a batch size of 32, using a fixed maximum token-sequence length. We continued training both models beyond 100 epochs, but the MONAI validation and training losses started to diverge. Hence, the models evaluated here are those trained for 100 epochs each. This training enabled the transformer to learn an autoregressive mapping of the discrete latent space, providing a robust mechanism for generating new latent sequences during inference. These latent sequences are then transformed back into the original RF signal space using the VQ-VAE decoder.
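As a rough illustration of this final step, the sketch below maps a generated token sequence back to an I/Q datapoint through a codeword lookup and a stand-in decoder; the layer choices and sizes are placeholders, not the trained VQ-VAE decoder.

```python
import torch
import torch.nn as nn

# Stand-in decoder so the snippet runs end to end; the real decoder is the
# trained VQ-VAE decoder of Section II-C. Sizes are illustrative.
num_codewords, d, T = 128, 64, 64
codebook = torch.randn(num_codewords, d)
decoder = nn.Sequential(
    nn.ConvTranspose1d(d, 32, kernel_size=4, stride=2, padding=1),  # undo the 2x down-sampling
    nn.ReLU(),
    nn.Conv1d(32, 2, kernel_size=3, padding=1),                     # 2-channel I/Q output
)

fake_tokens = torch.randint(0, num_codewords, (1, T))  # in practice: tokens sampled by the transformer
z_q = codebook[fake_tokens]                            # (1, T, d) codeword lookup
rf_fake = decoder(z_q.transpose(1, 2))                 # (1, 2, 2T) fake I/Q datapoint
```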
III Methods of Evaluation
To evaluate the effectiveness of our approach, we utilized multiple evaluation techniques targeting both the VQVAE and the transformer-generated outputs.
III-A Evaluation of VQVAE
The VQVAE model was evaluated using a variety of reconstruction metrics. Most importantly, we analyzed the codebook usage for each class (Figure 6). This helps identify potential mode collapse, which would result in skewed histograms of codebook usage for one or more classes. Ensuring uniform codebook usage across classes validates proper training of the VQVAE and indicates robustness in the learned representations. Results of these evaluations are discussed in subsequent sections. Additionally, we created 10 VQVAE reconstructions for each class and visualized the constellation diagrams for the I/Q samples. These diagrams (Fig. 3) were compared to the original class constellations (Fig. 2) to assess the fidelity of reconstructions visually.
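For reference, a constellation diagram of a 2-channel datapoint can be plotted with a few lines of matplotlib; this is a generic sketch, not the exact plotting code used for the figures.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_constellation(datapoint: np.ndarray, title: str = ""):
    """Scatter the I channel against the Q channel of a 2-channel datapoint."""
    i, q = datapoint[0], datapoint[1]
    plt.figure(figsize=(3, 3))
    plt.scatter(i, q, s=2, alpha=0.5)
    plt.xlabel("I"); plt.ylabel("Q"); plt.title(title)
    plt.axis("equal")
    plt.show()

# Example on a random QPSK-like datapoint (illustrative)
x = np.stack([np.random.choice([-0.7, 0.7], 1024), np.random.choice([-0.7, 0.7], 1024)])
plot_constellation(x, "QPSK (illustrative)")
```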
III-B Fidelity and Diversity Evaluation
For evaluating the fidelity and diversity of the generated fakes, we adopted the metrics described in [19], specifically Topological Precision and Recall (TopP&R). These metrics are robust and reliable for evaluating generative models, offering statistical consistency under noise and perturbations.
III-B1 Fidelity
Fidelity measures how closely the generated samples resemble the real samples in the dataset. Using the TopP&R framework, fidelity is computed based on the overlap between the estimated support of the real data and that of the generated data. This overlap is quantified through kernel density estimation (KDE) and a bootstrap-derived confidence band that ensures robustness against noise. The fidelity metric helps determine if the generated data retains key characteristics of the real data, such as constellations or signal structures, without introducing artifacts.
III-B2 Diversity
Diversity evaluates whether the generative model produces outputs that span the full variability of the real data. A high diversity score indicates that the generated samples adequately represent the range of variations in the training data. Using TopP&R, diversity is assessed by determining whether the generated samples cover the support of the real data. The robust support estimation provided by TopP&R ensures that diversity metrics are not unduly influenced by outliers or sparsely distributed data points.
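For intuition only, the sketch below computes a simplified k-nearest-neighbor support-overlap estimate of fidelity and diversity; it is not the TopP&R estimator of [19], which additionally relies on KDE and a bootstrap-derived confidence band.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def support_coverage(real: np.ndarray, fake: np.ndarray, k: int = 5):
    """Simplified precision/recall-style fidelity and diversity estimate.

    A k-NN support-overlap sketch included only to illustrate the idea of
    comparing the supports of real and generated feature sets.
    """
    def covered(points, support_pts):
        nn = NearestNeighbors(n_neighbors=k).fit(support_pts)
        radii = nn.kneighbors(support_pts)[0][:, -1]                 # k-NN radius per support point
        d, idx = NearestNeighbors(n_neighbors=1).fit(support_pts).kneighbors(points)
        return np.mean(d[:, 0] <= radii[idx[:, 0]])                  # fraction of points inside the support
    fidelity = covered(fake, real)      # do fakes fall on the real-data support?
    diversity = covered(real, fake)     # do fakes cover the real-data support?
    return fidelity, diversity

# Example with random feature vectors (illustrative)
fid, div = support_coverage(np.random.randn(500, 16), np.random.randn(500, 16))
```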
III-B3 Comparison with Other Metrics
The TopP&R framework offers several advantages over traditional metrics like Fréchet Inception Distance (FID) and Precision and Recall (P&R). By systematically filtering out topological noise and focusing on statistically significant features, TopP&R ensures reliable support estimation and consistent evaluation. This robustness is particularly valuable for RF signal datasets, which are prone to noise and adversarial perturbations.
III-C Perceptual Analysis of Constellations
As with the VQVAE reconstructions, the perceptual method of evaluation involves plotting the I/Q constellations for each class and visually comparing them against the original constellations (the constellation figures in Section IV-D). This qualitative analysis provides an intuitive understanding of how well the reconstructions preserve class-specific signal characteristics.
IV Results
We here present the performance of the proposed generative model, ReFormer, evaluated using fidelity, diversity, and Top-F1 metrics for both the MONAI and the nano-GPT transformer. An additional quantitative metric included here is the accuracy of a classifier pretrained on the original dataset and tested on the original data, the VQVAE reconstructions, and the fakes from both transformers. Finally, qualitative evaluation was conducted by visualizing I/Q constellations and analyzing codebook usage histograms.
IV-A Quantitative Metrics
The fidelity, diversity, and Top-F1 metrics, computed for both transformers, are summarized in Table I.
Transformer | Parameters | Fidelity | Diversity | Top-F1 |
---|---|---|---|---|
MONAI | 443K | 1.0 | 0.6909 | 0.8172 |
nano-GPT | 36.2K | 1.0 | 0.8455 | 0.9163 |
Both transformers achieved a fidelity score of 1.0, indicating that the generated samples closely resemble the real data. However, the nano-GPT transformer outperformed the MONAI transformer in both diversity (0.8455 vs. 0.6909) and Top-F1 (0.9163 vs. 0.8172), demonstrating its ability to produce a more diverse range of samples while maintaining high accuracy.
To further validate the quality of the generated RF fakes, a pretrained classifier, initially trained on the original dataset, was tested on various datasets: the original test data, VQVAE reconstructions of the test data, and fakes generated by both the MONAI and nano-GPT transformers. The test data consisted of 500 samples per modulation class. Table II summarizes these results.
Dataset | Classification Accuracy (%) |
---|---|
Original Test Data | 100.00 |
Reconstructed Test Data | 100.00 |
GPT Generated Fake Data | 81.80 |
MONAI Generated Fake Data | 44.07 |
The classification accuracy metrics provide insight into the practical utility of the generated datasets. Both the original and reconstructed test data achieved a perfect classification accuracy of 100%, demonstrating that the VQVAE model effectively captures and retains the essential characteristics of the RF signals, such that the classifier remains highly accurate.
However, there is a notable difference in the performance of the fakes: the nano-GPT transformer produced fakes with a classification accuracy of 81.80%, significantly higher than that of the MONAI transformer, which only achieved 44.07%. This disparity underscores the nano-GPT transformer’s superior capability to generate more realistic and complex RF signal fakes which maintain salient characteristics.
The performance loss in the MONAI-generated fakes suggests potential issues in capturing the finer details of the signal characteristics, or overfitting to less generalizable features during the training phase (see the conclusions).
IV-B Qualitative Evaluation
To further assess the performance, we visualized:
• Reconstructions: The constellation figures (Section IV-D) illustrate a random single-fake-based I/Q constellation for the MONAI and nano-GPT transformers, respectively, including I/Q constellations for all six modulation classes. Both generative models preserve the class-specific characteristics of the signals. The nano-GPT-based fakes produced constellation diagrams that closely resemble the originals, indicating effective signal learning.
• Codebook Usage: We analyzed the codebook usage across all signals for both transformers and the VQVAE latents. The codebook usage histograms for all six classes were combined into a single figure, revealing uniform usage across all codewords, thereby validating the proper training of the VQVAE.
IV-C Codebook Usage
Figure 7 shows the codebook usage histograms for all six modulation classes generated by the GPT transformer. A companion figure presents the codebook usage for the MONAI transformer. Each figure contains six subplots, one for each modulation class. The uniform usage of codewords across all classes indicates that the VQVAE was properly trained and did not suffer from mode collapse.
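A codebook-usage histogram of the kind shown in these figures can be produced by counting token occurrences per class; the sketch below is illustrative and assumes the token sequences are available as integer arrays.

```python
import numpy as np
import matplotlib.pyplot as plt

def codebook_usage_histograms(token_seqs_by_class: dict, num_codewords: int):
    """Plot one codeword-usage histogram per modulation class.

    token_seqs_by_class maps a class name to an array of token indices; a
    roughly uniform histogram suggests the codebook is used without collapse.
    """
    n = len(token_seqs_by_class)
    fig, axes = plt.subplots(1, n, figsize=(3 * n, 3))
    for ax, (name, tokens) in zip(np.atleast_1d(axes), token_seqs_by_class.items()):
        counts = np.bincount(np.asarray(tokens).ravel(), minlength=num_codewords)
        ax.bar(np.arange(num_codewords), counts)
        ax.set_title(name); ax.set_xlabel("codeword index"); ax.set_ylabel("count")
    plt.tight_layout()
    plt.show()

# Example with random tokens for six classes (illustrative)
usage = {f"class {c}": np.random.randint(0, 128, 5000) for c in range(6)}
codebook_usage_histograms(usage, num_codewords=128)
```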
IV-D Discussion
The results indicate that both transformers are capable of generating high-fidelity RF signal fakes. While the MONAI transformer achieves satisfactory performance, the GPT transformer demonstrates superior diversity and accuracy, making it a more robust choice for generating a wide range of RF signal variations. The qualitative evaluations further support these findings, showing consistent reconstruction quality and uniform codebook usage. The fact that the loss curves for the MONAI training start diverging after 100 epochs indicates that this model is at the edge of overfitting, possibly because its number of parameters is more than ten times the number of nano-GPT parameters. This may be the cause of its slightly inferior performance, in terms of both the qualitative indicators (the constellation figures) and the quantitative ones (Table I).
V Conclusion
The presented results validate the effectiveness of the proposed ReFormer framework in generating RF fakes for data augmentation, with potential applications in training machine learning models for wireless communication systems. In general, both transformer models performed surprisingly well. Even the statistics of fake tokens (codeword indices) in Figure 7 are almost indistinguishable from each other and similar to the VQVAE statistics. This is a promising and simple approach for RF dataset augmentation. Future work will address the optimal complexity of the transformer model as a function of the RF-signal dimension, the length of the discrete representation, and the dimension of the codebook. Additionally, we will extend the model to include context that can add the effects of the channel.
References
- [1] M. Arvinte and J. I. Tamir, “Mimo channel estimation using score-based generative models,” IEEE Trans. on Wireless Communications, vol. 22, no. 6, pp. 3698–3713, 2023.
- [2] ——, “Score-based generative models for robust channel estimation,” in 2022 IEEE Wireless Comm. and Network. Conference (WCNC), 2022.
- [3] M. Letafati, S. Ali, and M. Latva-aho, “Generative AI-Based Probabilistic Constellation Shaping With Diffusion Models,” 2023.
- [4] W. Wang, J. An, H. Liao, L. Gan, and C. Yuen, “Radio Generation Using Generative Adversarial Networks with An Unrolled Design,” 2023. [Online]. Available: https://arxiv.org/abs/2306.13893
- [5] E. Balevi, A. Doshi, A. Jalal, A. Dimakis, and J. G. Andrews, “High dimensional channel estimation using deep generative networks,” IEEE Journal on Selected Areas in Communications, vol. 39, no. 1, p. 18–30, 2020.
- [6] Y. Zhou, A. Wijesinghe, S. Zhang, and Z. Ding, “Tire-gan: Task-incentivized generative learning models for radiomap estimation with radio propagation model,” 2024.
- [7] L. Boegner, M. Gulati, G. Vanhoy, P. Vallance, B. Comar, S. Kokalj-Filipovic, C. Lennon, and R. D. Miller, “Large Scale Radio Frequency Signal Classification,” 2022. [Online]. Available: https://arxiv.org/abs/2207.09918
- [8] Y. Kaasaragadda, A. Rodriguez, and S. Kokalj-Filipovic, “Can We Learn to Compress RF Signals?” in IEEE International Balkan Conference on Communications and Networking (BalkanCom), 2024.
- [9] A. Rodriguez, Y. Kaasaragadda, and S. Kokalj-Filipovic, “Deep-Learned Compression for Radio-Frequency Signal Classification,” 2024 IEEE Int. Symposium on Information Theory Workshops (ISIT-W), 2024.
- [10] A. Radford, K. Narasimhan, T. Salimans, and I. Sutskever, “Improving language understanding by generative pre-training,” https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf, 2018.
- [11] A. Van Den Oord, O. Vinyals et al., “Neural discrete representation learning,” Advances in neural information processing systems, vol. 30, 2017.
- [12] C. K. Sonderby, B. Poole, and A. Mnih, “Continuous Relaxation Training of Discrete Latent Variable Image Models,” in Intern. Conf. on Neural Information Processing Systems, 2017.
- [13] A. Razavi, A. van den Oord, and O. Vinyals, “Generating diverse high-fidelity images with VQ-VAE-2,” in Intern. Conf. on Neural Information Processing Systems, 2019.
- [14] M. Huh, B. Cheung, P. Agrawal, and P. Isola, “Straightening out the straight-through estimator: Overcoming optimization challenges in vector quantized networks,” in Intern. Conf. on Machine Learning, 2023.
- [15] Y. Takida et al., “SQ-VAE: Variational Bayes on discrete representation with self-annealed stochastic quantization,” arXiv preprint arXiv:2205.07547, 2022.
- [16] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Advances in neural information processing systems, vol. 30, 2017.
- [17] A. Karpathy, “NG Video Lecture,” https://github.com/karpathy/ng-video-lecture, 2023.
- [18] W. H. L. Pinaya et al., “Generative AI for Medical Imaging: extending the MONAI Framework,” 2023. [Online]. Available: https://arxiv.org/abs/2307.15208
- [19] P. J. Kim, Y. Jang, J. Kim, and J. Yoo, “TopP&R: Robust support estimation approach for evaluating fidelity and diversity in generative models,” 2024. [Online]. Available: https://arxiv.org/abs/2306.08013