Wireless Image Transmission with Semantic and Security Awareness
Abstract
Semantic communication is an increasingly popular framework for wireless image transmission due to its high communication efficiency. With the aid of a joint-source-and-channel (JSC) encoder implemented by a neural network, semantic communication directly maps original images into symbol sequences containing semantic information. Compared with the traditional separate source and channel coding design used in bit-level communication systems, semantic communication systems are known to be more efficient and accurate, especially in the low signal-to-noise ratio (SNR) regime. This prompts a critical yet largely untackled security issue in semantic communication: an eavesdropper can crack the semantic information much more easily, since it can be retrieved even over a highly noisy channel. In this letter, we develop a semantic communication framework that accounts for both the efficiency of semantic decoding and the risk of privacy leakage. To this end, targeting wireless image transmission, we propose a JSC autoencoder featuring a residual structure for efficient semantic extraction and transmission, whose training is guided by a well-designed loss function that can flexibly regulate the efficiency-privacy trade-off. Extensive experimental results are provided to show the effectiveness and robustness of the proposed scheme.
I Introduction
The wide success of artificial intelligence (AI) in every aspect of society has also driven rapid advances in wireless communications [1]. Recently, as a consequence of the fusion of AI and communication, a novel paradigm called semantic communication has received great attention. Building on deep-learning-based end-to-end communication systems [2], semantic communication further introduces an efficient semantic encoder network, so that the essential semantic information, instead of the raw data, can be extracted, encoded, and delivered to the receiver. This is believed to be a more efficient and effective way to convey information in next-generation wireless networks [3].
In particular, semantic communication has shown promising gains in image transmission tasks. In classic bit-level communication, images are first compressed into binary sequences by source coding algorithms (e.g., JPEG, JPEG2000, BPG), followed by channel coding schemes (e.g., Turbo, LDPC) that add redundancy to combat random channel perturbations; the codewords are then modulated into symbol sequences for reliable transmission. With such a separate source and channel coding scheme, it is hard to guarantee the optimality of the whole system in terms of the rate-distortion trade-off. Prompted by this, the authors in [4] first proposed a joint-source-and-channel-coding (JSCC) method for wireless image transmission, where images are directly mapped into complex-valued symbols through a well-trained neural network. To further improve the quality of the reconstructed image, feedback and multi-layer bandwidth-agile designs were subsequently proposed in [5, 6]. In addition, since semantic communication aims to deliver semantic meaning instead of perfectly reconstructing the original source at the receiver, the commonly used pixel-level similarity (e.g., PSNR [4]) is no longer appropriate for describing the quality of semantic communication in image transmission tasks. Given this, new reconstruction performance metrics customized for semantic communication were proposed in [7, 8], under which semantic communication exhibits remarkably higher efficiency and accuracy than bit-level communication, especially in the low signal-to-noise ratio (SNR) regime.
Every coin has two sides: accompanying the good performance in the low SNR regime is a higher risk of privacy leakage, as it implies that eavesdroppers can crack the semantic information more easily, even through a highly noisy channel. This prompts a critical issue regarding secure semantic communications. To design security-aware semantic communication systems, one needs to balance the trade-off between the transmission efficiency at the destination user (Bob) and the information leakage to the eavesdropper (Eve). In classic bit-level communication systems, the secrecy capacity, rather than the channel capacity, serves as the main performance metric of interest to ensure security. The theoretical analysis of the secrecy capacity for bit-level communication was presented in [9, 10]. Building on it, the secrecy capacity region can be derived, and secure transmission can be achieved by proper transmission power control and specific channel coding designs [11, 12, 13]. Nevertheless, in semantic communication, the “black-box” nature of the JSCC block implemented by neural networks makes the derivation of the secrecy capacity intractable, if not impossible. Therefore, the existing secure communication schemes built for bit-level communication systems cannot be directly applied to semantic communication systems, leaving secure semantic communication a largely uncharted area.
As discussed above, there are two basic objectives in secure semantic communication systems: one concerns efficiency, i.e., the semantic recovery quality at Bob, and the other concerns privacy, i.e., the semantic leakage to Eve. This gives rise to a fundamental trade-off between efficiency and privacy. In this letter, we develop a secure semantic communication framework that accounts for both objectives. Specifically, we propose an efficient joint-source-channel (JSC) autoencoder that cascades residual blocks with convolution layers for efficient semantic extraction and transmission, whose training is guided by a well-designed loss function that can flexibly balance the efficiency-privacy trade-off. Extensive experiments show that the proposed JSCC scheme significantly outperforms the traditional separate source and channel coding scheme in the low SNR regime while preventing privacy leakage at the semantic level, thus achieving the desired efficient and secure semantic communication.
II System Model
In this section, we present the downlink semantic communication system for image transmission, and put forth the privacy issue caused by Eve.
II-A Semantic Transmitter

As shown in Fig. 1, the base station (BS) aims to confidentially transmit an image to the legitimate user (Bob) in the presence of a passive eavesdropper (Eve). Different from the conventional design with separate source coding (e.g., JPEG, BPG) and channel coding (e.g., Turbo, LDPC codes), both compression and protection against channel impairments are implemented by the joint-source-and-channel (JSC) encoder composed of deep neural networks (DNNs). The encoding process is given as follows:
$$\mathbf{x} = f_{\theta}(\mathbf{s}), \tag{1}$$
where $\theta$ and $\mathbf{x}$ denote the trainable parameters of the JSC encoder $f_{\theta}(\cdot)$ and the latent semantic representation of the image source $\mathbf{s}$, respectively.
Considering the transmit power limitation, the transmitted signal $\mathbf{x}$ is subject to an average power constraint.
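As a minimal sketch of the power constraint above, the snippet below rescales a vector of complex channel symbols to a target average power. The exact form of the constraint is not recoverable from the text here, so the equality-form normalization, the unit target power, and the names `normalize_power` and `average_power` are illustrative assumptions rather than the authors' implementation.

```python
import math


def average_power(symbols):
    """Average power per complex symbol: (1/k) * sum |z|^2."""
    return sum(abs(z) ** 2 for z in symbols) / len(symbols)


def normalize_power(symbols, power=1.0):
    """Scale symbols so that their average power equals `power` (assumed form)."""
    energy = sum(abs(z) ** 2 for z in symbols)
    if energy == 0:
        raise ValueError("cannot normalize an all-zero symbol vector")
    scale = math.sqrt(len(symbols) * power / energy)
    return [scale * z for z in symbols]


# Hypothetical encoder output: four complex symbols before normalization.
raw = [3 + 4j, 1 - 2j, -2 + 0.5j, 0.5 + 0.5j]
tx = normalize_power(raw, power=1.0)
```

The scaling is a single real factor, so the relative amplitudes and phases of the symbols produced by the encoder are preserved.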
II-B Wireless Transmission
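We consider the additive white Gaussian noise (AWGN) channel. The signal received by Bob through the legitimate AWGN channel is given by
$$\mathbf{y}_b = \mathbf{x} + \mathbf{n}_b, \tag{2}$$
where $\mathbf{x}$ is the transmitted signal, $\mathbf{n}_b \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I})$ is the channel noise, and $\sigma^2$ is the average noise power.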
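The AWGN channel described above can be simulated in a few lines. This is a sketch under stated assumptions: the noise is circularly symmetric complex Gaussian, the noise power is derived from a target SNR in dB and a known signal power, and the function name `awgn` and its parameters are hypothetical.

```python
import math
import random


def awgn(symbols, snr_db, signal_power=1.0, rng=None):
    """Add complex Gaussian noise so that signal_power / noise_power matches snr_db."""
    rng = rng or random.Random()
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power / 2)  # standard deviation per real dimension
    return [z + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for z in symbols]


# Send unit-power symbols through a 10 dB channel and measure the noise power.
rng = random.Random(0)
tx = [1 + 0j] * 20000
rx = awgn(tx, snr_db=10.0, signal_power=1.0, rng=rng)
measured_noise_power = sum(abs(y - x) ** 2 for x, y in zip(tx, rx)) / len(tx)
```

With a 10 dB SNR and unit signal power, the measured noise power should be close to 0.1. Eve's wiretap channel can be modeled by calling the same function with a different SNR.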
II-C Semantic Receiver
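At the receiver side, both Bob and Eve can try to decode the image as follows:
$$\hat{\mathbf{s}}_b = g_{\phi_b}(\mathbf{y}_b), \qquad \hat{\mathbf{s}}_e = g_{\phi_e}(\mathbf{y}_e), \tag{4}$$
where $\mathbf{y}_b$ and $\mathbf{y}_e$ are the signals received by Bob and Eve, and $g_{\phi_b}(\cdot)$ and $g_{\phi_e}(\cdot)$ are their JSC decoders with trainable parameters $\phi_b$ and $\phi_e$, respectively.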
We note that both the JSC encoder and decoder can only be deployed after sufficient training, which comes at the cost of heavy communication overhead. Moreover, there is a strong demand for serving multiple users in semantic communication systems. Given these, sharing the JSC decoder publicly is a natural way to alleviate the training burden, i.e., Bob and Eve use the same decoder, as users in the cell can collaborate to improve the JSC decoder through federated learning. However, the shared JSC decoder raises a critical privacy issue: it makes it easy for Eve to crack the semantic information, which can be retrieved even over a quite noisy channel. We tackle this issue in the subsequent section.
III Proposed Method
In this section, we first propose a JSC autoencoder that cascades residual blocks with convolution layers to extract the semantic information efficiently. Given the potential privacy leakage, we then propose a data-driven scheme that balances the efficiency-privacy trade-off.
III-A JSC Autoencoder Design

The network architecture of the JSC autoencoder is shown in Fig. 2. As in [14], residual blocks are added to improve the model performance and training stability. In the encoder, we alternately cascade residual blocks with convolution layers, and the input image is downsampled three times through the residual blocks. In addition, all the intermediate results are normalized by generalized divisive normalization (GDN) [15], which is widely used in image compression. The network structure of the decoder is similar to that of the encoder, except that sub-pixel convolution layers [16] are adopted to reconstruct the image. Compared with the transposed convolution layer, the sub-pixel convolution layer increases the resolution of the output image through learned upsampling, thus improving the reconstruction performance.
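To illustrate the upsampling step of a sub-pixel convolution layer, the sketch below implements only the channel-to-space rearrangement (often called pixel shuffle) on nested lists. The learnable convolution that precedes it is omitted, and the channel ordering used here is one common convention, assumed for illustration; `pixel_shuffle` is a hypothetical helper, not code from the paper.

```python
def pixel_shuffle(feature_maps, r):
    """Rearrange a (C*r*r, H, W) nested list into (C, H*r, W*r)."""
    channels_in = len(feature_maps)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    channels_out = channels_in // (r * r)
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(channels_out)]
    for c in range(channels_out):
        for i in range(r):
            for j in range(r):
                # Each group of r*r input channels fills one r-by-r sub-pixel grid.
                src = feature_maps[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Four 1x2 feature maps become one 2x4 map (upscale factor r = 2).
maps = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
upsampled = pixel_shuffle(maps, r=2)
# upsampled == [[[1, 3, 2, 4], [5, 7, 6, 8]]]
```

Because the rearrangement itself has no parameters, all learning happens in the preceding convolution, which produces the `r*r` channels that are interleaved into each output pixel block.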
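Note that, unlike in traditional communication systems where perfect recovery of the transmitted signal from the received signal is pursued, we train the autoencoder in an end-to-end way, so that image compression and channel adaptation are achieved jointly by using the following loss function,
$$\mathcal{L}_{\mathrm{MSE}} = \frac{1}{N} \sum_{i=1}^{N} d(\mathbf{s}_i, \hat{\mathbf{s}}_i), \tag{5}$$
where $N$ denotes the batch size, and $d(\mathbf{s}_i, \hat{\mathbf{s}}_i)$ is the mean squared-error (MSE) distortion between the $i$-th raw image $\mathbf{s}_i$ and its reconstruction $\hat{\mathbf{s}}_i$.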
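The batch-averaged MSE objective described above can be written out directly. The sketch below uses plain nested lists as single-channel images with pixel values in [0, 1]; the helper names `mse` and `batch_mse_loss` and the toy data are assumptions for illustration only.

```python
def mse(image_a, image_b):
    """Mean squared error between two equally sized 2-D images."""
    flat_a = [p for row in image_a for p in row]
    flat_b = [p for row in image_b for p in row]
    return sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)


def batch_mse_loss(sources, reconstructions):
    """Average the per-image MSE distortion over the batch."""
    pairs = list(zip(sources, reconstructions))
    return sum(mse(s, r) for s, r in pairs) / len(pairs)


# A toy batch of two 2x2 images; the first is recovered exactly, the second is not.
sources = [[[0.0, 1.0], [1.0, 0.0]], [[0.5, 0.5], [0.5, 0.5]]]
reconstructions = [[[0.0, 1.0], [1.0, 0.0]], [[0.0, 0.5], [0.5, 1.0]]]
loss = batch_mse_loss(sources, reconstructions)
# Per-image distortions are 0.0 and 0.125, so the batch loss is 0.0625.
```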
III-B Privacy-aware Design
As reported in [3, 4], one of the advantages of semantic communication is that satisfactory performance can be achieved even in the low SNR regime. As shown in Fig. 3 (the detailed settings are given in Section IV), with a poor wiretap channel, Eve in the conventional image transmission system cannot decode anything, whereas with the powerful JSC decoder, Eve in the semantic communication system can still crack the semantic information. This shows that the semantic communication system does have a higher risk of privacy leakage than bit-level communication.
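To address this issue, similar to the secrecy capacity, an intuitive way is to take the reconstruction quality of Eve into account in the training objective. The privacy-aware loss function is given by
$$\mathcal{L} = \mathcal{L}_{\mathrm{MSE}} + \lambda \mathcal{L}_{p}, \tag{6}$$
where $\mathcal{L}_{\mathrm{MSE}}$ is the reconstruction loss in (5), $\lambda$ is the weighting factor, and $\mathcal{L}_{p}$ characterizes the privacy leakage to Eve.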
Then, the main challenge is to properly design the privacy leakage term. There are two principles for it. First, the privacy leakage term does not have to take the same form as the reconstruction loss, as privacy information may have various definitions. Second, there exists a trade-off between the reconstruction quality at Bob and the privacy leakage to Eve: we should minimize the reconstruction distortion at Bob while protecting privacy to a certain degree. Considering these, we propose the following criterion of privacy leakage,
(9) |
where the predefined indicator of privacy protection acts as a threshold, and the reference is the all-black image with the same shape as the source image.
Remark 1.
Generally, the degree of privacy leakage can be characterized by the mutual information between the source image and Eve's observation. However, this mutual information is not tractable. We instead minimize an upper bound on it, which can be achieved by forcing the image decoded by Eve to converge to a constant one, i.e., the all-black image. In addition, the privacy leakage term serves as a penalty function for the training objective: we set a threshold, and if the privacy leakage exceeds it, the penalty corrects the training direction for privacy protection.
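One possible reading of the privacy-aware objective and Remark 1 is sketched below. It assumes a weighted sum of Bob's MSE distortion and a hinge-type penalty that is active only while Eve's reconstruction is farther than a threshold from the all-black image. The hinge form, the parameter names `weight` and `threshold`, and the function `secure_loss` are assumptions made for illustration; the exact criterion in the paper may differ.

```python
def mse(image_a, image_b):
    """Mean squared error between two equally sized 2-D images."""
    flat_a = [p for row in image_a for p in row]
    flat_b = [p for row in image_b for p in row]
    return sum((a - b) ** 2 for a, b in zip(flat_a, flat_b)) / len(flat_a)


def secure_loss(source, bob_recon, eve_recon, weight=1.0, threshold=0.01):
    """Bob's distortion plus a thresholded penalty on Eve's reconstruction (assumed form)."""
    black = [[0.0] * len(row) for row in source]  # all-black reference image
    distortion_bob = mse(source, bob_recon)
    # Penalty is zero once Eve's output is within `threshold` of the black image.
    leakage = max(0.0, mse(eve_recon, black) - threshold)
    return distortion_bob + weight * leakage


source = [[0.2, 0.8], [0.6, 0.4]]
bob = [[0.2, 0.8], [0.6, 0.4]]          # Bob recovers the image exactly
eve_leaky = [[0.2, 0.8], [0.6, 0.4]]    # Eve also recovers it: leakage is penalized
eve_blind = [[0.0, 0.0], [0.0, 0.0]]    # Eve decodes an all-black image: no penalty

loss_leaky = secure_loss(source, bob, eve_leaky)
loss_blind = secure_loss(source, bob, eve_blind)
```

In this toy case the leaky configuration is penalized (loss of about 0.29) while the blind one is not (loss 0.0). Raising `weight` pushes training toward privacy, and lowering it pushes training toward Bob's reconstruction quality.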
IV Simulation Results
In this section, we conduct a set of experiments to evaluate the performance of the proposed scheme, including the reconstruction quality at Bob and the privacy leakage to Eve. The AWGN system in Section II-B is first considered, where is set to 15 dB. In addition, a more practical multiple-input-single-output (MISO) system is also considered, where precoding techniques can be exploited to relax the noise power requirement of the AWGN system. Specifically, is normalized to to satisfy the average power constraint, with . In all the presented figures, the reported SNR is the transmit signal-to-noise ratio at Bob's side.
Dataset: We use the Linnaeus 5 dataset (chaladze.com/l5) for training and testing. The images have dimension . The whole image dataset is composed of 5 classes: berry, bird, dog, flower, and other. There are 1200 training images and 400 testing images per class.
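Performance Metric: To measure the performance of the proposed scheme and the baseline schemes, we use the structural similarity index measure (SSIM) as the performance metric, which is given by
$$\mathrm{SSIM}(\mathbf{a}, \mathbf{b}) = \frac{(2\mu_a \mu_b + c_1)(2\sigma_{ab} + c_2)}{(\mu_a^2 + \mu_b^2 + c_1)(\sigma_a^2 + \sigma_b^2 + c_2)}, \tag{10}$$
where $\mu_a$ and $\sigma_a^2$ are the mean and variance of image $\mathbf{a}$ (with $\mu_b$ and $\sigma_b^2$ defined likewise for image $\mathbf{b}$), and $\sigma_{ab}$ is the covariance between $\mathbf{a}$ and $\mathbf{b}$. $c_1$ and $c_2$ are constants for numerical stability.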
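For intuition, the SSIM expression can be evaluated on two small images using global statistics. This is a simplified sketch: practical SSIM implementations usually compute the statistics over local windows and average the results, and the stabilizing constants below, (0.01 L)^2 and (0.03 L)^2 for dynamic range L, are commonly used defaults that are assumed here because the text does not state the values. The function name `ssim_global` is hypothetical.

```python
def ssim_global(image_a, image_b, dynamic_range=1.0):
    """SSIM computed from whole-image mean, variance, and covariance."""
    fa = [p for row in image_a for p in row]
    fb = [p for row in image_b for p in row]
    n = len(fa)
    mu_a = sum(fa) / n
    mu_b = sum(fb) / n
    var_a = sum((p - mu_a) ** 2 for p in fa) / n
    var_b = sum((q - mu_b) ** 2 for q in fb) / n
    cov = sum((p - mu_a) * (q - mu_b) for p, q in zip(fa, fb)) / n
    c1 = (0.01 * dynamic_range) ** 2  # assumed default constants
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_a * mu_b + c1) * (2 * cov + c2)
    denominator = (mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2)
    return numerator / denominator


image = [[0.1, 0.9], [0.8, 0.2]]
inverted = [[1.0 - p for p in row] for row in image]

same_score = ssim_global(image, image)         # identical images score 1
inverted_score = ssim_global(image, inverted)  # inverted structure scores below 0
```

Unlike MSE, the score depends on whether the structure (covariance) is preserved, which is why the inverted image, whose covariance with the original is negative, receives a negative score here.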
Training Setting: We adopt the Adam optimizer with a learning rate of 0.0001. The pretrained model is first obtained on the ImageNet dataset with the MSE loss function under the assumption of ideal transmission (i.e., the receiver obtains the transmitted signal without noise). Then the final model is obtained through training under the specific channel and loss settings. The number of filters in all residual blocks and convolution layers is . The experiments are implemented with PyTorch and Python 3.8 on a Linux server with 2 NVIDIA RTX 3090 GPUs.
IV-A Reconstruction Evaluation

In this subsection, we compare the reconstruction quality of the proposed scheme, the JSCC scheme in [4], and the conventional schemes with separate source and channel coding. For the conventional schemes, two source coding schemes, JPEG2000 and BPG, are considered. To ensure fairness, all schemes transmit the same number of symbols. The results are presented in Fig. 4. It can be seen that for the two traditional schemes, the reconstruction quality is poor when the channel quality is poor. This is because the traditional schemes need to represent the original image as bit sequences, and a poor channel leads to a high bit error rate. For the compression and decompression schemes of the BPG and JPEG2000 standards, the accumulation of bit errors causes decoding failure. In contrast, the JSC coding scheme transmits the most important semantic information in the form of symbols. Although symbol errors exist, they only cause a semantic offset. As a result, it achieves relatively satisfactory performance in the low SNR regime and maintains performance similar to that of the traditional schemes in the high SNR regime. Moreover, the alternating residual and convolutional structure outperforms the fully convolutional structure, which verifies the effectiveness of the proposed scheme.
IV-B Security Evaluation







IV-B1 AWGN System
Under the AWGN system, the reconstruction performance of the models trained with the two objectives (i.e., MSE in (5) and SecureMSE in (6)) is shown in Fig. 5. It is found that the MSE scheme achieves the best reconstruction performance at Bob's side. However, as the SNR increases, the reconstruction performance of Eve improves considerably as well. From the examples of reconstructed images, it can be seen that with baseline models without privacy awareness, Eve can also roughly reconstruct the image, thus verifying the existence of privacy leakage in the current semantic communication system. For the proposed SecureMSE scheme, the reconstruction quality of Eve does not improve much as the SNR grows, owing to the privacy awareness embedded in the well-designed loss function. From Fig. 7, it can be seen that Eve with the SecureMSE model can no longer obtain any private information. In addition, comparing the reconstruction quality at Bob under the two objectives, SecureMSE only causes a slight performance loss, which is negligible to human perception. The validity of the proposed algorithm is thus verified.





IV-B2 MISO System
For a typical MISO system, the signals received by Bob and Eve are respectively given by
(11) |
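where the two channel vectors denote the channel between the BS and Bob and the channel between the BS and Eve, respectively. The maximum ratio transmission (MRT) precoding scheme is adopted, that is, the precoder is matched to Bob's channel. Then, the signals received by Bob and Eve can be rewritten as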
(12) |
where . In the following experiment, we have , and .
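The effect of MRT precoding on the two links can be checked numerically. The sketch below assumes the common convention that the received signal is the inner product of the conjugated channel vector and the precoder times the symbol, plus noise, with the precoder set to Bob's channel divided by its norm. The channel values and the names `mrt_precoder` and `effective_gain` are hypothetical.

```python
import math


def mrt_precoder(h_bob):
    """MRT precoder: Bob's channel vector scaled to unit norm (assumed convention)."""
    norm = math.sqrt(sum(abs(h) ** 2 for h in h_bob))
    return [h / norm for h in h_bob]


def effective_gain(h, w):
    """Scalar gain h^H w seen by a single-antenna user with channel h."""
    return sum(hk.conjugate() * wk for hk, wk in zip(h, w))


# Hypothetical two-antenna channels from the BS to Bob and to Eve.
h_bob = [1 + 1j, 2 - 1j]
h_eve = [0.5 - 0.5j, -1 + 2j]

w = mrt_precoder(h_bob)
gain_bob = effective_gain(h_bob, w)  # equals the norm of h_bob: real and positive
gain_eve = effective_gain(h_eve, w)  # complex, magnitude at most the norm of h_eve
```

Under this convention Bob's effective channel is the norm of his channel vector, while Eve sees the projection of her channel onto Bob's direction, so after precoding the two users differ mainly in their scalar fading coefficients.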
The performance comparison between the proposed scheme and the baseline under the MISO system is shown in Fig. 6. In the low SNR regime, the reconstruction performance of SecureMSE at Bob decreases to a certain extent. This is because in the MISO system, as shown in (12), the main difference between Bob and Eve is the fading coefficient. With a high noise level, the JSC decoder cannot distinguish Bob from Eve based on the heterogeneity of their channels, which makes it more difficult to maximize the reconstruction quality at Bob while suppressing the privacy leakage to Eve. As the SNR increases, both the reconstruction performance of Bob and the privacy preservation against Eve improve considerably. In addition, example reconstructed images are shown in Fig. 8. It can be seen that the model based on the conventional MSE loss still suffers from privacy leakage, while the model trained with the proposed SecureMSE loss once again prevents it, thus verifying the robustness of the proposed algorithm in various scenarios.
V Conclusion
In this letter, we studied the semantic communication system for wireless image transmission and developed an efficient JSC framework. In addition, we discussed the privacy issue in current semantic communication systems and revealed the potential privacy leakage. Prompted by this, we proposed a data-driven privacy protection scheme called SecureMSE, featuring a well-designed loss function with privacy awareness. Experimental results verify the effectiveness and robustness of the proposed scheme.
References
- [1] D. Gündüz, Z. Qin, I. E. Aguerri, H. S. Dhillon, Z. Yang, A. Yener, K. K. Wong, and C.-B. Chae, “Beyond transmitting bits: Context, semantics, and task-oriented communications,” [Online]. Available: https://arxiv.org/abs/2207.09353, 2022.
- [2] H. Ye, L. Liang, G. Y. Li, and B.-H. Juang, “Deep learning-based end-to-end wireless communication systems with conditional GANs as unknown channels,” IEEE Trans. Wireless Commun., vol. 19, no. 5, pp. 3133–3143, 2020.
- [3] H. Xie, Z. Qin, G. Y. Li, and B.-H. Juang, “Deep learning enabled semantic communication systems,” IEEE Trans. Signal Process., vol. 69, pp. 2663–2675, 2021.
- [4] E. Bourtsoulatze, D. B. Kurka, and D. Gündüz, “Deep joint source-channel coding for wireless image transmission,” IEEE Trans. Cogn. Commun. Netw., vol. 5, no. 3, pp. 567–579, 2019.
- [5] D. B. Kurka and D. Gündüz, “DeepJSCC-f: Deep joint source-channel coding of images with feedback,” IEEE J. Sel. Areas Inf. Theory, vol. 1, no. 1, pp. 178–193, 2020.
- [6] D. B. Kurka and D. Gündüz, “Bandwidth-agile image transmission with deep joint source-channel coding,” IEEE Trans. Wireless Commun., vol. 20, no. 12, pp. 8081–8095, 2021.
- [7] J. Wang, S. Wang, J. Dai, Z. Si, D. Zhou, and K. Niu, “Perceptual learned source-channel coding for high-fidelity image semantic transmission,” [Online]. Available: https://arxiv.org/abs/2205.13120, 2022.
- [8] D. Huang, F. Gao, X. Tao, Q. Du, and J. Lu, “Towards semantic communications: Deep learning-based image semantic coding,” [Online]. Available: https://arxiv.org/abs/2208.04094, 2022.
- [9] P. K. Gopala, L. Lai, and H. El Gamal, “On the secrecy capacity of fading channels,” IEEE Trans. Inf. Theory, vol. 54, no. 10, pp. 4687–4698, 2008.
- [10] Y. Liang, H. V. Poor, and S. Shamai, “Secure communication over fading channels,” IEEE Trans. Inf. Theory, vol. 54, no. 6, pp. 2470–2492, 2008.
- [11] K.-L. Besser, P.-H. Lin, C. R. Janda, and E. A. Jorswieck, “Wiretap code design by neural network autoencoders,” IEEE Trans. Inf. Forensics Security, vol. 15, pp. 3374–3386, 2019.
- [12] R. Fritschek, R. F. Schaefer, and G. Wunder, “Deep learning for the Gaussian wiretap channel,” in Proc. IEEE International Conference on Communications (ICC), pp. 1–6, Shanghai, China, 2019.
- [13] T.-Y. Tung and D. Gündüz, “Deep joint source-channel and encryption coding: Secure semantic communications,” [Online]. Available: https://arxiv.org/abs/2208.09245, 2022.
- [14] Z. Cheng, H. Sun, M. Takeuchi, and J. Katto, “Deep residual learning for image compression,” in Proc. of the IEEE conference on computer vision and pattern recognition (CVPR), Long Beach, USA, 2019.
- [15] J. Ballé, V. Laparra, and E. P. Simoncelli, “Density modeling of images using a generalized normalization transformation,” [Online]. Available: https://arxiv.org/abs/1511.06281, 2015.
- [16] W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, and Z. Wang, “Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network,” in Proc. of the IEEE conference on computer vision and pattern recognition (CVPR), pp. 1874–1883, Las Vegas, USA, 2016.