Joint Source-Channel Coding for Wireless Image Transmission: A Deep Compressed-Sensing Based Method
Abstract
Nowadays, the demand for image transmission over wireless networks has surged significantly. To meet the need for swift delivery of high-quality images through time-varying channels with limited bandwidth, the development of efficient transmission strategies and techniques for preserving image quality is of importance. This paper introduces an innovative approach to Joint Source-Channel Coding (JSCC) tailored for wireless image transmission. It capitalizes on the power of Compressed Sensing (CS) to achieve superior compression and resilience to channel noise. In this method, the process begins with the compression of images using a block-based CS technique implemented through a Convolutional Neural Network (CNN) structure. Subsequently, the images are encoded by directly mapping image blocks to complex-valued channel input symbols. Upon reception, the data is decoded to recover the channel-encoded information, effectively removing the noise introduced during transmission. To finalize the process, a novel CNN-based reconstruction network is employed to restore the original image from the channel-decoded data. The performance of the proposed method is assessed using the CIFAR-10 and Kodak datasets. The results illustrate a substantial improvement over existing JSCC frameworks when assessed in terms of metrics such as Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity Index (SSIM) across various channel Signal-to-Noise Ratios (SNRs) and channel bandwidth values. These findings underscore the potential of harnessing CNN-based CS for the development of deep JSCC algorithms tailored for wireless image transmission.
Index Terms:
Wireless image transmission, joint source-channel coding, compressed sensing, deep learning
I Introduction
I-A Motivation
Wireless transmission of images has faced various challenges in compression, transmission resilience, and quality preservation. Despite the popularity of wireless image transmission systems, achieving reliable transfer with efficient image compression remains a challenge [1, 2]. Conventional approaches utilize separate source and channel coding methods. While this strategy has its merits, an alternative is needed to improve performance in noisy and bandwidth-limited conditions [3]. The joint source-channel coding (JSCC) method offers a compelling alternative, integrating statistical image properties with channel characteristics to enhance compression efficiency and resilience to channel noise [4].
Incorporating signal-processing concepts, particularly compressed sensing (CS), in the design of JSCC presents an opportunity to enhance wireless image transmission [5]. While CS has demonstrated the ability to recover sparse signals, its sparsity assumption and expensive reconstruction process motivate new efficient methods. Deep Learning (DL) provides a suitable option for CS to address its limitations and improve the efficiency [6]. This paper aims to leverage DL-based CS within the JSCC framework for wireless image transmission. Through DL-based sampling and advanced reconstruction, the goal is to propose a novel approach that enhances compression efficiency and preserves image quality in challenging conditions.

I-B Literature Review
Up to now, various techniques and frameworks have been proposed for JSCC and CS methods. In the following, these methods are categorized and some important relevant papers are reviewed and summarized.
I-B1 DL-based JSCC for wireless image transmission
DL-based methods have offered a new option for JSCC by leveraging encoder and decoder network models that learn from input images. These models are typically auto-encoders implemented as deep networks. In this methodology, the encoder output forms the transmitted code-word, which is a compressed representation of the source. The paired decoder at the receiver end aims to reconstruct the original source image by decoding the received noisy code-word, which is essentially a latent representation distorted by channel noise. For instance, Bourtsoulatze et al. introduce an end-to-end model that exhibits high performance, though it sacrifices interpretability [7]. Likewise, the authors in [8] present a DL-based JSCC approach capable of adapting to channel variations, which, however, suffers from a notable encoding delay.
I-B2 DL-based CS methods
DL excels in learning features for tasks like recognition and restoration, replacing traditional methods in CS. The DL models prioritize preserving image information, notably local features, by integrating stacked convolutional layers in the reconstruction process. This advancement significantly accelerates reconstruction speed compared to conventional methods. DL-based CS models dynamically adapt during training, eliminating the need for manual design. For example, Deep-CS [9] offers a straightforward CS approach, lacking robust theoretical guarantees. AMP-Net, a denoising-based approach with end-to-end training, aims to leverage prior knowledge but is sensitive to hyperparameter tuning [10]. Trans-CS, utilizing self-attention mechanisms, enhances reconstruction quality in compressed sensing [11]. Despite its flexibility, it is prone to overfitting.
I-B3 Application of CS in wireless image transmission systems
Applying CS to wireless image transmission offers practical advantages, evident in solutions like SoftCast [12] and SparseCast [13]. SoftCast uses a Discrete Cosine Transform (DCT) on images and transmits coefficients directly through a dense constellation [12]. SparseCast encodes DCT coefficients, optimizing bandwidth with frequency domain sparsity and minimal metadata using fixed measurement levels [13]. While versatile, these methods are sensitive to channel changes. Song et al. propose a distributed CS for scalable cloud-based image transmission [14]. This strategy improves reconstruction using cloud resources, cutting transmission time and enhancing resistance to channel impairments. However, cloud disruptions may impact image quality and increase transmission errors.
I-C Contributions
In this paper, we propose a novel JSCC algorithm that leverages the power of DL-based CS to achieve a higher compression rate and better resilience to channel noise compared to state-of-the-art DL-based JSCC methods. In the proposed method, the images are first compressed and encoded using a novel CNN-based structure. This structure comprises a block-based CS (BCS) module, realised via a convolutional neural network (CNN), which complements a DL-based source and channel encoder. Introducing this module makes it possible to exploit the properties of CS to enhance the performance of the DL-based JSCC scheme. The CS module captures the image’s structural information, which is then mapped to a complex-valued signal by the DL-based encoder. The compressed, encoded images are then transmitted over a noisy channel modeled as a non-trainable layer. At the receiver side, a CNN-based decoder recovers the channel-encoded data and reconstructs the images. The decoder consists of a DL-based decoder network, which recovers the channel-encoded information from the noisy channel output. This is then fed into a CNN-based reconstruction network, which reconstructs the original image from the compressed samples. The proposed JSCC algorithm leverages a DL-based sampling matrix and reconstruction capabilities for improved image compression and reconstruction in wireless image transmission systems. Numerical evaluations show that the proposed scheme significantly outperforms existing DL-based JSCC methods such as Deep JSCC (DJSCC) [7] and Attention DL based JSCC (ADJSCC) [15] with respect to various metrics.
II System Model
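Consider a point-to-point image transmission system as shown in Fig. 1. An input image of size $H \times W \times C$ (height $\times$ width $\times$ number of channels) is represented by a vector $\mathbf{x} \in \mathbb{R}^{n}$, where $n = H \times W \times C$ and $\mathbb{R}$ denotes the set of real numbers. The joint source-channel encoder encodes $\mathbf{x}$ via the encoding function $f_{\boldsymbol{\theta}}: \mathbb{R}^{n} \rightarrow \mathbb{C}^{k}$, which produces a vector of complex-valued channel input symbols $\mathbf{z} \in \mathbb{C}^{k}$. The encoding process can be expressed as:
$$\mathbf{z} = f_{\boldsymbol{\theta}}(\mathbf{x}) \in \mathbb{C}^{k}, \tag{1}$$
where $k$ is the number of channel input symbols, $\boldsymbol{\theta}$ is the parameter set of the joint source-channel encoder and $\mathbb{C}$ denotes the set of complex numbers. The encoder maps the $n$-dimensional vector of real-valued image $\mathbf{x}$ to a $k$-dimensional vector of complex-valued channel input samples $\mathbf{z}$.
To satisfy the average power constraint at the joint source-channel encoder, $\frac{1}{k}\mathbb{E}[\mathbf{z}^{*}\mathbf{z}] \le P$ is also imposed, where $\mathbf{z}^{*}$ denotes the conjugate transpose of $\mathbf{z}$ and $P$ is the average power constraint [3]. The encoded symbols are transmitted over a noisy channel represented by the function $\eta: \mathbb{C}^{k} \rightarrow \mathbb{C}^{k}$. Additive white Gaussian noise (AWGN) is considered in our work. The channel output symbols $\hat{\mathbf{z}} \in \mathbb{C}^{k}$ received by the joint source-channel decoder are expressed as:
$$\hat{\mathbf{z}} = \eta(\mathbf{z}) = \mathbf{z} + \mathbf{n}, \tag{2}$$
where the vector $\mathbf{n} \in \mathbb{C}^{k}$ consists of independent and identically distributed samples with the distribution $\mathcal{CN}(0, \sigma^{2})$. Here, $\sigma^{2}$ is the average noise power and $\mathcal{CN}(\cdot, \cdot)$ denotes a circularly symmetric complex Gaussian distribution. The proposed method can be extended to other channel models which can be represented by a differentiable transfer function. The joint source-channel decoder uses a decoding function $g_{\boldsymbol{\phi}}: \mathbb{C}^{k} \rightarrow \mathbb{R}^{n}$ to map $\hat{\mathbf{z}}$ as follows:
$$\hat{\mathbf{x}} = g_{\boldsymbol{\phi}}(\hat{\mathbf{z}}) \in \mathbb{R}^{n}, \tag{3}$$
where $\hat{\mathbf{x}}$ is an estimation of the original image $\mathbf{x}$, and $\boldsymbol{\phi}$ is the parameter set of the joint source-channel decoder. In this paper, the encoder and decoder functions are modeled using a novel CNN structure, as presented in the following section.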
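The AWGN channel of Eq. (2) can be illustrated with a short sketch. The Python code below is only a minimal standalone illustration, not the authors' implementation: the function name `awgn_channel`, its arguments, and the use of the standard-library `random` module are assumptions made here, and the noise power is derived from an SNR in dB under the assumption that the SNR is the ratio of the average transmit power to the average noise power.

```python
import math
import random


def awgn_channel(z, snr_db, power=1.0, rng=None):
    """Return z + n, where each n_i ~ CN(0, sigma^2) and
    sigma^2 = power / 10^(snr_db / 10) (hypothetical helper)."""
    if rng is None:
        rng = random.Random()
    noise_power = power / (10.0 ** (snr_db / 10.0))
    # A circularly symmetric complex Gaussian with variance sigma^2 has
    # independent real and imaginary parts, each with variance sigma^2 / 2.
    std = math.sqrt(noise_power / 2.0)
    return [s + complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for s in z]


if __name__ == "__main__":
    rng = random.Random(0)
    z = [complex(1.0, 0.0)] * 20000  # unit-power symbols, illustrative only
    z_hat = awgn_channel(z, snr_db=10.0, power=1.0, rng=rng)
    measured = sum(abs(b - a) ** 2 for a, b in zip(z, z_hat)) / len(z)
    print(f"empirical noise power: {measured:.4f}")  # roughly 0.1 at 10 dB
```

Because the operation is a plain addition of noise, it is differentiable with respect to the channel input, which is what allows such a channel to sit between trainable encoder and decoder networks.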
III Deep CS-Based JSCC
III-A Model Architecture
The architectural details of the encoder and decoder networks, along with their constituent blocks, are shown in Fig. 2. The JSCC encoder comprises a BCS sampling network, followed by an array of further processing blocks, which collectively realise image compression and resilience to channel-induced noise. Considering that the input channel statistics are generally not known at the decoder, the initial step involves normalizing input images based on the maximum pixel value of 255, thereby restricting pixel values to the [0, 1] range. Subsequently, these normalized pixels feed into the sampling layer, which gathers CS measurements. The sampling layer, employing BCS [16], generates compressed image samples.

The CS sampling operates as follows [17]. Initially, the image is partitioned into non-overlapping blocks, each having dimensions $B \times B \times l$. Here, $l$ denotes the number of colour channels, and $B$ signifies the block size. Compressed measurements are derived using a sampling matrix $\boldsymbol{\Phi}_B$, with dimensions $n_B \times lB^{2}$. In scenarios where a sampling ratio, such as $M/N$, is applied, the dimensions of $\boldsymbol{\Phi}_B$ become $\lfloor \frac{M}{N} lB^{2} \rfloor \times lB^{2}$. The sampling process can be expressed as $\mathbf{y}_j = \boldsymbol{\Phi}_B \mathbf{x}_j$, with $\mathbf{y}_j$ the measurement vector and $\mathbf{x}_j$ representing the $j$-th block. One important insight is that each row of the sampling matrix $\boldsymbol{\Phi}_B$ can be perceived as a filter. Hence, a convolutional layer is adopted to simulate this compressed sampling process. Given that the size of every image block is $B \times B \times l$, the dimensions of each filter in the sampling layer are also $B \times B \times l$, allowing each filter to produce a single measurement. Notably, for non-overlapping sampling, the convolutional layer employs a stride of $B \times B$. There are no biases associated with these filters, and no activation functions are applied post-sampling. In essence, the output is $n_B$ feature maps, with each column of output encapsulating $n_B$ measurements originating from an image block. Importantly, the learning process encompasses learning the sampling matrix alongside other network parameters through an end-to-end optimization, as elaborated in subsequent sections.
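The equivalence between block-wise matrix sampling and a strided, bias-free convolution can be checked numerically. The sketch below is a hedged illustration in plain Python for a single-channel image; the function names, the row-major flattening of each block, and the toy sizes are assumptions chosen here for clarity and are not taken from the paper.

```python
import random


def rows_to_filters(phi, B):
    """Reshape each row of the sampling matrix (length B*B) into a B x B filter."""
    return [[row[i * B:(i + 1) * B] for i in range(B)] for row in phi]


def sample_blocks_matrix(image, phi, B):
    """Block-wise y_j = Phi x_j, with x_j the row-major flattened B x B block."""
    H, W = len(image), len(image[0])
    out = []
    for r in range(0, H, B):
        for c in range(0, W, B):
            x = [image[r + i][c + j] for i in range(B) for j in range(B)]
            out.append([sum(p * v for p, v in zip(row, x)) for row in phi])
    return out


def sample_blocks_conv(image, filters, B):
    """Same measurements via B x B filters moved with stride B
    (no bias and no activation, so the layer stays linear)."""
    H, W = len(image), len(image[0])
    out = []
    for r in range(0, H, B):
        for c in range(0, W, B):
            out.append([
                sum(f[i][j] * image[r + i][c + j]
                    for i in range(B) for j in range(B))
                for f in filters
            ])
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    B, n_B = 4, 3  # toy block size and measurements per block (assumed values)
    image = [[rng.random() for _ in range(8)] for _ in range(8)]
    phi = [[rng.gauss(0.0, 1.0) for _ in range(B * B)] for _ in range(n_B)]
    y_mat = sample_blocks_matrix(image, phi, B)
    y_conv = sample_blocks_conv(image, rows_to_filters(phi, B), B)
    same = all(abs(a - b) < 1e-12
               for u, v in zip(y_mat, y_conv) for a, b in zip(u, v))
    print("matrix and convolutional sampling agree:", same)
```

In a deep-learning framework the second function would correspond to a convolutional layer with kernel size and stride both equal to the block size, with the filter weights playing the role of the learned sampling matrix.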
Subsequent to the sampling layer, the data flow progresses through a sequence of convolutional layers, followed by the application of Parametric Rectified Linear Unit (PReLU) activation functions and a normalization layer. This sequence of convolutional layers takes on the role of extracting crucial features from the compressed image. These features are then amalgamated to generate the channel’s input samples. The inclusion of nonlinear activation functions, represented here by PReLU, is pivotal. They facilitate the learning of a nonlinear mapping that maps the source signal space into the coded signal space. This is where the network can model complex, non-linear relationships within the data. As a final step within the encoder, the output of the last convolutional layer is subjected to a normalization process as follows:
$$\mathbf{z} = \sqrt{kP}\,\frac{\tilde{\mathbf{z}}}{\sqrt{\tilde{\mathbf{z}}^{*}\tilde{\mathbf{z}}}}, \tag{4}$$
where $\tilde{\mathbf{z}}$ is the output of the last convolutional layer, which serves as the input of the normalization layer, $\tilde{\mathbf{z}}^{*}$ is the conjugate transpose of $\tilde{\mathbf{z}}$, $k$ is the number of channel input symbols, and $P$ is the average transmit power, such that the channel input $\mathbf{z}$ satisfies the average transmit power constraint $\frac{1}{k}\mathbb{E}[\mathbf{z}^{*}\mathbf{z}] \le P$. Following the encoding operation, the joint source-channel coded sequence is sent over the communication channel by directly transmitting the real and imaginary parts of the channel input samples over the I and Q components of the digital signal. The channel introduces random corruption to the transmitted symbols. To be able to optimize the proposed wireless image transmission system in an end-to-end manner, the communication channel must be incorporated into the overall architecture. Therefore, the communication channel is modeled as a non-trainable layer, which is represented by the transfer function in Eq. (2) [7].
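The power normalization step can be sketched in a few lines. The code below assumes the normalization has the form used in [7], that is, the encoder output is rescaled by $\sqrt{kP}$ divided by its Euclidean norm; the function name `normalize_power` and the symbol names are choices made for this illustration only.

```python
import math


def normalize_power(z_tilde, power=1.0):
    """Scale z_tilde so that (1/k) * sum(|z_i|^2) equals `power`.

    Assumed form: z = sqrt(k * P) * z_tilde / sqrt(z_tilde^* z_tilde).
    """
    k = len(z_tilde)
    energy = sum(abs(s) ** 2 for s in z_tilde)  # z_tilde^* z_tilde
    if energy == 0.0:
        raise ValueError("cannot normalize an all-zero vector")
    scale = math.sqrt(k * power / energy)
    return [scale * s for s in z_tilde]


if __name__ == "__main__":
    z_tilde = [complex(1.0, 2.0), complex(-3.0, 0.5), complex(0.25, -1.0)]
    z = normalize_power(z_tilde, power=1.0)
    avg_power = sum(abs(s) ** 2 for s in z) / len(z)
    print(f"average power after normalization: {avg_power:.6f}")
```

Since every symbol is multiplied by the same positive real scalar, the relative amplitudes and phases of the symbols are preserved and only the overall power level changes.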
The receiver comprises a joint source-channel decoder that recovers the image from the received noisy compressed data. The decoder first maps the corrupted, compressed complex-valued signal to an estimate of the original channel input; the image blocks are then reconstructed using a reconstruction network. Specifically, the decoder first inverts the operations performed by the encoder by passing the received corrupted coded inputs through a series of transposed convolutional layers with PReLU activation functions, which map the received features to an estimate of the CS measurements of each originally transmitted image block.
The recovered CS measurements are then used to reconstruct the original image. The reconstruction network consists of an initial reconstruction network and a deep reconstruction network [17]. Similar to the compressed sampling process, a convolutional layer with appropriate kernel size and stride is utilized to implement the initial reconstruction process. With $B$ the block size, $l$ the number of colour channels, and $n_B$ the number of measurements per block, $lB^{2}$ convolutional filters of size $1 \times 1 \times n_B$ are used to obtain each initial reconstructed block. Then, a combination layer, which contains a reshape function and a concatenation function, is utilized to obtain the initial reconstructed image. This layer first reshapes each $1 \times 1 \times lB^{2}$ reconstructed vector to a $B \times B \times l$ block, then concatenates all blocks to get an initial reconstructed image. The initial reconstruction yields the entire image rather than independent image blocks, so that both intra- and inter-block information can be exploited for better reconstruction. Since there is no activation layer in the initial reconstruction network, it is a linear signal reconstruction network.
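The reshape-and-concatenate step of the combination layer can be illustrated as follows. This is a simplified sketch for a single-channel image, assuming that blocks are stored in raster order and that each block vector is flattened row by row; the function name `combine_blocks` and these layout conventions are assumptions for illustration rather than details given in the paper.

```python
def combine_blocks(block_vectors, H, W, B):
    """Reshape each length-B*B vector to a B x B block and tile the blocks
    (raster order) into an H x W image."""
    if H % B or W % B:
        raise ValueError("image size must be a multiple of the block size")
    blocks_per_row = W // B
    if len(block_vectors) != (H // B) * blocks_per_row:
        raise ValueError("unexpected number of blocks")
    image = [[0.0] * W for _ in range(H)]
    for idx, vec in enumerate(block_vectors):
        r0 = (idx // blocks_per_row) * B  # top row of this block
        c0 = (idx % blocks_per_row) * B   # left column of this block
        for i in range(B):
            for j in range(B):
                image[r0 + i][c0 + j] = vec[i * B + j]
    return image


if __name__ == "__main__":
    B = 2
    original = [[r * 4 + c for c in range(4)] for r in range(4)]
    # Split into flattened blocks, then put them back together.
    vectors = [[original[r + i][c + j] for i in range(B) for j in range(B)]
               for r in range(0, 4, B) for c in range(0, 4, B)]
    print(combine_blocks(vectors, 4, 4, B) == original)
```

Assembling the full image before the non-linear stage is what lets later convolutional layers see pixels on both sides of a block boundary.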
The initial reconstruction is followed by a non-linear reconstruction process which further improves the quality of the reconstructed image. In this paper, a deep sub-network is utilized [17], called a deep reconstruction sub-network, which realises the non-linear reconstruction process. The deep reconstruction sub-network contains layers where the layers except the first and the last are of the same type: filters of size where a filter operates on a spatial region across channels (feature maps). The first layer of the deep reconstruction sub-network operates on the initial reconstructed output, so that it has filters of size . The last layer, which outputs the final image estimation, consists of a single filter of size . In the experiments, and are set to and . Furthermore, ReLU is also utilized as activation function after each convolution layer in the deep reconstruction sub-network.
III-B Loss Function
The proposed encoder and the decoder network are optimized jointly in an end-to-end manner. Given the input image $\mathbf{x}$, the goal is to obtain a highly compressed encoded measurement with the encoder, and then recover the original input image from its noisy version with the decoder network. Since the encoder, decoder and communication channel form an end-to-end network, they can be trained jointly. Following most DL-based methods in this field, the mean square error is adopted as the cost function of the proposed network. The optimization objective is represented as:
$$\min_{\boldsymbol{\theta}, \boldsymbol{\phi}} \; \frac{1}{N} \sum_{i=1}^{N} \left\| g_{\boldsymbol{\phi}}(\hat{\mathbf{z}}_i) - \mathbf{x}_i \right\|_2^{2}, \tag{5}$$
where $\boldsymbol{\phi}$ represents the parameters of the decoder network to be trained, and $g_{\boldsymbol{\phi}}(\hat{\mathbf{z}}_i)$ is the final reconstructed output $\hat{\mathbf{x}}_i$ for the $i$-th image $\mathbf{x}_i$, computed from its noisy received code-word $\hat{\mathbf{z}}_i$; the encoder parameters $\boldsymbol{\theta}$ enter the objective through $\hat{\mathbf{z}}_i$. Also, $N$ represents the number of samples or data points in the dataset. It should be noted that we train the encoder network and the decoder network jointly, but they can be utilized in the model independently.
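As a small worked example of the mean square error objective, the sketch below averages the per-pixel squared error over a batch of flattened images. The exact scaling constant of Eq. (5) is not assumed here; the function name `mse_loss` and the per-pixel averaging are choices made for this illustration, and a constant factor would not change the minimizer.

```python
def mse_loss(originals, reconstructions):
    """Per-pixel mean squared error, averaged over a batch of flattened images."""
    if not originals or len(originals) != len(reconstructions):
        raise ValueError("need two non-empty batches of equal size")
    total = 0.0
    for x, x_hat in zip(originals, reconstructions):
        if len(x) != len(x_hat):
            raise ValueError("image and reconstruction differ in size")
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    return total / len(originals)


if __name__ == "__main__":
    batch = [[0.0, 0.0], [1.0, 1.0]]
    recon = [[1.0, 1.0], [1.0, 3.0]]
    # First image: (1 + 1) / 2 = 1.0; second image: (0 + 4) / 2 = 2.0.
    print(mse_loss(batch, recon))  # average of 1.0 and 2.0
```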
IV Results and Discussions
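The proposed model is implemented in TensorFlow and optimized using the Adam algorithm. The compression ratio $k/n$, defined as the ratio of the channel bandwidth $k$ to the source bandwidth $n$, is changed from to . Also, the channel signal-to-noise ratio (SNR), defined as:
$$\mathrm{SNR} = 10\log_{10}\frac{P}{\sigma^{2}}\ \text{(dB)}, \tag{6}$$
is varied during different experiments. Here, $P$ is the average transmit power and $\sigma^{2}$ is the average noise power. The performance of the algorithm is quantified in terms of peak SNR (PSNR) of the reconstructed images. PSNR is calculated as the ratio of the peak signal power ($\mathrm{MAX}^{2}$, with $\mathrm{MAX}$ the maximum possible pixel value) to the mean squared error ($\mathrm{MSE}$) between the original and reconstructed images:
$$\mathrm{PSNR} = 10\log_{10}\frac{\mathrm{MAX}^{2}}{\mathrm{MSE}}\ \text{(dB)}. \tag{7}$$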
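The PSNR metric can be computed directly from its definition. The following sketch assumes 8-bit images, so that the peak pixel value is 255, and operates on flattened pixel lists; the function name `psnr` and its default argument are illustrative assumptions.

```python
import math


def psnr(original, reconstructed, max_value=255.0):
    """PSNR in dB: 10 * log10(max_value^2 / MSE) over flattened pixel lists."""
    if not original or len(original) != len(reconstructed):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    x = [52, 55, 61, 59]
    x_hat = [54, 55, 60, 58]
    print(f"PSNR: {psnr(x, x_hat):.2f} dB")
```

If pixel values are normalized to the [0, 1] range before evaluation, `max_value` would be set to 1.0 instead.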
We train our proposed JSCC architecture on both the CIFAR-10 and ImageNet datasets and compare the results with state-of-the-art deep learning-based JSCC methods, namely DJSCC [7] and ADJSCC [15].
IV-A Evaluation on CIFAR-10 Dataset
The training dataset comprises images, each sized , alongside randomly generated realizations of the communication channel [7]. To gauge the effectiveness of the proposed technique, it is assessed on a distinct set of 10000 test images from the CIFAR-10 dataset, these images being separate from those used during training. Initially, a learning rate of is employed, which is then lowered to after 500000 iterations. Training is conducted using mini-batches, each containing samples, until the performance on the test dataset no longer shows improvement. The following values are considered in the experiments for this dataset: and . It is important to note that the test set images are not employed for tuning network hyperparameters. To account for the influence of channel-induced randomness, each image is transmitted 10 times during performance evaluation. The study examines the performance of the proposed algorithm within AWGN environment, with SNR being adjusted to varying levels.
In Fig. 3, the performance of the proposed algorithm on CIFAR-10 test images with respect to the compression ratio, for different SNR values is shown. It must be noted that for each case, the same SNR value is used in training and evaluation. The results show that the proposed method significantly outperforms the state-of-the-art methods DJSCC and ADJSCC across the entire range of compression ratio and for both low and medium SNR values.


Fig. 4 depicts the PSNR of the reconstructed images against the SNR of the channel, with a fixed compression ratio of . Each curve within Fig. 4 is generated by training the proposed end-to-end system at a specific channel SNR value, referred to as . Subsequently, the performance of the learned encoder/decoder parameters is assessed using test images under various SNR conditions, designated as . Essentially, each curve illustrates the effectiveness of the proposed approach when optimized for a channel SNR equal to , then tested in distinct channel conditions corresponding to . These findings shed light on the algorithm behavior when operating in channel conditions divergent from the optimization scenario, highlighting its resilience to channel quality variations. The outcomes highlight that the proposed method consistently outperforms trained DJSCC. Moreover, both the proposed method and ADJSCC exhibit adaptability to changing SNR, evident from their smooth decline in performance as SNR decreases. Notably, the proposed method holds an advantage over ADJSCC, showcasing superior performance as increases, surpassing ADJSCC () by up to .


IV-B Evaluation on Kodak Dataset
The proposed approach is also evaluated using higher resolution images. To achieve this, the proposed architecture is trained on the Imagenet dataset [7], a widely used dataset in this domain comprising around million images. The images are subjected to random cropping to generate patches of dimensions , which are then processed in batches of samples through the network. For this set of experiments, we set and . The model’s learning rate is set to , and training continues until convergence is achieved. For the Imagenet dataset, the model is trained using values of . This involves splitting the dataset into a ratio for training and validation purposes. For final assessment, the Kodak dataset is employed [18]. During the evaluation process, each image is transmitted times, allowing the performance to be averaged across multiple instances of the random channel. The evaluation scenario involves the consideration of AWGN channel.
Fig. 5 presents the comparison of average PSNR against SNR for a compression ratio of with DJSCC and ADJSCC. The results in Fig. 5 demonstrate that our method outperforms DJSCC and ADJSCC, capturing critical visual details in compressed images for better quality reconstructions. This indicates consistently higher PSNR values across varying SNR levels, highlighting improved image fidelity and minimized channel-induced distortions. The proposed technique excels in delivering superior image quality with increased compression level, even in noisy conditions.
Finally, Fig. 6 presents a visual comparison of images reconstructed by the proposed scheme, trained for CIFAR-10 under AWGN channels, against those of DJSCC and ADJSCC. For each reconstruction, the PSNR and structural similarity index measure (SSIM) values are calculated. The results indicate that the proposed method reconstructs images with high visual quality and restores the details of the original image more accurately. It can be concluded that the proposed method preserves image quality and delivers superior compression and reconstruction performance.
V Conclusion
This paper introduces a novel deep joint source-channel coding algorithm for efficient image transmission over wireless channels. The approach combines block-based CS with DL techniques to design a joint source-channel encoder. This encoder employs a CNN-based model for image compression, enhancing resilience against noise by proper encoding. The model integrates an adaptive CNN-based sampling matrix to capture structural information for improved compression and encodes compressed images into complex-valued signals that adhere to the average power constraint. The decoder network, comprising CNN-based layers, reconstructs high-quality images from channel-encoded data. Through joint training, the proposed method minimizes the loss function for high-quality image reconstruction. Evaluations on CIFAR-10 and Kodak datasets highlight the method’s superior performance compared to DJSCC and ADJSCC frameworks. Our approach consistently outperforms these methods in terms of PSNR across varying SNR and compression ratios, showcasing its effectiveness in achieving robust image transmission in the presence of wireless channel noise. This synergy between CS principles and DL-based techniques presents a promising solution for improved image compression and reconstruction in wireless image transmission systems.
References
- [1] N. Thomos, N. V. Boulgouris, and M. G. Strintzis, “Wireless image transmission using turbo codes and optimal unequal error protection,” IEEE Transactions on Image Processing, vol. 14, no. 11, pp. 1890–1901, 2005.
- [2] J. Xu, B. Ai, W. Chen, A. Yang, P. Sun, and M. Rodrigues, “Wireless image transmission using deep source channel coding with attention modules,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 4, pp. 2315–2328, 2021.
- [3] M. Fresia, F. Perez-Cruz, H. V. Poor, and S. Verdu, “Joint source and channel coding,” IEEE Signal Processing Magazine, vol. 27, no. 6, pp. 104–113, 2010.
- [4] V. Abolghasemi, S. Ferdowsi, B. Makkiabadi, and S. Sanei, “On optimization of the measurement matrix for compressive sensing,” in 2010 18th European Signal Processing Conference. IEEE, 2010, pp. 427–431.
- [5] R. Middya, N. Chakravarty, and M. K. Naskar, “Compressive sensing in wireless sensor networks–a survey,” IETE Technical Review, vol. 34, no. 6, pp. 642–654, 2017.
- [6] M. Rani, S. B. Dhok, and R. B. Deshmukh, “A systematic review of compressive sensing: Concepts, implementations and applications,” IEEE Access, vol. 6, pp. 4875–4894, 2018.
- [7] E. Bourtsoulatze, D. B. Kurka, and D. Gündüz, “Deep joint source-channel coding for wireless image transmission,” IEEE Transactions on Cognitive Communications and Networking, vol. 5, no. 3, pp. 567–579, 2019.
- [8] D. B. Kurka and D. Gündüz, “DeepJSCC-f: Deep joint source-channel coding of images with feedback,” IEEE Journal on Selected Areas in Information Theory, vol. 1, no. 1, pp. 178–193, 2020.
- [9] Y. Wu, M. Rosca, and T. Lillicrap, “Deep compressed sensing,” in International Conference on Machine Learning. PMLR, 2019, pp. 6850–6860.
- [10] Z. Zhang, Y. Liu, J. Liu, F. Wen, and C. Zhu, “AMP-Net: Denoising-based deep unfolding for compressive image sensing,” IEEE Transactions on Image Processing, vol. 30, pp. 1487–1500, 2020.
- [11] M. Shen, H. Gan, C. Ning, Y. Hua, and T. Zhang, “TransCS: a transformer-based hybrid architecture for image compressed sensing,” IEEE Transactions on Image Processing, vol. 31, pp. 6991–7005, 2022.
- [12] S. Jakubczak and D. Katabi, “SoftCast: One-size-fits-all wireless video,” in Proceedings of the ACM SIGCOMM 2010 conference, 2010, pp. 449–450.
- [13] T.-Y. Tung and D. Gündüz, “SparseCast: Hybrid digital-analog wireless image transmission exploiting frequency-domain sparsity,” IEEE Communications Letters, vol. 22, no. 12, pp. 2451–2454, 2018.
- [14] X. Song, X. Peng, J. Xu, G. Shi, and F. Wu, “Distributed compressive sensing for cloud-based wireless image transmission,” IEEE Transactions on Multimedia, vol. 19, no. 6, pp. 1351–1364, 2017.
- [15] J. Xu, B. Ai, W. Chen, A. Yang, P. Sun, and M. Rodrigues, “Wireless image transmission using deep source channel coding with attention modules,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 4, pp. 2315–2328, 2021.
- [16] W. Shi, F. Jiang, S. Zhang, and D. Zhao, “Deep networks for compressed image sensing,” in 2017 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2017, pp. 877–882.
- [17] W. Shi, F. Jiang, S. Liu, and D. Zhao, “Image compressed sensing using convolutional neural network,” IEEE Transactions on Image Processing, vol. 29, pp. 375–388, 2019.
- [18] Kodak, “Kodak color management,” https://r0k.us/graphics/kodak/, accessed: January 8, 2013.