Computation of Rate-Distortion-Perception Functions With Wasserstein Barycenter
Abstract
The nascent field of Rate-Distortion-Perception (RDP) theory is seeing a surge of research interest due to the application of machine learning techniques in the area of lossy compression. The information RDP function characterizes the three-way trade-off between description rate, average distortion, and perceptual quality, the last measured by the discrepancy between probability distributions. However, computing RDP functions has been a challenge due to the introduction of the perceptual constraint, and existing research often resorts to data-driven methods. In this paper, we show that the information RDP function can be transformed into a Wasserstein Barycenter problem. The lack of strict convexity brought by the perceptual constraint can be remedied by an entropy regularization term. We prove that the entropy regularized model converges to the original problem. Furthermore, we propose an alternating iteration method based on the Sinkhorn algorithm to numerically solve the regularized optimization problem. Experimental results demonstrate the efficiency and accuracy of the proposed algorithm.
I Introduction
Lossy compression plays a vital role in the communication and storage of images, videos, and audio data [1, 2, 3, 4]. As the cornerstone of lossy compression, the classical Rate-Distortion (RD) theory [5] studies the tradeoff between the bit rate used for representing data and the distortion caused by compression [6]. The reconstruction quality is traditionally measured by a per-letter distortion metric, such as the mean-squared error. However, it has been shown that in practice, minimizing the distortion does not result in perceptually satisfying output for human subjects [7]. Since high perceptual quality may come at the expense of distortion [8, 9], researchers are motivated to extend the RD theory by taking perception into account [10, 11, 12].
Blau and Michaeli first proposed and studied the information Rate-Distortion-Perception (RDP) functions in [10]. Theoretical solutions with closed form expressions to the RDP problem are often intractable, except for some special cases such as the Gaussian source with squared error distortion and Wasserstein-2 metric perception [11]. Therefore, a computation method for RDP functions is desirable.
Traditionally, the Blahut–Arimoto (BA) algorithm [13, 14] has been successful in computing channel capacities and RD functions. However, to the best of our knowledge, no generalization of the BA algorithm to computing RDP functions has been reported. This may be due to the fact that RDP functions have more independent variables than RD functions, while in each alternating iteration step of the BA algorithm, all variables except the one being updated need to be fixed. Also, RDP functions carry an additional nonlinear constraint on the perception, which destroys the original simplex structure and invalidates existing numerical methods. An alternative way to compute RDP functions is based on data-driven methods, such as generative adversarial networks, which minimize a weighted combination of the distortion and the perception [10, 11, 15]. However, these deep learning-based methods often require huge computational resources while having poor generalization ability.
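For context, the BA iteration for the RD function mentioned above can be sketched in a few lines. This is the standard textbook update for one point on the R(D) curve; the slope parameter `s`, the iteration budget, and the binary toy example are our own illustrative choices, not taken from this paper:

```python
import numpy as np

def blahut_arimoto_rd(p, dist, s, n_iter=500, tol=1e-10):
    """One point on the R(D) curve via the classical BA iteration,
    parameterized by the slope s > 0 (sketch; assumes dist >= 0)."""
    n, m = dist.shape
    W = np.full((n, m), 1.0 / m)        # conditional distribution p(x_hat | x)
    K = np.exp(-s * dist)               # kernel induced by the distortion
    for _ in range(n_iter):
        q = p @ W                       # current output marginal
        W_new = q * K                   # BA update: W_ij proportional to q_j e^{-s d_ij}
        W_new /= W_new.sum(axis=1, keepdims=True)
        if np.max(np.abs(W_new - W)) < tol:
            W = W_new
            break
        W = W_new
    q = p @ W
    D = np.sum(p[:, None] * W * dist)           # achieved distortion
    R = np.sum(p[:, None] * W * np.log(W / q))  # rate in nats
    return R, D

# Uniform binary source with Hamming distortion.
p = np.array([0.5, 0.5])
dist = np.array([[0.0, 1.0], [1.0, 0.0]])
R, D = blahut_arimoto_rd(p, dist, s=2.0)
```

For this symmetric example the fixed point satisfies R = log 2 - H_b(D) in nats, which serves as a quick correctness check.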
In this paper, we propose a numerical method for computing RDP functions. Following the approach of a recent paper [16] on RD functions, we reformulate RDP functions in an Optimal Transport (OT) form with an additional constraint on perception. However, compared to RD functions, the introduction of this additional perceptual-quality constraint destroys the original simplex structure of RDP functions. Thus the Alternating Sinkhorn (AS) algorithm proposed in [16] cannot be applied directly to RDP functions. To handle this issue, we prove that the additional constraint can be equivalently converted to a set of linear constraints by introducing an auxiliary variable. Consequently, we observe that the new model takes the form of the celebrated Wasserstein Barycenter problem [17, 18, 19], as it can be viewed as a minimization over two couplings (i.e., the transition mapping and the newly introduced variable) between the Barycenter (i.e., the reconstruction distribution) and the source distribution. Moreover, the objective of the optimization is a weighted distance according to the Wasserstein metric. Our model will therefore be referred to as the Wasserstein Barycenter model for Rate-Distortion-Perception functions (WBM-RDP).
With the operations above, we are able to design an algorithm for the WBM-RDP. To tackle the difficulty that the WBM-RDP is not strictly convex, we construct an entropy-regularized formulation of the WBM-RDP. We show that the new form admits a unique optimal solution, and that it converges to the original WBM-RDP. After obtaining the Lagrangian of the entropy regularized WBM-RDP, we observe that the degrees of freedom therein can be divided into three groups to be optimized alternately with closed-form solutions. As such, we propose an improved AS algorithm, which effectively combines the advantages of the AS algorithm for RD functions [16] and the entropy regularized algorithm for the Wasserstein Barycenter model [18]. Numerical experiments demonstrate that the proposed algorithm achieves high precision and efficiency under various circumstances.
II RDP functions and WBM-RDP
II-A Rate-Distortion-Perception Functions
Consider a discrete memoryless source $X \in \mathcal{X}$ and its reconstruction $\hat{X} \in \hat{\mathcal{X}}$, where $\mathcal{X}$ and $\hat{\mathcal{X}}$ are finite alphabets. Suppose $X$ and $\hat{X}$ have distributions $p_X$ and $p_{\hat{X}}$, respectively. We consider a single-letter distortion measure $\Delta: \mathcal{X} \times \hat{\mathcal{X}} \to [0, \infty)$ and a perception measure $d(\cdot, \cdot)$ between probability distributions.
Definition 1 (The information RDP function [10]).
Given a distortion fidelity $D \ge 0$ and a perception quality $P \ge 0$, the information RDP function is defined as

$R(D, P) = \min_{p_{\hat{X}|X}} I(X; \hat{X})$ (1a)

s.t. $\mathbb{E}[\Delta(X, \hat{X})] \le D$, (1b)

$d(p_X, p_{\hat{X}}) \le P$, (1c)

where the minimization is taken over all conditional distributions $p_{\hat{X}|X}$, and $I(X; \hat{X})$ is the mutual information.
Note that the information RDP function (1) degenerates to RD functions when the constraint (1c) is removed.
Since the alphabets $\mathcal{X} = \{x_1, \dots, x_M\}$ and $\hat{\mathcal{X}} = \{\hat{x}_1, \dots, \hat{x}_N\}$ are finite, we denote
$p_i = p_X(x_i)$, $r_j = p_{\hat{X}}(\hat{x}_j)$, and $W_{ij} = p_{\hat{X}|X}(\hat{x}_j \mid x_i)$ for all $i, j$. Here $W = (W_{ij})$ is the channel transition mapping. Thus the discrete form of problem (1) can be written as

$\min_{W} \sum_{i,j} p_i W_{ij} \log \frac{W_{ij}}{r_j}$, with $r_j = \sum_{i} p_i W_{ij}$, (2a)

s.t. $\sum_{j} W_{ij} = 1, \; W_{ij} \ge 0, \; \forall i$, (2b)

$\sum_{i,j} \Delta(x_i, \hat{x}_j) \, p_i W_{ij} \le D$, (2c)

$d(p_X, p_{\hat{X}}) \le P$. (2d)
II-B Perception Measures
One commonly used measure of perceptual quality is the Wasserstein metric,

$W_c(p_X, p_{\hat{X}}) = \min_{\Pi \ge 0} \sum_{i,j} c(x_i, \hat{x}_j) \, \Pi_{ij}$ (3a)

s.t. $\sum_{j} \Pi_{ij} = p_X(x_i), \quad \sum_{i} \Pi_{ij} = p_{\hat{X}}(\hat{x}_j)$, (3b)

where $c(x_i, \hat{x}_j)$ denotes the cost matrix between the alphabets $\mathcal{X}$ and $\hat{\mathcal{X}}$. In some cases, the total variation (TV) distance and the Kullback-Leibler (KL) divergence are also considered as measures of perceptual quality [10]. We will focus on the Wasserstein metric, as generalizations to the TV and KL metrics follow straightforwardly, as will be discussed in Section III.
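Problem (3) is a finite linear program over couplings, so it can be solved directly with an off-the-shelf LP solver. The sketch below does exactly that; the 2x2 distributions and 0-1 cost matrix are illustrative choices of our own:

```python
import numpy as np
from scipy.optimize import linprog

def wasserstein_lp(p, r, C):
    """Discrete Wasserstein distance between pmfs p and r for cost matrix C,
    solved as the linear program in (3) (illustrative sketch)."""
    n, m = C.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0   # sum_j Pi_ij = p_i
    for j in range(m):
        A_eq[n + j, j::m] = 1.0            # sum_i Pi_ij = r_j
    b_eq = np.concatenate([p, r])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

p = np.array([0.5, 0.5])
r = np.array([0.9, 0.1])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
d_w = wasserstein_lp(p, r, C)  # with 0-1 cost this equals the TV distance, 0.4
```

The 0-1 cost recovers the TV distance mentioned above, so the same routine can sanity-check both perception measures.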
II-C Wasserstein Barycenter Model for RDP functions
Obviously, solving the RDP functions is more difficult than solving the RD functions due to the introduction of the new perception constraint (2d). Numerically, we need to frequently update the reconstruction distribution to ensure that the perception constraint is satisfied. For metrics such as the Wasserstein metric (3), the computational cost of this is high. Moreover, the original feasible domain of RD functions is changed by the introduction of (2d). The feasible solutions of RD functions lie inside a simplex structure, since (2b) and (2c) are both linear constraints. This is the reason why the BA algorithm and the AS algorithm can solve RD functions at low cost [16]. However, the perception constraint (2d) breaks this desirable simplex structure, which brings computational challenges. In this section, we develop a new model and an algorithm for solving RDP functions efficiently and accurately. Our starting point is to convert the perception constraint (2d) into a linear constraint in a higher-dimensional space. Consequently, all the constraints, (2b)-(2d), preserve the simplex structure in the higher-dimensional space. This brings great convenience to our algorithm design. The main idea is summarized in the following theorem:
Theorem 1.
The optimal value of (2), i.e., the RDP function, equals the optimal value of the following optimization problem (4). Meanwhile, the optimal solution of (4) yields the optimal solution of (2):

$\min_{W, r, \Pi} \sum_{i,j} p_i W_{ij} \log \frac{W_{ij}}{r_j}$ (4a)

s.t. $\sum_{j} W_{ij} = 1, \; W_{ij} \ge 0, \; \forall i$, (4b)

$\sum_{i,j} \Delta(x_i, \hat{x}_j) \, p_i W_{ij} \le D$, (4c)

$\sum_{i} \Pi_{ij} = r_j = \sum_{i} p_i W_{ij}, \quad \sum_{j} \Pi_{ij} = p_i, \quad \Pi_{ij} \ge 0$, (4d)

$\sum_{i,j} c(x_i, \hat{x}_j) \, \Pi_{ij} \le P$. (4e)
Theorem 1 establishes an equivalence between the RDP problem (2) and the optimization problem (4). Also, we observe that the model (4) has the Wasserstein Barycenter structure. The optimization variable $r$ can be regarded as the Barycenter, and $p_i W_{ij}$ and $\Pi_{ij}$ are two couplings between the source distribution and the Barycenter $r$. Thus, we have obtained the Wasserstein Barycenter model for RDP functions. In general, numerical methods for the Wasserstein Barycenter problem are computationally intensive [18, 20]. However, as $\Pi$ is not explicitly included in the objective function of (4), it is possible to design a fast and efficient algorithm for the above WBM-RDP.
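To make the two-coupling structure concrete, the sketch below checks feasibility of a candidate pair of couplings under our reading of model (4): a row-stochastic transition matrix whose induced marginal must also be the second marginal of the auxiliary coupling, plus the distortion and perception budgets. The function name and tolerances are our own:

```python
import numpy as np

def wbm_rdp_feasible(p, W, Pi, dist, cost, D, P, tol=1e-9):
    """Feasibility check for the WBM-RDP constraints (sketch).
    p: source pmf; W: transition matrix; Pi: auxiliary coupling;
    dist/cost: distortion and perception cost matrices; D/P: budgets."""
    r = p @ W                                        # reconstruction marginal
    ok_rows = np.allclose(W.sum(axis=1), 1.0, atol=tol)
    ok_coupling = (np.allclose(Pi.sum(axis=1), p, atol=tol)
                   and np.allclose(Pi.sum(axis=0), r, atol=tol))
    ok_D = np.sum(p[:, None] * W * dist) <= D + tol  # distortion budget
    ok_P = np.sum(Pi * cost) <= P + tol              # perception budget
    return ok_rows and ok_coupling and ok_D and ok_P

p = np.array([0.5, 0.5])
W = np.eye(2)          # lossless reconstruction
Pi = np.diag(p)        # identity coupling between p and r = p
dist = np.array([[0.0, 1.0], [1.0, 0.0]])
ok = wbm_rdp_feasible(p, W, Pi, dist, dist, D=0.1, P=0.1)
```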
III Entropy Regularized WBM-RDP and Improved AS Algorithm
The above section establishes the WBM-RDP model (4), which is the first step towards computing the RDP functions. However, there are two difficulties in designing a practical algorithm. First, the WBM-RDP (4) is not strictly convex, since the newly introduced coupling $\Pi$ does not appear in the objective function. Although the optimal value of (4) is unique, the corresponding optimal solutions may vary along the dimensions of $\Pi$. Therefore, the convergence and the numerical stability of the AS algorithm designed for RD functions [16] cannot be guaranteed. Second, the objective function of the WBM-RDP contains logarithmic terms in the Barycenter $r$, which is not a standard formulation of the classical Wasserstein Barycenter problem. In this section, we overcome these difficulties and improve the Alternating Sinkhorn algorithm to solve the WBM-RDP.
III-A Entropy Regularized WBM-RDP
As discussed above, the WBM-RDP (4) is not strictly convex, since the coupling $\Pi$ does not appear in the objective. The most direct remedy [21] is to introduce an extra entropy regularization term $\epsilon \sum_{i,j} \Pi_{ij} \log \Pi_{ij}$ into the objective function (4a).
This leads to the entropy regularized WBM-RDP:
$\min_{W, r, \Pi} \sum_{i,j} p_i W_{ij} \log \frac{W_{ij}}{r_j} + \epsilon \sum_{i,j} \Pi_{ij} \log \Pi_{ij}$ (5a)

s.t. $\sum_{j} W_{ij} = 1, \; W_{ij} \ge 0, \; \forall i$, (5b)

$\sum_{i,j} \Delta(x_i, \hat{x}_j) \, p_i W_{ij} \le D$, (5c)

$\sum_{i} \Pi_{ij} = r_j = \sum_{i} p_i W_{ij}, \quad \sum_{j} \Pi_{ij} = p_i, \quad \Pi_{ij} \ge 0$, (5d)

$\sum_{i,j} c(x_i, \hat{x}_j) \, \Pi_{ij} \le P$. (5e)
Here, $\epsilon > 0$ is a newly introduced regularization parameter. Thus, we obtain an entropy regularized WBM-RDP with strict convexity. Moreover, during the alternating optimization iterations, closed-form expressions of the dual variables can always be obtained, which accelerates the algorithm while improving the accuracy. Furthermore, the following theorem guarantees that the solution of the entropy regularized WBM-RDP (5) converges to that of the WBM-RDP (4) as $\epsilon \to 0$.
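The role of the entropy term can be illustrated on the plain OT subproblem that Sinkhorn scaling solves. This is a generic entropic-OT sketch with a toy example, not the full WBM-RDP update; the 2x2 data and the value of eps are our own choices:

```python
import numpy as np

def sinkhorn(p, r, C, eps, n_iter=2000):
    """Entropy-regularized OT via Sinkhorn scaling (sketch): returns the
    coupling minimizing <Pi, C> + eps * sum Pi log Pi with marginals p, r."""
    K = np.exp(-C / eps)              # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iter):
        v = r / (K.T @ u)             # match the column marginal
        u = p / (K @ v)               # match the row marginal
    return u[:, None] * K * v[None, :]

p = np.array([0.5, 0.5])
r = np.array([0.9, 0.1])
C = np.array([[0.0, 1.0], [1.0, 0.0]])
Pi = sinkhorn(p, r, C, eps=0.01)
cost = np.sum(Pi * C)   # approaches the unregularized optimum 0.4 as eps -> 0
```

Smaller eps gives a sharper coupling but slower and less stable iterations, which is exactly the trade-off examined in Section IV-B.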
III-B The Improved Alternating Sinkhorn Algorithm
We construct the Lagrangian function of the regularized WBM-RDP (5) by introducing dual variables for the constraints (5b)-(5e):

(7)
Here we note that the Lagrangian function (7) is convex with respect to each variable. Furthermore, we can improve the algorithm by designing the directions of the alternating iterations according to how the variables appear in (7). Next, we sketch the main ingredients of our algorithm. The pseudo-code is presented in Algorithm 1.
1) Update $W$ and its associated dual variables while fixing the remaining variables as constant parameters. First, we update the corresponding dual variables:

(8)

Second, we update the multiplier of the distortion constraint by a root-searching operation on a monotonic single-variable function:

(9)

2) Update $\Pi$ and its associated dual variables while fixing the remaining variables as constant parameters. We alternately update the corresponding dual variables:

(10)

Again, we update the multiplier of the perception constraint by a root-searching operation on a monotonic single-variable function:

(11)

3) Update $r$ and its associated dual variables while fixing the remaining variables as constant parameters. We update by finding the root of a single-variable function on its largest monotone interval:

(12)

and we finally obtain

(13)
We stress that the three steps above do not contain inner iterations, because $\Pi$ is not explicitly included in the objective function of the WBM-RDP, as discussed above. Therefore, compared to other existing algorithms for solving the Wasserstein Barycenter problem, our proposed algorithm gains much in efficiency and simplicity. On the other hand, our proposed algorithm adopts an alternating iteration technique similar to the Alternating Sinkhorn algorithm for RD functions. However, a vital operation we conduct is altering the projection directions of the different joint distribution variables, i.e., $p_i W_{ij}$ and $\Pi_{ij}$, which is essentially different from the AS algorithm for RD functions. Thus we name it the Improved Alternating Sinkhorn algorithm.
Remark 1.
Remark 2.
If the perception measure in (2d) is substituted by the KL divergence, our improved AS algorithm becomes simpler: the Sinkhorn iteration in step 2) can be omitted, since the coupling $\Pi$ does not need to be introduced, and the corresponding updates in step 2) are replaced by their closed-form counterparts.
IV Numerical Experiment
In this section, we numerically study the validity of the WBM-RDP and the improved AS algorithm. All the experiments are conducted on a PC with 8 GB RAM and an 11th Gen Intel(R) Core(TM) i5-1135G7 CPU @ 2.40 GHz.
We compute RDP functions under two settings with different perception measures: one is the binary source with Hamming distortion [10], and the other is the Gaussian source with squared error distortion [11]. Moreover, the above two settings have analytical expressions [10, 11] when the perceptual constraints are TV distance and Wasserstein-2 metric, respectively.
For the binary source, we can directly compute the result since we can set the discrete distribution beforehand. As for the Gaussian source, we first truncate the source to an interval and then discretize it by a set of uniform grid points with adjacent spacing $\delta$.

The corresponding distribution of the Gaussian source can then be denoted by

(14)

where $p_i$ denotes the probability mass that the Gaussian source assigns to the $i$-th grid point. Unless otherwise specified, the settings for the binary source and for the Gaussian source, as well as the regularization parameter of our method, are kept fixed. Additionally, we note that the space complexity grows with the dimension of the above discretized Gaussian source. The following results suggest that this discretization is precise enough.
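The truncation-and-discretization step can be sketched as follows. The truncation interval [-5, 5] and the grid size are illustrative assumptions, and assigning each grid point the probability mass of its bin is one natural reading of (14):

```python
import numpy as np
from scipy.stats import norm

def discretize_gaussian(mu=0.0, sigma=1.0, lo=-5.0, hi=5.0, n=100):
    """Truncate N(mu, sigma^2) to [lo, hi] and discretize it on a uniform
    grid, giving each midpoint the probability mass of its bin (sketch)."""
    edges = np.linspace(lo, hi, n + 1)
    x = 0.5 * (edges[:-1] + edges[1:])     # bin midpoints, spacing delta
    mass = norm.cdf(edges[1:], mu, sigma) - norm.cdf(edges[:-1], mu, sigma)
    p = mass / mass.sum()                  # renormalize after truncation
    return x, p

x, p = discretize_gaussian()
```

For small spacing, the resulting pmf closely preserves the low-order moments of the source.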

In Fig. 1, we plot the RDP curves given by our method under different perception parameters and compare them with the known theoretical results. The results obtained by our method match the analytical expressions well in Fig. 1 a) and c). Furthermore, we can also plot the results for which the analytical solution is not known in Fig. 1 b) and d). We also plot the 3D diagram of the RDP surface in Fig. 2, where the parameters for the Gaussian source are chosen for visual effect. The results are in accord with those derived from data-driven methods in [10].

IV-A Algorithm Convergence Verification
We verify the convergence of the improved AS algorithm in this subsection. We take the residual errors of the Karush-Kuhn-Tucker (KKT) conditions of the optimization problem (5) as the indicator of convergence: each KKT condition yields one residual error, and we define the overall residual error to be the root mean square of these individual residual errors.
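As a concrete instance of this convergence indicator, the sketch below computes the root mean square of a list of residuals, using the marginal constraints of a coupling as example KKT residuals (the function names are ours):

```python
import numpy as np

def rms_residual(residuals):
    """Overall residual error: root mean square of individual KKT residuals."""
    r = np.asarray(residuals, dtype=float)
    return np.sqrt(np.mean(r ** 2))

def marginal_residuals(Pi, p, r):
    """Norms of the violated marginal constraints of a coupling Pi."""
    return [np.linalg.norm(Pi.sum(axis=1) - p),
            np.linalg.norm(Pi.sum(axis=0) - r)]
```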
In Fig. 3, we plot the convergence trajectories of the overall residual error for the binary source with the TV distance and for the Gaussian source with the Wasserstein-2 metric against the number of iterations, under several distortion parameter settings. The results show different convergence behaviors for different distortion parameters, and all of these cases eventually converge.

IV-B The Regularization Parameter
We have theoretically demonstrated that as $\epsilon \to 0$ the entropy regularized problem converges to the original problem. However, according to general results on the entropy regularized OT problem, a smaller $\epsilon$ yields higher precision but requires more computation [20]. Moreover, according to the theory of Sinkhorn iterations [20], an excessively small $\epsilon$ might trigger numerical stability problems. Thus we investigate the impact of $\epsilon$ on the WBM-RDP.
Results are shown in Table I, where the error represents the difference between the algorithm results and the explicit results, for the binary source with the TV distance and the Gaussian source with the Wasserstein-2 metric. When $\epsilon$ decreases, the computational time rises while the results become more accurate, regardless of the source. Furthermore, for $\epsilon$ = 1.00E-04 in both sources, the small $\epsilon$ causes numerical instability. Therefore, from a practical perspective, a moderately small $\epsilon$ seems to be an ideal choice for RDP functions, as it ensures a certain level of accuracy and does not consume too much time.
| | $\epsilon$ | Time (s) | Error |
|---|---|---|---|
| Binary source | 1.00E-01 | 6.16 | 4.36E-04 |
| | 5.00E-02 | 6.50 | 8.75E-05 |
| | 1.00E-02 | 8.69 | 5.41E-06 |
| | 5.00E-03 | 10.62 | 3.47E-06 |
| | 1.00E-04 | - | - |
| Gaussian source | 1.00E-01 | 70.01 | 1.49E-02 |
| | 5.00E-02 | 74.87 | 7.30E-03 |
| | 1.00E-02 | 99.56 | 2.30E-03 |
| | 5.00E-03 | 118.97 | 1.80E-03 |
| | 1.00E-04 | - | - |
V Conclusion
In this work, we present a new scheme for computing the information Rate-Distortion-Perception functions. We convert the original problem into a Wasserstein Barycenter model for Rate-Distortion-Perception functions. Furthermore, we propose the improved Alternating Sinkhorn method with entropy regularization to solve the optimization problem. Numerical experiments show that our algorithm performs with high accuracy and efficiency. Extensions to further properties of RDP functions and applications to practical lossy compression schemes will be reported in the journal version.
References
- [1] F. Mentzer, G. Toderici, M. Tschannen, and E. Agustsson, “High-fidelity generative image compression,” in Advances in Neural Information Processing Systems (NeurIPS 2020), H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, and H. Lin, Eds., vol. 33, Dec. 2020, pp. 11913–11924.
- [2] S. Ma, X. Zhang, C. Jia, Z. Zhao, S. Wang, and S. Wang, “Image and video compression with neural networks: A review,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 30, no. 6, pp. 1683–1698, 2020.
- [3] G. Lu, X. Zhang, W. Ouyang, L. Chen, Z. Gao, and D. Xu, “An end-to-end learning framework for video compression,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 10, pp. 3292–3308, 2020.
- [4] N. Zeghidour, A. Luebs, A. Omran, J. Skoglund, and M. Tagliasacchi, “Soundstream: An end-to-end neural audio codec,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 495–507, 2022.
- [5] C. E. Shannon, “Coding Theorems for a Discrete Source with a Fidelity Criterion,” IRE National Convention Record, vol. 4, pp. 142–163, Mar. 1959.
- [6] T. M. Cover and J. A. Thomas, Elements of Information Theory. Wiley-Interscience, 2006.
- [7] S. Santurkar, D. M. Budden, and N. Shavit, “Generative Compression,” in 2018 Picture Coding Symposium (PCS), San Francisco, California, USA, Jun. 2018, pp. 1–5.
- [8] E. Agustsson, M. Tschannen, F. Mentzer, R. Timofte, and L. V. Gool, “Generative Adversarial Networks for Extreme Learned Image Compression,” in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, South Korea, Oct.-Nov. 2019, pp. 221–231.
- [9] Y. Blau and T. Michaeli, “The Perception-Distortion Tradeoff,” in 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, Utah, USA, Jun. 2018, pp. 6228–6237.
- [10] Y. Blau and T. Michaeli, “Rethinking Lossy Compression: The Rate-Distortion-Perception Tradeoff,” in 36th International Conference on Machine Learning (ICML), Long Beach, California, USA, Jun. 2019, pp. 675–685.
- [11] G. Zhang, J. Qian, J. Chen, and A. Khisti, “Universal Rate-Distortion-Perception Representations for Lossy Compression,” in Advances in Neural Information Processing Systems (NeurIPS 2021), vol. 34, Dec. 2021, pp. 11517–11529.
- [12] X. Niu, D. Gündüz, B. Bai, and W. Han, “Conditional rate-distortion-perception trade-off,” in 2023 IEEE International Symposium on Information Theory (ISIT). (to appear), 2023.
- [13] S. Arimoto, “An Algorithm for Computing the Capacity of Arbitrary Discrete Memoryless Channels,” IEEE Transactions on Information Theory, vol. 18, no. 1, pp. 14–20, Jan. 1972.
- [14] R. E. Blahut, “Computation of Channel Capacity and Rate-Distortion Functions,” IEEE Transactions on Information Theory, vol. 18, no. 4, pp. 460–473, Jul. 1972.
- [15] O. Kirmemis and A. M. Tekalp, “A Practical Approach for Rate-Distortion-Perception Analysis in Learned Image Compression,” in 2021 Picture Coding Symposium (PCS), Bristol, United Kingdom, Jun. 2021, pp. 1–5.
- [16] S. Wu, W. Ye, H. Wu, H. Wu, W. Zhang, and B. Bai, “A Communication Optimal Transport Approach to the Computation of Rate Distortion Functions,” arXiv preprint arXiv:2212.10098, 2022.
- [17] B. Pass, “Multi-marginal Optimal Transport: Theory and Applications,” ESAIM: Mathematical Modelling and Numerical Analysis, vol. 49, no. 6, pp. 1771–1790, Feb. 2015.
- [18] M. Cuturi and A. Doucet, “Fast Computation of Wasserstein Barycenters,” in 31th International Conference on Machine Learning (ICML), vol. 32, Beijing, China, Jun. 2014, pp. 685–693.
- [19] M. Agueh and G. Carlier, “Barycenters in the Wasserstein Space,” Society for Industrial and Applied Mathematics Journal on Mathematical Analysis, vol. 43, no. 2, pp. 904–924, Jan. 2011.
- [20] G. Peyré, M. Cuturi et al., “Computational Optimal Transport: With Applications to Data Science,” Foundations and Trends® in Machine Learning, vol. 11, no. 5-6, pp. 355–607, Feb. 2019.
- [21] M. Nutz and J. Wiesel, “Entropic Optimal Transport: Convergence of Potentials,” Probability Theory and Related Fields, vol. 184, no. 1, pp. 401–424, Nov. 2022.
- [22] C. Villani, Optimal Transport: Old and New. Springer, 2009, vol. 338.