Abstract
In information theory, the well-known log-sum inequality is a fundamental tool which underlies the non-negativity of the relative entropy. In this article, we establish a set of inequalities, similar to the log-sum inequality, involving two functions defined on scalars. Parametric extensions of the log-sum inequality are also presented. We extend these inequalities to commuting matrices. In addition, utilizing the Löwner partial order relation and the Hansen-Pedersen theory for non-commutative positive semi-definite matrices, we demonstrate a number of matrix inequalities analogous to the log-sum inequality.
Keywords: log-sum inequality; deformed logarithm; convex function; Löwner partial order; Hansen-Pedersen theory
1 Introduction
In information theory, the so-called log-sum inequality is a generalization of the Shannon inequality (often also called the Klein inequality), which is used to prove the non-negativity of relative entropy. The essence of the non-negativity of the relative entropy is the simple inequality $\log x \le x - 1$ for $x > 0$. Therefore, the log-sum inequality is an important tool in information theory. It is a variant of the Jensen inequality for convex functions, which plays a crucial role in proving the Gibbs inequality and the convexity of the Kullback-Leibler divergence [1].
Recall that, a function $f$ defined on a convex set $C$ is said to be a convex function if for all $x, y \in C$ and for all $\lambda \in [0, 1]$ we have $f(\lambda x + (1-\lambda)y) \le \lambda f(x) + (1-\lambda) f(y)$. When $C$ is a set of real numbers, this inequality is mentioned as the Jensen inequality. In general, we represent the Jensen inequality [2], [3] as
$$f\Big(\sum_{i=1}^n p_i x_i\Big) \ \le\ \sum_{i=1}^n p_i f(x_i), \qquad (1)$$
where $p_i \ge 0$ for all $i$ and $\sum_{i=1}^n p_i = 1$.
A twice-differentiable real valued function $f$ on an interval is convex if and only if $f''(x) \ge 0$. Let $a_1, a_2, \dots, a_n$ and $b_1, b_2, \dots, b_n$ be non-negative numbers. As $f(x) = x\log x$ is a convex function, the Jensen inequality (1) suggests
$$f\Big(\sum_{i=1}^n p_i t_i\Big) \ \le\ \sum_{i=1}^n p_i\, f(t_i),$$
where $p_i = \frac{b_i}{\sum_{j=1}^n b_j}$ and $t_i = \frac{a_i}{b_i}$. Now, replacing these values of $p_i$ and $t_i$ and simplifying we obtain,
$$\sum_{i=1}^n a_i \log\frac{a_i}{b_i} \ \ge\ \Big(\sum_{i=1}^n a_i\Big)\log\frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i}, \qquad (2)$$
which is the standard log-sum inequality. The inequality is valid for $a_i \ge 0$ provided $\sum_{i=1}^n a_i > 0$ as well as $\sum_{i=1}^n b_i > 0$, with the conventions $0\log 0 = 0$, $0\log\frac{0}{0} = 0$, and $a\log\frac{a}{0} = +\infty$. In [4], an analog of the log-sum inequality is derived as
$$\sum_{i=1}^n b_i\, f\Big(\frac{a_i}{b_i}\Big) \ \ge\ \Big(\sum_{i=1}^n b_i\Big)\, f\Big(\frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i}\Big),$$
where $f$ is strictly convex at $\frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i}$. The equality holds if and only if $a_i = c\, b_i$, for $i = 1, 2, \dots, n$, and a constant number $c$.
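As a quick numerical sanity check of the log-sum inequality (2) (not part of the original derivation; the helper name `log_sum_gap` is ours), one can sample random positive vectors and verify that the left-hand side never falls below the right-hand side:

```python
import math
import random

def log_sum_gap(a, b):
    # LHS - RHS of the log-sum inequality (2):
    #   sum_i a_i log(a_i/b_i) >= (sum_i a_i) log(sum_i a_i / sum_i b_i)
    lhs = sum(ai * math.log(ai / bi) for ai, bi in zip(a, b))
    return lhs - sum(a) * math.log(sum(a) / sum(b))

random.seed(0)
for _ in range(1000):
    n = random.randint(1, 6)
    a = [random.uniform(0.1, 10.0) for _ in range(n)]
    b = [random.uniform(0.1, 10.0) for _ in range(n)]
    assert log_sum_gap(a, b) >= -1e-9

# equality holds when a_i = c * b_i for a constant c
assert abs(log_sum_gap([1.0, 2.0], [2.0, 4.0])) < 1e-12
```

The gap vanishes exactly in the proportional case, matching the equality condition quoted above.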
Although the log-sum inequality is important in classical information theory, a detailed study on this topic is not available in the literature, to the best of our knowledge. This article demonstrates a number of inequalities which are analogous to the log-sum inequality. First, we extend the log-sum inequality with two real-valued functions, which is illustrated in Theorem 1. This theorem expands the applicability of the log-sum inequality, as it removes the restriction that the quantities under comparison must be non-negative real numbers. The convexity of a function is essential for proving the log-sum inequality. Considering the concavity of a function, we also find analogous results. Investigating the variants of the log-sum inequality for concave functions is crucial, as there are classes of functions whose convexity depends on specific values of a parameter, for instance, the $q$-deformed logarithm. We also elaborate the matrix analogs of the log-sum inequality. Here we observe that all the inequalities derived for real functions can be easily extended as trace-form inequalities for commuting self-adjoint matrices. A recent trend in matrix analysis is constructing matrix-theoretic counterparts of known inequalities for real functions, where the matrix inequality is driven by the Löwner partial order relation. Proving the log-sum inequality in this context is non-trivial and difficult. We are able to derive it under several conditions. To the best of our knowledge, this is the first attempt in the literature to extend the log-sum inequality to non-commutative positive semidefinite matrices.
This article is organized into five sections. Preliminary concepts are discussed before applying them in the mathematical derivations. In Section 2, we consider inequalities for real-valued functions. Here, we discuss the log-sum inequality for deformed logarithms, for instance the $q$-deformed logarithm. Section 3 is dedicated to the discussion of the log-sum inequality as a trace-form inequality for commuting self-adjoint matrices. It has a number of consequences in quantum information theory. In the next section, we attempt to derive the log-sum inequality for the Löwner partial order relation. This section extensively uses the ideas of operator monotone and operator convex functions and the operator Jensen inequality, which we mention at the beginning of Section 4. Then we conclude the article.
2 log-sum inequalities for real functions
In this section, we extend the idea of the log-sum inequality to real-valued functions. Both concave and convex functions are considered for the investigation. Interestingly, $x\ln_q(x)$, built from the $q$-deformed logarithm, is convex when $q \le 2$, which assists us in illustrating two sets of log-sum inequalities. We begin our discussion with the following theorem.
Theorem 1.
Let $f$ be a real valued function whose domain contains $\{x_1, \dots, x_n\}$ and $\{y_1, \dots, y_n\}$, such that $f(y_i) > 0$ for all $i = 1, 2, \dots, n$. Consider another function $g$ for which $h(x) = x\,g(x)$ is convex, where the domain of $g$ contains the ratios $\frac{f(x_i)}{f(y_i)}$ and $\frac{\sum_{i=1}^n f(x_i)}{\sum_{i=1}^n f(y_i)}$. Then,
$$\sum_{i=1}^n f(x_i)\, g\Big(\frac{f(x_i)}{f(y_i)}\Big) \ \ge\ \Big(\sum_{i=1}^n f(x_i)\Big)\, g\Big(\frac{\sum_{i=1}^n f(x_i)}{\sum_{i=1}^n f(y_i)}\Big). \qquad (3)$$
Proof.
Define $F = \sum_{i=1}^n f(x_i)$ and $G = \sum_{i=1}^n f(y_i)$. Note that,
$$\sum_{i=1}^n \frac{f(y_i)}{G}\cdot\frac{f(x_i)}{f(y_i)} = \frac{F}{G},$$
which is a convex combination of $\frac{f(x_i)}{f(y_i)}$ for $i = 1, 2, \dots, n$. Clearly, the ratio $\frac{F}{G}$ belongs to the convex set spanned by these ratios. Now, as $f(y_i) > 0$ for all $i$ and $G > 0$, we have $\frac{f(y_i)}{G} \ge 0$ and $\sum_{i=1}^n \frac{f(y_i)}{G} = 1$. Now, the Jensen inequality indicates
$$h\Big(\frac{F}{G}\Big) \ \le\ \sum_{i=1}^n \frac{f(y_i)}{G}\, h\Big(\frac{f(x_i)}{f(y_i)}\Big). \qquad (4)$$
Expanding $h(x) = x\,g(x)$ we get
$$\frac{F}{G}\, g\Big(\frac{F}{G}\Big) \ \le\ \frac{1}{G}\sum_{i=1}^n f(x_i)\, g\Big(\frac{f(x_i)}{f(y_i)}\Big).$$
Combining, we obtain the result.
∎
From now on, we denote $F = \sum_{i=1}^n f(x_i)$ and $G = \sum_{i=1}^n f(y_i)$ throughout the article, where $f$ is a real valued function whose domain of definition contains $\{x_1, \dots, x_n\}$ and $\{y_1, \dots, y_n\}$, such that $f(y_i) > 0$ for all $i$.
Defining $f(x) = x$ and $g(x) = \log x$ for $x > 0$, we observe that $x\,g(x) = x\log x$ is a convex function. Then, Theorem 1 leads us to the log-sum inequality mentioned in equation (2).
Theorem 1 expands the scope of applications of the log-sum inequality. The domain of $f$ may not be a convex set. We can consider $x_i$ and $y_i$ arbitrarily such that $f(y_i) > 0$ for all $i$. In other words, the range of $f$ must have a non-empty intersection with the set of positive reals.
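The following sketch illustrates this point numerically (the particular choices of $f$ and $g$ below are ours, made only to satisfy the hypotheses of Theorem 1): $f$ is defined on all reals but has positive range, and $x\,g(x) = x\log x$ is convex.

```python
import math
import random

def theorem1_gap(f, g, xs, ys):
    # LHS - RHS of inequality (3):
    #   sum_i f(x_i) g(f(x_i)/f(y_i)) >= (sum_i f(x_i)) g(F/G)
    F = sum(f(x) for x in xs)
    G = sum(f(y) for y in ys)
    lhs = sum(f(x) * g(f(x) / f(y)) for x, y in zip(xs, ys))
    return lhs - F * g(F / G)

f = lambda t: t * t + 1.0   # domain is all of R, range is positive
g = math.log                # x*g(x) = x*log(x) is convex on (0, inf)

random.seed(1)
for _ in range(500):
    n = random.randint(1, 5)
    xs = [random.uniform(-5.0, 5.0) for _ in range(n)]
    ys = [random.uniform(-5.0, 5.0) for _ in range(n)]
    assert theorem1_gap(f, g, xs, ys) >= -1e-9
```

Note that the sampled $x_i$, $y_i$ may be negative; only the positivity of $f$ matters.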
Let $g$ be a twice-differentiable function in its domain. The function $x\,g(x)$ is convex if and only if
$$x\,g''(x) + 2\,g'(x) \ \ge\ 0. \qquad (5)$$
If $x > 0$, then any monotone increasing and convex function $g$ fulfills equation (5). Now, we are in a position to consider a few special cases of Theorem 1.
Example 1.
Define $g(x) = x^{\alpha}$ for some real parameter $\alpha \ge 0$, and $x > 0$. As $x\,g(x) = x^{\alpha+1}$ is a convex function, applying Theorem 1 (assuming also $f(x_i) > 0$ for all $i$) we observe
$$\sum_{i=1}^n f(x_i)\Big(\frac{f(x_i)}{f(y_i)}\Big)^{\alpha} \ \ge\ F\Big(\frac{F}{G}\Big)^{\alpha}.$$
Thus we have
$$\sum_{i=1}^n \frac{f(x_i)^{\alpha+1}}{f(y_i)^{\alpha}} \ \ge\ \frac{F^{\alpha+1}}{G^{\alpha}}. \qquad (6)$$
Example 2.
The $q$-deformed logarithm or $q$-logarithm [5] is defined by
$$\ln_q(x) = \frac{x^{1-q} - 1}{1 - q}, \qquad (7)$$
for $x > 0$ and $q \ne 1$; note that $\lim_{q \to 1}\ln_q(x) = \log x$. Note that $x\ln_q(x)$ is a convex function for $q \le 2$ as $\frac{d^2}{dx^2}\big(x\ln_q(x)\big) = (2-q)\,x^{-q} \ge 0$. The division rule of the $q$-deformed logarithm can be expressed as
$$\ln_q\Big(\frac{x}{y}\Big) \ =\ y^{q-1}\big(\ln_q(x) - \ln_q(y)\big). \qquad (8)$$
Putting $g(x) = \ln_q(x)$ for $q \le 2$ and real parameter $q \ne 1$, as well as $a_i = f(x_i)$ and $b_i = f(y_i)$, in Theorem 1 we obtain
$$\sum_{i=1}^n a_i \ln_q\Big(\frac{a_i}{b_i}\Big) \ \ge\ \Big(\sum_{i=1}^n a_i\Big)\ln_q\Big(\frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i}\Big),$$
where $a_i$ and $b_i$ are positive real numbers. Now applying equation (8), we find
$$\ln_q\Big(\frac{a_i}{b_i}\Big) = b_i^{q-1}\big(\ln_q(a_i) - \ln_q(b_i)\big), \qquad \ln_q\Big(\frac{\sum_i a_i}{\sum_i b_i}\Big) = \Big(\sum_i b_i\Big)^{q-1}\Big(\ln_q\Big(\sum_i a_i\Big) - \ln_q\Big(\sum_i b_i\Big)\Big).$$
Combining we get
$$\sum_{i=1}^n a_i\, b_i^{q-1}\big(\ln_q(a_i) - \ln_q(b_i)\big) \ \ge\ \Big(\sum_{i=1}^n a_i\Big)\Big(\sum_{i=1}^n b_i\Big)^{q-1}\Big(\ln_q\Big(\sum_{i=1}^n a_i\Big) - \ln_q\Big(\sum_{i=1}^n b_i\Big)\Big). \qquad (9)$$
For $1 < q \le 2$, expanding $\ln_q$ and dividing both sides by $1 - q < 0$, the above inequality reduces to
$$\sum_{i=1}^n a_i^{2-q}\, b_i^{q-1} \ \le\ \Big(\sum_{i=1}^n a_i\Big)^{2-q}\Big(\sum_{i=1}^n b_i\Big)^{q-1}. \qquad (10)$$
Instead of the logarithmic functions considered above, one may consider trigonometric, exponential, hyperbolic, or any other suitable function to get new inequalities.
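The $q$-deformed inequalities above can be spot-checked numerically. The sketch below (helper names ours) verifies that the gap in the $q$-analog of the log-sum inequality is non-negative for $q \le 2$, and non-positive for a sample $q > 2$, where $x\ln_q(x)$ becomes concave:

```python
import math
import random

def ln_q(x, q):
    # q-deformed logarithm (7); reduces to log(x) as q -> 1
    if q == 1.0:
        return math.log(x)
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)

def qlogsum_gap(a, b, q):
    # LHS - RHS of the q-analog of the log-sum inequality
    lhs = sum(ai * ln_q(ai / bi, q) for ai, bi in zip(a, b))
    return lhs - sum(a) * ln_q(sum(a) / sum(b), q)

random.seed(2)
for q in (-1.0, 0.0, 0.5, 1.0, 1.5, 2.0):   # x*ln_q(x) is convex for q <= 2
    for _ in range(300):
        a = [random.uniform(0.1, 5.0) for _ in range(4)]
        b = [random.uniform(0.1, 5.0) for _ in range(4)]
        assert qlogsum_gap(a, b, q) >= -1e-9
for _ in range(300):                          # reversed direction for q >= 2
    a = [random.uniform(0.1, 5.0) for _ in range(4)]
    b = [random.uniform(0.1, 5.0) for _ in range(4)]
    assert qlogsum_gap(a, b, 3.0) <= 1e-9
```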
Recall that, a real-valued function $g$ defined on a convex set is said to be concave if $-g$ is convex. Applying it in equation (1) we observe that for a concave function $g$ we have
$$g\Big(\sum_{i=1}^n p_i x_i\Big) \ \ge\ \sum_{i=1}^n p_i\, g(x_i).$$
Considering $h(x) = x\,g(x)$ as a concave function in equation (4), we obtain
$$\sum_{i=1}^n f(x_i)\, g\Big(\frac{f(x_i)}{f(y_i)}\Big) \ \le\ \Big(\sum_{i=1}^n f(x_i)\Big)\, g\Big(\frac{\sum_{i=1}^n f(x_i)}{\sum_{i=1}^n f(y_i)}\Big),$$
under the equivalent conditions on $x_i$ and $y_i$ as well as the real valued functions $f$ and $g$ as mentioned in Theorem 1.
When $q \ge 2$ in equation (7) we observe that $x\ln_q(x)$ is a concave function. Therefore the inequalities in equations (9) and (10) become
$$\sum_{i=1}^n a_i\, b_i^{q-1}\big(\ln_q(a_i) - \ln_q(b_i)\big) \ \le\ \Big(\sum_{i=1}^n a_i\Big)\Big(\sum_{i=1}^n b_i\Big)^{q-1}\Big(\ln_q\Big(\sum_{i=1}^n a_i\Big) - \ln_q\Big(\sum_{i=1}^n b_i\Big)\Big) \qquad (11)$$
and
$$\sum_{i=1}^n a_i^{2-q}\, b_i^{q-1} \ \ge\ \Big(\sum_{i=1}^n a_i\Big)^{2-q}\Big(\sum_{i=1}^n b_i\Big)^{q-1},$$
respectively, under the similar conditions on $a_i$ and $b_i$.
If we take $g(x) = \log x$ we find that $g$ is a concave function. The equation (2) suggests that
$$\sum_{i=1}^n b_i \log\frac{b_i}{a_i} \ \ge\ \Big(\sum_{i=1}^n b_i\Big)\log\frac{\sum_{i=1}^n b_i}{\sum_{i=1}^n a_i},$$
which implies
$$\sum_{i=1}^n b_i \log\frac{a_i}{b_i} \ \le\ \Big(\sum_{i=1}^n b_i\Big)\log\frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i}.$$
In general, $\sum_{i=1}^n a_i\, g\big(\frac{a_i}{b_i}\big) \le \big(\sum_{i=1}^n a_i\big)\, g\big(\frac{\sum_i a_i}{\sum_i b_i}\big)$ does not hold for a concave function $g$. For instance, consider $g(x) = \log x$, which refers to the convex function $x\,g(x) = x\log x$ and hence to the opposite inequality (2). This fact leads us to a new inequality which we consider in the following theorem.
Theorem 2.
Let $f$ be a real-valued function whose domain contains $\{x_1, \dots, x_n\}$ and $\{y_1, \dots, y_n\}$, such that $f(y_i) > 0$ for $i = 1, 2, \dots, n$; and $g$ be a concave function, where the domain of $g$ contains the ratios $\frac{f(x_i)}{f(y_i)}$, and $\frac{F}{G}$. Then,
$$\sum_{i=1}^n f(y_i)\, g\Big(\frac{f(x_i)}{f(y_i)}\Big) \ \le\ \Big(\sum_{i=1}^n f(y_i)\Big)\, g\Big(\frac{\sum_{i=1}^n f(x_i)}{\sum_{i=1}^n f(y_i)}\Big).$$
Proof.
It is easy to find that, applying the Jensen inequality to the concave function $g$ with the weights $\frac{f(y_i)}{G}$,
$$g\Big(\frac{F}{G}\Big) = g\Big(\sum_{i=1}^n \frac{f(y_i)}{G}\cdot\frac{f(x_i)}{f(y_i)}\Big) \ \ge\ \sum_{i=1}^n \frac{f(y_i)}{G}\, g\Big(\frac{f(x_i)}{f(y_i)}\Big).$$
∎
From now on, we use $a_i = f(x_i)$, and $b_i = f(y_i)$ throughout this article, where $f$ is a real valued function whose domain of definition contains $\{x_1, \dots, x_n\}$ and $\{y_1, \dots, y_n\}$, such that $f(y_i) > 0$ for all $i$.
We know that a twice-differentiable function $g$ is concave if and only if $g''(x) \le 0$, which indicates
$$g''(x) \ \le\ 0 \quad \text{for all } x \text{ in the domain of } g. \qquad (12)$$
Note that, equation (12) and (5) are not equivalent, in general. For example, consider the function $g(x) = (\log x)^2$, where $x > 0$. We can calculate
$$g'(x) = \frac{2\log x}{x}, \qquad g''(x) = \frac{2 - 2\log x}{x^2}.$$
Now, equation (12) suggests that $g$ is concave when
$$x \ \ge\ e.$$
But, putting the values of $g'$ and $g''$ in equation (5) we observe that $x\,g(x)$ is concave when
$$x\,g''(x) + 2\,g'(x) = \frac{2 + 2\log x}{x} \ \le\ 0, \quad \text{that is, when } x \le e^{-1}.$$
Now, replacing $f(x) = x$ and $g(x) = (\log x)^2$ in Theorem 2 we observe
$$\sum_{i=1}^n b_i \Big(\log\frac{a_i}{b_i}\Big)^2 \ \le\ \Big(\sum_{i=1}^n b_i\Big)\Big(\log\frac{\sum_{i=1}^n a_i}{\sum_{i=1}^n b_i}\Big)^2,$$
where the ratios $\frac{a_i}{b_i}$ and $\frac{\sum_i a_i}{\sum_i b_i}$ are greater than $e$.
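The concave variant of Theorem 2 with weights $f(y_i)$, which follows directly from the Jensen inequality, can also be checked numerically; in the sketch below (helper names and the choices of $f$ and $g$ are ours) $g = \log$ is concave and $f$ is positive on all reals:

```python
import math
import random

def concave_gap(f, g, xs, ys):
    # G*g(F/G) - sum_i f(y_i) g(f(x_i)/f(y_i)), with F = sum f(x_i), G = sum f(y_i);
    # non-negative whenever g is concave, by the Jensen inequality
    F = sum(f(x) for x in xs)
    G = sum(f(y) for y in ys)
    lhs = sum(f(y) * g(f(x) / f(y)) for x, y in zip(xs, ys))
    return G * g(F / G) - lhs

f = lambda t: abs(t) + 1.0   # positive on all of R
g = math.log                 # concave on (0, inf)

random.seed(3)
for _ in range(500):
    xs = [random.uniform(-4.0, 4.0) for _ in range(5)]
    ys = [random.uniform(-4.0, 4.0) for _ in range(5)]
    assert concave_gap(f, g, xs, ys) >= -1e-9
```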
3 log-sum inequalities for commuting self-adjoint matrices
We begin this section with a number of basic concepts of matrix analysis. Given any self-adjoint matrix $A$ of order $n$, there exists a unitary matrix $U$, such that, $A = U\Lambda U^*$, where $\Lambda$ is a diagonal matrix whose diagonal entries $\lambda_i$, for $i = 1, 2, \dots, n$, are the eigenvalues of $A$. Throughout this article $U^*$ denotes the conjugate transpose of the matrix $U$. If the matrices $A$ and $B$ commute, that is $AB = BA$, then there exists a unitary matrix $U$, such that, $A = U\Lambda_A U^*$ and $B = U\Lambda_B U^*$, hold simultaneously. We also denote $\Lambda = \operatorname{diag}(\lambda_1, \lambda_2, \dots, \lambda_n)$. Let $f$ be a continuous real valued function defined on an interval $I$ and $A$ be a self-adjoint matrix with eigenvalues in $I$; then
$$f(A) = U \operatorname{diag}\big(f(\lambda_1), f(\lambda_2), \dots, f(\lambda_n)\big)\, U^*. \qquad (13)$$
Now, we demonstrate the results derived in Section 2 for the self-adjoint matrices.
Theorem 3.
Let $A$ and $B$ be commuting self-adjoint matrices, with the sets of eigenvalues $\{\lambda_1, \dots, \lambda_n\}$ and $\{\mu_1, \dots, \mu_n\}$, respectively. Also, let $f$ be a function, such that, $f(\mu_i) > 0$ for all $i$. In addition, $g$ is a function for which $x\,g(x)$ is convex. Then,
$$\operatorname{Tr}\big(f(A)\, g\big(f(A) f(B)^{-1}\big)\big) \ \ge\ \operatorname{Tr}(f(A))\, g\Big(\frac{\operatorname{Tr}(f(A))}{\operatorname{Tr}(f(B))}\Big).$$
Proof.
As $A$ and $B$ are commuting self-adjoint matrices there exists a unitary matrix $U$, such that,
$$A = U\operatorname{diag}(\lambda_1, \dots, \lambda_n)U^*, \qquad B = U\operatorname{diag}(\mu_1, \dots, \mu_n)U^*,$$
and $f(B)$ is invertible as $f(\mu_i) > 0$ for all $i$. Therefore,
$$f(A)\, f(B)^{-1} = U\operatorname{diag}\Big(\frac{f(\lambda_1)}{f(\mu_1)}, \dots, \frac{f(\lambda_n)}{f(\mu_n)}\Big)U^*,$$
which indicates
$$g\big(f(A)f(B)^{-1}\big) = U\operatorname{diag}\Big(g\Big(\frac{f(\lambda_1)}{f(\mu_1)}\Big), \dots, g\Big(\frac{f(\lambda_n)}{f(\mu_n)}\Big)\Big)U^*.$$
Note that, $f(\lambda_i)\, g\big(\frac{f(\lambda_i)}{f(\mu_i)}\big)$ are the eigenvalues of the matrix $f(A)\, g\big(f(A)f(B)^{-1}\big)$ for $i = 1, 2, \dots, n$. Hence,
$$\operatorname{Tr}\big(f(A)\, g\big(f(A)f(B)^{-1}\big)\big) = \sum_{i=1}^n f(\lambda_i)\, g\Big(\frac{f(\lambda_i)}{f(\mu_i)}\Big).$$
Also, $\operatorname{Tr}(f(A)) = \sum_{i=1}^n f(\lambda_i)$ and $\operatorname{Tr}(f(B)) = \sum_{i=1}^n f(\mu_i)$. Applying Theorem 1 we obtain
$$\sum_{i=1}^n f(\lambda_i)\, g\Big(\frac{f(\lambda_i)}{f(\mu_i)}\Big) \ \ge\ \Big(\sum_{i=1}^n f(\lambda_i)\Big)\, g\Big(\frac{\sum_{i=1}^n f(\lambda_i)}{\sum_{i=1}^n f(\mu_i)}\Big).$$
Combining we get the result.
∎
The following corollary holds trivially from the above theorem.
Corollary 1.
Given positive definite, commuting matrices $A$ and $B$,
$$\operatorname{Tr}\big(A\log A - A\log B\big) \ \ge\ \operatorname{Tr}(A)\,\log\frac{\operatorname{Tr}(A)}{\operatorname{Tr}(B)}.$$
Proof.
Consider $f(x) = x$ and $g(x) = \log x$ in Theorem 3. As $A$ and $B$ are positive definite we have $\lambda_i > 0$ and $\mu_i > 0$ for all $i$. Note that,
$$\operatorname{Tr}\big(f(A)\, g\big(f(A)f(B)^{-1}\big)\big) = \operatorname{Tr}\big(A\log(AB^{-1})\big).$$
Also, applying $\log(AB^{-1}) = \log A - \log B$, which holds for commuting positive definite $A$ and $B$, we observe that
$$\operatorname{Tr}\big(A\log(AB^{-1})\big) = \operatorname{Tr}\big(A\log A - A\log B\big).$$
Combining we get the result.
∎
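For commuting matrices the trace inequality reduces to a scalar inequality over eigenvalues in a common eigenbasis. The sketch below (helper names ours; eigenvalue lists stand in for the diagonalized matrices) checks that scalar form:

```python
import math
import random

def trace_log_sum_gap(a_eigs, b_eigs):
    # For commuting positive definite A and B with eigenvalues a_i and b_i in a
    # common eigenbasis:  Tr(A log A - A log B) >= Tr(A) log(Tr(A)/Tr(B)).
    lhs = sum(a * (math.log(a) - math.log(b)) for a, b in zip(a_eigs, b_eigs))
    tA, tB = sum(a_eigs), sum(b_eigs)
    return lhs - tA * math.log(tA / tB)

random.seed(4)
for _ in range(500):
    a = [random.uniform(0.1, 3.0) for _ in range(4)]
    b = [random.uniform(0.1, 3.0) for _ in range(4)]
    assert trace_log_sum_gap(a, b) >= -1e-9
```

The equality case $A = cB$ again makes the gap vanish.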
Theorem 3 is a matrix-theoretic counterpart of Theorem 1, which assists us in establishing matrix versions of a number of the inequalities derived in Section 2 for real functions. Some of these matrix inequalities have immediate consequences in quantum information theory. Consider the following example.
Example 3.
The matrix counterpart of equation (6) is represented as
$$\operatorname{Tr}\big(A^{\alpha+1} B^{-\alpha}\big) \ \ge\ \frac{(\operatorname{Tr}(A))^{\alpha+1}}{(\operatorname{Tr}(B))^{\alpha}}, \qquad (14)$$
where $A$ and $B$ are commuting self-adjoint matrices, both positive definite. Putting $\alpha = 1$ in the above inequality we have
$$\operatorname{Tr}\big(A^2 B^{-1}\big) \ \ge\ \frac{(\operatorname{Tr}(A))^2}{\operatorname{Tr}(B)}.$$
Considering $\lim_{\alpha\to 0}\frac{x^{\alpha} - 1}{\alpha} = \log x$, we can prove that
$$\lim_{\alpha\to 0}\frac{1}{\alpha}\Big(\operatorname{Tr}\big(A^{\alpha+1}B^{-\alpha}\big) - \operatorname{Tr}(A)\Big) = \operatorname{Tr}\big(A\log A - A\log B\big).$$
Replacing it at the left hand side of (14), whose both sides equal $\operatorname{Tr}(A)$ at $\alpha = 0$ [6], [7, Theorem 3.3],
$$\operatorname{Tr}\big(A\log A - A\log B\big) \ \ge\ \operatorname{Tr}(A)\,\log\frac{\operatorname{Tr}(A)}{\operatorname{Tr}(B)}. \qquad (15)$$
Recall that, in quantum information theory [8] a quantum state is represented by a density matrix $\rho$, which is a positive semidefinite, self-adjoint matrix with $\operatorname{Tr}(\rho) = 1$. The von-Neumann entropy is a well-known measure of quantum information which is given by $S(\rho) = -\operatorname{Tr}(\rho\log\rho)$ for a density matrix $\rho$. Given two density matrices $\rho$ and $\sigma$ the quantum relative entropy is determined by $S(\rho\|\sigma) = \operatorname{Tr}(\rho\log\rho - \rho\log\sigma)$.
Considering $\operatorname{Tr}(A) = \operatorname{Tr}(B) = 1$ in equation (15), we find that $A = \rho$ and $B = \sigma$ are density matrices. Then, the quantum relative entropy of two commuting density matrices $\rho$ and $\sigma$ satisfies
$$S(\rho\|\sigma) = \operatorname{Tr}\big(\rho\log\rho - \rho\log\sigma\big) \ \ge\ 0.$$
Considering $B = I$, the identity matrix of order $n$, in equation (14) we observe that
$$\operatorname{Tr}\big(A^{\alpha+1}\big) \ \ge\ \frac{(\operatorname{Tr}(A))^{\alpha+1}}{n^{\alpha}}.$$
Putting $B = I$ in equation (15) we have $\operatorname{Tr}(A\log A) \ge \operatorname{Tr}(A)\log\frac{\operatorname{Tr}(A)}{n}$. In addition, if $A = \rho$ is a density matrix, that is, $\operatorname{Tr}(\rho) = 1$, then the above inequality indicates $S(\rho) = -\operatorname{Tr}(\rho\log\rho) \le \log n$. The von-Neumann entropy of $\rho = \frac{I}{n}$ is $\log n$, which is its maximum value.
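The entropy bound discussed above can be spot-checked on random spectra (the function names are ours; eigenvalue lists stand in for density matrices):

```python
import math
import random

def von_neumann_entropy(eigs):
    # S(rho) = -sum_i p_i log p_i over the eigenvalues p_i of a density matrix
    return -sum(p * math.log(p) for p in eigs if p > 0.0)

random.seed(5)
n = 6
for _ in range(500):
    w = [random.uniform(0.01, 1.0) for _ in range(n)]
    total = sum(w)
    p = [x / total for x in w]     # a valid spectrum: p_i >= 0 and sum = 1
    assert von_neumann_entropy(p) <= math.log(n) + 1e-12

# the maximally mixed state I/n attains the maximum log(n)
assert abs(von_neumann_entropy([1.0 / n] * n) - math.log(n)) < 1e-12
```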
Example 4.
The matrix counterpart of equation (9) will be given by
$$\operatorname{Tr}\big(A B^{q-1}(\ln_q A - \ln_q B)\big) \ \ge\ \operatorname{Tr}(A)\,(\operatorname{Tr}(B))^{q-1}\big(\ln_q\operatorname{Tr}(A) - \ln_q\operatorname{Tr}(B)\big),$$
where $A$ and $B$ are self-adjoint commuting matrices, both positive definite. For $1 < q \le 2$ we have
$$\operatorname{Tr}\big(A^{2-q} B^{q-1}\big) \ \le\ (\operatorname{Tr}(A))^{2-q}\,(\operatorname{Tr}(B))^{q-1}.$$
Note that in the above two inequalities we consider $q \le 2$ to ensure convexity of $x\ln_q(x)$. For $q \ge 2$ equation (11) suggests
$$\operatorname{Tr}\big(A B^{q-1}(\ln_q A - \ln_q B)\big) \ \le\ \operatorname{Tr}(A)\,(\operatorname{Tr}(B))^{q-1}\big(\ln_q\operatorname{Tr}(A) - \ln_q\operatorname{Tr}(B)\big),$$
for self-adjoint commuting matrices $A$ and $B$ as well as positive definite matrix $B$.
Theorem 2 also indicates a trace inequality, which we mention below without a proof.
Theorem 4.
Let $A$ and $B$ be commuting self-adjoint matrices, with the sets of eigenvalues $\{\lambda_1, \dots, \lambda_n\}$ and $\{\mu_1, \dots, \mu_n\}$, respectively. Also, let $f$ be a function, such that, $f(\mu_i) > 0$ for all $i$. In addition, $g$ is a concave function. Then,
$$\operatorname{Tr}\big(f(B)\, g\big(f(A) f(B)^{-1}\big)\big) \ \le\ \operatorname{Tr}(f(B))\, g\Big(\frac{\operatorname{Tr}(f(A))}{\operatorname{Tr}(f(B))}\Big).$$
4 log-sum inequalities with Löwner partial order
Recall that given self-adjoint matrices $A$ and $B$ we write $A \le B$ or $B \ge A$ if the matrix $B - A$ is positive semidefinite. This ordering is called the Löwner partial order. If $B - A$ is positive definite we write $A < B$.
Let $M_n(I)$ be the set of all complex self-adjoint matrices of order $n$ with eigenvalues in an interval $I$. Equation (13) indicates that we can redefine a function $f: I \to \mathbb{R}$ as a matrix function $f: M_n(I) \to M_n$, where $M_n$ is the set of all complex matrices of order $n$. Let $A$ and $B$ be two self-adjoint matrices of order $n$ with eigenvalues in $I$. Now, a continuous function $f$ is said to be matrix monotone of order $n$ if $A \le B$ implies $f(A) \le f(B)$. Also, $f$ is said to be a matrix convex function of order $n$ if
$$f\big(\lambda A + (1 - \lambda)B\big) \ \le\ \lambda f(A) + (1 - \lambda) f(B),$$
for all $\lambda \in [0, 1]$ and every pair of matrices $A$, and $B$ of order $n$ with eigenvalues in $I$. An operator monotone function is a matrix monotone function of order $n$ for any natural number $n$. Similarly, an operator convex function is a matrix convex function of order $n$ for any natural number $n$. A function $f$ is operator concave if $-f$ is operator convex. Well known operator monotone functions are $x^p$ on $[0, \infty)$ for $0 \le p \le 1$, as well as $\log x$ for $x > 0$. In addition, $x^p$ is operator convex on $(0, \infty)$ for $-1 \le p \le 0$ and $1 \le p \le 2$ [9, 10]. The operator Jensen inequality [11, 12] plays a crucial role for further development, which we mention below:
Lemma 1.
([11])
If $f$ is a continuous, real function defined on an interval $I$ with $0 \in I$, the following conditions are equivalent.
1. $f$ is operator convex and $f(0) \le 0$.
2. $f(V^* A V) \le V^* f(A)\, V$ for all $V$ with $\|V\| \le 1$ and every self-adjoint $A$ with spectrum in $I$.
3. $f\big(\sum_{i} V_i^* A_i V_i\big) \le \sum_{i} V_i^* f(A_i)\, V_i$ for all $V_1, \dots, V_k$ with $\sum_{i=1}^k V_i^* V_i \le I$ and all self-adjoint $A_1, \dots, A_k$ with spectrum in $I$.
4. $f(PAP) \le P f(A) P$ for every projection $P$ and every self-adjoint $A$ with spectrum in $I$.
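Operator monotonicity can be made concrete in the $2\times 2$ case. The sketch below (all helper names ours) uses the closed form for the square root of a $2\times 2$ positive semidefinite matrix to check numerically that $A \le B$ implies $A^{1/2} \le B^{1/2}$:

```python
import math
import random

# A 2x2 real symmetric matrix is stored as ((a, b), (b, c)).

def add(X, Y, s=1.0):
    # entrywise X + s*Y
    return tuple(tuple(X[i][j] + s * Y[i][j] for j in range(2)) for i in range(2))

def is_psd(X, eps=1e-9):
    # a 2x2 symmetric matrix is positive semidefinite iff trace >= 0 and det >= 0
    tr = X[0][0] + X[1][1]
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return tr >= -eps and det >= -eps

def sqrtm(X):
    # principal square root of a 2x2 positive semidefinite symmetric matrix:
    #   sqrt(X) = (X + sqrt(det X) I) / sqrt(tr X + 2 sqrt(det X))
    s = math.sqrt(max(X[0][0] * X[1][1] - X[0][1] * X[1][0], 0.0))
    t = math.sqrt(X[0][0] + X[1][1] + 2.0 * s)
    return tuple(tuple((X[i][j] + (s if i == j else 0.0)) / t for j in range(2)) for i in range(2))

def rand_psd():
    # M^T M is always positive semidefinite
    m = [[random.uniform(-2.0, 2.0) for _ in range(2)] for _ in range(2)]
    return ((m[0][0] ** 2 + m[1][0] ** 2, m[0][0] * m[0][1] + m[1][0] * m[1][1]),
            (m[0][0] * m[0][1] + m[1][0] * m[1][1], m[0][1] ** 2 + m[1][1] ** 2))

random.seed(6)
for _ in range(500):
    A = rand_psd()
    B = add(A, rand_psd())   # B - A is PSD, so A <= B in the Loewner order
    # operator monotonicity of x^(1/2): sqrt(B) - sqrt(A) is again PSD
    assert is_psd(add(sqrtm(B), sqrtm(A), -1.0))
```

The same harness would expose that $x^2$ is not operator monotone if $x^2$ were substituted for the square root.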
We utilize Dirac’s bra-ket notation [13] for simplifying our notations. Recall that $|u\rangle$ denotes a column vector of length $n$, or equivalently an $n \times 1$ matrix. The conjugate transpose of $|u\rangle$ is given by $\langle u|$, which is a row vector of length $n$. Note that, $|u\rangle\langle u|$ is a self-adjoint matrix of order $n$. Now the spectral decomposition of any self-adjoint matrix $A$ can be written as $A = \sum_{i=1}^n \lambda_i |u_i\rangle\langle u_i|$, where $|u_i\rangle$ is a normalized eigenvector corresponding to the eigenvalue $\lambda_i$.
The superadditivity of a mean has been shown in [14], and the corresponding advanced results for a mean and a perspective have been studied in [15, 16]. The following proposition is proven by elementary calculations under a certain condition.
Proposition 1.
Let $\{A_i\}_{i=1}^m$ and $\{B_i\}_{i=1}^m$ be two sets of positive definite matrices of order $n$, and $\sum_{i=1}^m B_i^{-1} \le mI$, where $I$ is the identity matrix. Also, let $f$ be an operator concave function, with $f(0) \ge 0$, on an interval containing $0$ and the eigenvalues of $B_i^{\frac{1}{2}} A_i B_i^{\frac{1}{2}}$ and $\frac{1}{m}\sum_{i=1}^m A_i$. Then
$$\sum_{i=1}^m B_i^{-\frac{1}{2}}\, f\big(B_i^{\frac{1}{2}} A_i B_i^{\frac{1}{2}}\big)\, B_i^{-\frac{1}{2}} \ \le\ m\, f\Big(\frac{1}{m}\sum_{i=1}^m A_i\Big).$$
Proof.
As each $B_i$ is a self-adjoint operator, the spectral decomposition of $B_i$ can be written as
$$B_i = \sum_{j=1}^n \mu_{ij}\, |u_{ij}\rangle\langle u_{ij}|,$$
where $|u_{ij}\rangle$ is a normalized eigenvector associated to the eigenvalue $\mu_{ij} > 0$ of $B_i$. Then we have
$$B_i^{-\frac{1}{2}} = \sum_{j=1}^n \mu_{ij}^{-\frac{1}{2}}\, |u_{ij}\rangle\langle u_{ij}|,$$
where $B_i^{-\frac{1}{2}}$ is self-adjoint. Now, for the matrices $V_i = \frac{1}{\sqrt{m}} B_i^{-\frac{1}{2}}$ we have
$$V_i^* V_i = \frac{1}{m} B_i^{-1}.$$
Summing over $i$ we find
$$\sum_{i=1}^m V_i^* V_i = \frac{1}{m}\sum_{i=1}^m B_i^{-1}. \qquad (16)$$
If $\sum_{i=1}^m B_i^{-1} \le mI$, then we have $\sum_{i=1}^m V_i^* V_i \le I$. Now, applying Lemma 1 to the operator convex function $-f$ we find that
$$\sum_{i=1}^m V_i^*\, f\big(B_i^{\frac{1}{2}} A_i B_i^{\frac{1}{2}}\big)\, V_i \ \le\ f\Big(\sum_{i=1}^m V_i^*\, B_i^{\frac{1}{2}} A_i B_i^{\frac{1}{2}}\, V_i\Big).$$
Multiplying $m$ in both sides of the above inequality we get
$$\sum_{i=1}^m B_i^{-\frac{1}{2}}\, f\big(B_i^{\frac{1}{2}} A_i B_i^{\frac{1}{2}}\big)\, B_i^{-\frac{1}{2}} \ \le\ m\, f\Big(\sum_{i=1}^m V_i^*\, B_i^{\frac{1}{2}} A_i B_i^{\frac{1}{2}}\, V_i\Big).$$
Putting the value of $V_i$ into the right hand side, $\sum_{i=1}^m V_i^*\, B_i^{\frac{1}{2}} A_i B_i^{\frac{1}{2}}\, V_i = \frac{1}{m}\sum_{i=1}^m A_i$, we find the result.
∎

Corollary 2.
Let $\{A_i\}_{i=1}^m$ and $\{B_i\}_{i=1}^m$ be two sets of positive definite self-adjoint operators with $A = \sum_{i=1}^m A_i$ and $B = \sum_{i=1}^m B_i$, such that, $B_i \ge I$ for all $i$. Also, let $f$ be an operator concave function, with $f(0) \ge 0$, on an interval containing $0$ and the eigenvalues of $B_i^{\frac{1}{2}} A_i B_i^{\frac{1}{2}}$ as well as of $\frac{A}{m}$. Then
$$\sum_{i=1}^m B_i^{-\frac{1}{2}}\, f\big(B_i^{\frac{1}{2}} A_i B_i^{\frac{1}{2}}\big)\, B_i^{-\frac{1}{2}} \ \le\ m\, f\Big(\frac{A}{m}\Big).$$
Proof.
The proof follows trivially from Proposition 1.
∎
If every $B_i$ is expansive (i.e., $B_i \ge I$), then the condition in Proposition 1 is satisfied. However, we have not obtained a proper result for a contractive condition such as $B_i \le I$. Closing this section, we present a result which does not impose an additional condition on the matrices $B_i$. We need the following known facts for proving the next theorem:
Lemma 2.
([18])
Let $A$ and $C$ be bounded linear operators on a Hilbert space $\mathcal{H}$. Suppose that $A \ge 0$ and $\|C\| \le 1$. If $f$ is an operator monotone function defined on $[0, \infty)$ with $f(0) \ge 0$, then $C^* f(A)\, C \le f(C^* A\, C)$.
Lemma 3.
([9, p.14])
For any square matrix $X$ and positive definite matrices $A$ and $B$ with $A \le B$, we have
$$X^* A\, X \ \le\ X^* B\, X.$$
Theorem 5.
Let $\{A_i\}_{i=1}^m$ and $\{B_i\}_{i=1}^m$ be two sets of positive definite matrices, as well as $A = \sum_{i=1}^m A_i$ and $B = \sum_{i=1}^m B_i$. Also, $f$ is an operator monotone function defined on $[0, \infty)$ with $f(0) \ge 0$. Then we have
$$B_j^{\frac{1}{2}}\, f\big(B_j^{-\frac{1}{2}} A_j B_j^{-\frac{1}{2}}\big)\, B_j^{\frac{1}{2}} \ \le\ B^{\frac{1}{2}}\, f\big(B^{-\frac{1}{2}} A B^{-\frac{1}{2}}\big)\, B^{\frac{1}{2}}, \qquad (18)$$
for every $j = 1, 2, \dots, m$, and
$$\sum_{j=1}^m B_j^{\frac{1}{2}}\, f\big(B_j^{-\frac{1}{2}} A_j B_j^{-\frac{1}{2}}\big)\, B_j^{\frac{1}{2}} \ \le\ m\, B^{\frac{1}{2}}\, f\big(B^{-\frac{1}{2}} A B^{-\frac{1}{2}}\big)\, B^{\frac{1}{2}}. \qquad (19)$$
Proof.
We can write
$$A = A_j + \sum_{i \ne j} A_i.$$
We have $\sum_{i \ne j} A_i \ge 0$, that is, $A_j \le A$. Also, for any complex square matrix $X$, the congruence $Y \mapsto X^* Y X$ preserves the Löwner order. Applying these together we obtain
$$B^{-\frac{1}{2}} A_j B^{-\frac{1}{2}} \ \le\ B^{-\frac{1}{2}} A B^{-\frac{1}{2}}.$$
As $f$ is a matrix monotone function, we can write
$$f\big(B^{-\frac{1}{2}} A_j B^{-\frac{1}{2}}\big) \ \le\ f\big(B^{-\frac{1}{2}} A B^{-\frac{1}{2}}\big). \qquad (20)$$
From $B_j \le B$ for all $j$, we can see
$$B^{-\frac{1}{2}} B_j B^{-\frac{1}{2}} \ \le\ I,$$
which implies $\big\|B_j^{\frac{1}{2}} B^{-\frac{1}{2}}\big\| \le 1$, since $\|X\|^2 = \|X^* X\|$ for every operator $X$ in general. Thus we have that $C_j = B_j^{\frac{1}{2}} B^{-\frac{1}{2}}$ is a contraction. We also have $B_j^{-\frac{1}{2}} A_j B_j^{-\frac{1}{2}} \ge 0$, so that we have the following inequality by Lemma 2,
$$C_j^*\, f\big(B_j^{-\frac{1}{2}} A_j B_j^{-\frac{1}{2}}\big)\, C_j \ \le\ f\big(C_j^*\, B_j^{-\frac{1}{2}} A_j B_j^{-\frac{1}{2}}\, C_j\big).$$
That is,
$$B^{-\frac{1}{2}} B_j^{\frac{1}{2}}\, f\big(B_j^{-\frac{1}{2}} A_j B_j^{-\frac{1}{2}}\big)\, B_j^{\frac{1}{2}} B^{-\frac{1}{2}} \ \le\ f\big(B^{-\frac{1}{2}} A_j B^{-\frac{1}{2}}\big). \qquad (21)$$
Combining (20) and (21), and multiplying $B^{\frac{1}{2}}$ to the both sides, which is permitted by Lemma 3, we have
$$B_j^{\frac{1}{2}}\, f\big(B_j^{-\frac{1}{2}} A_j B_j^{-\frac{1}{2}}\big)\, B_j^{\frac{1}{2}} \ \le\ B^{\frac{1}{2}}\, f\big(B^{-\frac{1}{2}} A B^{-\frac{1}{2}}\big)\, B^{\frac{1}{2}},$$
which is the inequality (18). To prove the inequality (19), taking a summation on $j$ from $1$ to $m$ for the both sides in (18), we obtain
$$\sum_{j=1}^m B_j^{\frac{1}{2}}\, f\big(B_j^{-\frac{1}{2}} A_j B_j^{-\frac{1}{2}}\big)\, B_j^{\frac{1}{2}} \ \le\ m\, B^{\frac{1}{2}}\, f\big(B^{-\frac{1}{2}} A B^{-\frac{1}{2}}\big)\, B^{\frac{1}{2}},$$
which is the inequality (19).
∎
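In the commuting (diagonal) case the matrix perspective $B^{\frac{1}{2}} f\big(B^{-\frac{1}{2}} A B^{-\frac{1}{2}}\big) B^{\frac{1}{2}}$ reduces entrywise to the scalar expression $b\, f(a/b)$. The following sketch (helper names ours; $f = \sqrt{\cdot}$ serves as a sample operator monotone function on $[0, \infty)$ with $f(0) \ge 0$) spot-checks this scalar shadow of the above inequalities:

```python
import math
import random

def perspective_gap(a, b, f=math.sqrt):
    # max over j of  b_j f(a_j/b_j) - B f(A/B),  with A = sum(a), B = sum(b);
    # for an operator monotone f on [0, inf) with f(0) >= 0 this never exceeds 0
    A, B = sum(a), sum(b)
    return max(bj * f(aj / bj) for aj, bj in zip(a, b)) - B * f(A / B)

random.seed(7)
for _ in range(1000):
    m = random.randint(2, 6)
    a = [random.uniform(0.1, 5.0) for _ in range(m)]
    b = [random.uniform(0.1, 5.0) for _ in range(m)]
    assert perspective_gap(a, b) <= 1e-12
    # summing the m per-index bounds gives the scalar analog of inequality (19)
    A, B = sum(a), sum(b)
    lhs = sum(bj * math.sqrt(aj / bj) for aj, bj in zip(a, b))
    assert lhs <= m * B * math.sqrt(A / B) + 1e-9
```

For $f = \sqrt{\cdot}$ the per-index bound is just $\sqrt{a_j b_j} \le \sqrt{AB}$, which makes the scalar case transparent.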