New identities for the Shannon-Wiener entropy function with applications

Aiden A. Bruen The author gratefully acknowledges the financial support of the National Sciences and Engineering Research Council of Canada Adjunct Research Professor,
Carleton University,
[email protected]

Abstract

We show how the two-variable entropy function $H(p,q)=p\log\frac{1}{p}+q\log\frac{1}{q}$ can be expressed as a linear combination of entropy functions symmetric in $p$ and $q$ involving quotients of polynomials in $p,q$ of degree $n$ for any $n\geq 2$ .

Keywords: entropy, Binary Symmetric Channel, Shannon, Wiener, identities

2020 MSC: 94A15 Information theory, 94A17 Measures of information, entropy, 94A60 Cryptography

1 Introduction

In Renyi’s A Diary on Information Theory, [8], he points out that the entropy formula, i.e.,

H(X)=p_{1}\log\frac{1}{p_{1}}+p_{2}\log\frac{1}{p_{2}}+\cdots+p_{N}\log\frac{1}{p_{N}},

where logs are to the base 2, was arrived at, independently, by Claude Shannon and Norbert Wiener in 1948. This famous formula was the revolutionary precursor of the information age. Renyi goes on to say that,

this formula had already appeared in the work of Boltzmann which is why it is also called the Boltzmann-Shannon Formula. Boltzmann arrived at this formula in connection with a completely different problem. Almost half a century before Shannon he gave essentially the same formula to describe entropy in his investigations of statistical mechanics. He showed that if, in a gas containing a large number of molecules the probabilities of the possible states of the individual molecules are $p_{1},p_{2},\ldots,p_{N}$ then the entropy of the system is $H=c(p_{1}\log\frac{1}{p_{1}}+p_{2}\log\frac{1}{p_{2}}+\cdots+p_{N}\log\frac{1}{p_{N}})$ where $c$ is a constant. (In statistical mechanics the natural logarithm is used and not base 2 …). The entropy of a physical system is the measure of its disorder …

Since 1948 there have been many advances in information theory and entropy. A well-known paper of Dembo, Cover and Thomas is devoted to inequalities in information theory. Here, we concentrate on equalities. We show how a Shannon function $H(p,q)$ can be expanded in infinitely many ways in an infinite series of functions each of which is a linear combination of Shannon functions of the type $H\bigl{(}f(p),g(q)\bigr{)}$ , where $f,g$ are quotients of polynomials of degree $n$ for any $n\geq 2$ . Apart from its intrinsic interest, this new result gives insight into the algorithm in Section 6 for constructing a common secret key between two communicating parties — see also [2], [3], [4], [6].

2 Extensions of a Binary Symmetric Channel

Recall that a binary symmetric channel has input and output symbols drawn from $\{0,1\}$ . We say that there is a common probability $q=1-p$ of any symbol being transmitted incorrectly, independently for each transmitted symbol, $0\leq p\leq 1$ .

We use the channel matrix $P=\left(\begin{array}[]{cc}p&q\\ q&p\end{array}\right)$ . Again, $p$ is the probability of success, i.e., $p$ denotes the probability that 0 is transmitted to 0 and also the probability that 1 gets transmitted to 1. The second extension $P^{(2)}$ of $P$ has alphabet $\{00,01,10,11\}$ and channel matrix

P^{(2)}=\left(\begin{array}[]{cccc}p^{2}&pq&qp&q^{2}\\ pq&p^{2}&q^{2}&qp\\ qp&q^{2}&p^{2}&pq\\ q^{2}&qp&pq&p^{2}\end{array}\right)=\left(\begin{array}[]{cc}pP&qP\\ qP&pP\end{array}\right).

An alternative way to think of an $n^{th}$ extension of a channel $C$ (see Welsh [9]) is to regard it as $n$ copies of $C$ acting independently and in parallel. See Figure 1.

Refer to caption — Figure 1: $n$ copies of $C$ acting independently and in parallel.

Let us assume also that, for $C$ , the input probability of 0 and the input probability of 1 are both equal to $\frac{1}{2}$ .

Theorem 2.1.

Let $\mathbf{X}=(X_{1},\ldots,X_{n})$ , and $\mathbf{Y}=(Y_{1},\ldots,Y_{n})$ denote an input-output pair for $C^{(n)}$ . Then

(a)

$H(\mathbf{X})=H(X_{1},\ldots,X_{n})=n$
(b)

$H(\mathbf{Y})=H(Y_{1},\ldots,Y_{n})=n$
(c)

$H(\mathbf{X}|\mathbf{Y})$ is equal to $nH(p,q)$ .
(d)

The capacity of $C^{(n)}$ is $n\bigl{(}1-H(p,q)\bigr{)}$ .

Proof.

(a) Since, by definition, the $X_{i}$ are independent,

H(\mathbf{X})=H(X_{1})+H(X_{2})+\cdots+H(X_{n}).

We have

	$\displaystyle H(X_{i})$	$\displaystyle=-\biggl{[}Pr(X_{i}=0)\log Pr(X_{i}=0)+Pr(X_{i}=1)\log Pr(X_{i}=1)\biggr{]}$
		$\displaystyle=-\biggl{[}\frac{1}{2}\log\bigl{(}\frac{1}{2}\bigr{)}+\frac{1}{2}\log\bigl{(}\frac{1}{2}\bigr{)}\biggr{]}$
		$\displaystyle=-\log\bigl{(}\frac{1}{2}\bigr{)}$
		$\displaystyle=-[\log 1-\log 2]=\log 2=1.$

Thus $H(\mathbf{X})=n$ .

(b) The value of $Y_{i}$ only depends on $X_{i}$ . The $X_{i}$ are independent. Thus $Y_{1},Y_{2},\ldots,Y_{n}$ are independent. For any $Y_{i}$ we have

Pr(Y_{i}=0)=Pr(X_{i}=0)(p)+Pr(X_{i}=1)(q)=\frac{1}{2}(p+q)=\frac{1}{2}.

Also $Pr(Y_{i}=1)=\frac{1}{2}$ . Then, as for $X_{i}$ , $H(Y_{i})=1$ . Thus $H(\mathbf{Y})=H(Y_{1},Y_{2},\ldots,Y_{n})=\sum_{i}H(Y_{i})=n$ .

(c) We have $H(\mathbf{X})-H(\mathbf{X}|\mathbf{Y})=H(\mathbf{Y})-H(\mathbf{Y}|\mathbf{X})$ . Since $H(\mathbf{X})=H(\mathbf{Y})=n$ we have $H(\mathbf{X}|\mathbf{Y})=H(\mathbf{Y}|\mathbf{X})$ . Now

H(\mathbf{Y}|\mathbf{X})=\sum_{\mathbf{x}}Pr(\mathbf{x})H(\mathbf{Y}\,|\,\mathbf{X}=\mathbf{x}),

where $\mathbf{x}$ denotes a given value of the random vector $\mathbf{X}$ . Since the channel is memoryless,

H(\mathbf{Y}\,|\,\mathbf{X}=\mathbf{x})=\sum_{i}H(Y_{i}\,|\,\mathbf{X}=\mathbf{x})=\sum_{i}H(Y_{i}\,|\,X_{i}=x_{i}).

The last step needs a little work — see [5] Exercise 4.10 or [7] or [3] or the example below for details. Then

	$\displaystyle H(\mathbf{Y}\,\|\,\mathbf{X})$	$\displaystyle=\sum_{\mathbf{x}}Pr(\mathbf{x})\sum_{i}H(Y_{i}\,\|\,X_{i}=x_{i})$
		$\displaystyle=\sum_{i}\sum_{u}H(Y_{i}\,\|\,X_{i}=u)Pr(X_{i}=u).$

Thus

H(\mathbf{Y}\,|\,\mathbf{X})=\sum_{i=1}^{n}H(Y_{i}\,|\,X_{i})=nH(p,q)=H(\mathbf{X}\,|\,\mathbf{Y}).

Example Let $n=2$ and $\left(\begin{array}[]{c}x_{1}\\ x_{2}\end{array}\right)=\left(\begin{array}[]{c}0\\ 1\end{array}\right)$ . Then $\left(\begin{array}[]{c}y_{1}\\ y_{2}\end{array}\right)=\left(\begin{array}[]{c}1\\ 0\end{array}\right)$ or $\left(\begin{array}[]{c}0\\ 1\end{array}\right)$ or $\left(\begin{array}[]{c}1\\ 1\end{array}\right)$ or $\left(\begin{array}[]{c}0\\ 0\end{array}\right)$ . Using independence we get that the coresponding entropy term is $-[q^{2}\log q^{2}+p^{2}\log p^{2}+qp\log qp+pq\log pq]$ . This simplifies to $-2[p\log p+q\log q]=2H(p,q)$ . Note that the probability that $\left(\begin{array}[]{c}x_{1}\\ x_{2}\end{array}\right)=\left(\begin{array}[]{c}0\\ 1\end{array}\right)$ is $\frac{1}{4}$ .

(d) The capacity of $C$ is the maximum value, over all inputs, of $H(X)-H(X\,|\,Y)$ . Since $X$ is random, the input probability of a 1 or 0 is 0.5. This input distribution maximizes $H(X)-H(X\,|\,Y)$ for $C$ , the maximum value being $1-H(p,q)$ . Then the capacity of $C^{(n)}$ is $n(1-H(p,q))$ . It represents the information about $\mathbf{X}$ conveyed by $\mathbf{Y}$ or the amount of information about $\mathbf{Y}$ conveyed by $\mathbf{X}$ . $\blacksquare$

3 An Entropy Equality

First we need some additional discussion on entropy.

A. Extending a basic result.

A fundamental result for random variables $X,Y$ is that $H(X)+H(Y|X)=H(Y)+H(X|Y)$ . A corresponding argument may be used to establish similar identities involving more than two random variables. For example,

	$\displaystyle H(X,Y,Z)$	$\displaystyle=H(X)+H(Y\|X)+H(Z\,\|\,X,Y)$
		$\displaystyle=H(X,Y)+H(Z\,\|\,X,Y).$

Also

H(X,Y,Z)=H(X)+H(Y,Z\,|\,X).

B. From Random Variables to Random Vectors.

For any random variable $X$ taking only a finite number of values with probabilities $p_{1},p_{2},\ldots,\allowbreak p_{n}$ such that

\sum p_{i}=1\qquad\hbox{and}\qquad p_{i}>0\quad(1\leq i\leq n),

we define the entropy of $X$ using the Shannon formula

H(X)=-\sum_{k=1}^{n}p_{k}\log p_{k}=\sum_{k=1}^{n}p_{k}\log\frac{1}{p_{k}}.

Analogously, if $\mathbf{X}$ is a random vector which takes only a finite number of values $\mathbf{u_{1}},\mathbf{u_{2}},\ldots,\allowbreak\mathbf{u_{m}}$ , we define its entropy by the formula

H(\mathbf{X})=-\sum_{k=1}^{m}Pr(\mathbf{u_{k}})\log Pr(\mathbf{u_{k}}).

For example, when $\mathbf{X}$ is a two-dimensional random vector, say $\mathbf{X}=(U,V)$ with $p_{ij}=Pr(U=a_{i},V=b_{j})$ , then we can write

H(\mathbf{X})=H(U,V)=-\sum_{i,j}p_{ij}\log p_{ij}.

Note that $\sum_{i,j}p_{ij}=1$ .

More generally, if $X_{1},X_{2},\ldots,X_{m}$ is a collection of random variables each taking only a finite set of values, then we can regard $\mathbf{X}=(X_{1},X_{2},\ldots,X_{m})$ as a random vector taking a finite set of values and we define the joint entropy of $X_{1},\ldots,X_{m}$ by

	$\displaystyle H(X_{1},X_{2},\ldots,X_{m})$
	$\displaystyle=H(\mathbf{X})$
	$\displaystyle=-\sum Pr(X_{1}=x_{1},X_{2}=x_{2},\ldots,X_{m}=x_{m})\log Pr(X_{1}=x_{1},X_{2}=x_{2},\ldots,X_{m}=x_{m}).$

Standard results for random variables then carry over to random vectors — see [1], [9].

C. The Grouping Axiom for Entropy.

This axiom or identity can shorten calculations. It reads as follows ([9, p. 2], [1, p. 8], [3, Section 9.6]).

Let $p=p_{1}+p_{2}+\cdots+p_{m}$ and $q=q_{1}+q_{2}+\cdots+q_{n}$ where each $p_{i}$ and $q_{j}$ is non-negative. Assume that $p,q$ are positive with $p+q=1$ . Then

	$\displaystyle H(p_{1},p_{2},\ldots,p_{m},q_{1},q_{2},\ldots,q_{n})$
	$\displaystyle\ \ =H(p,q)+pH\left(\frac{p_{1}}{p},\frac{p_{2}}{p},\ldots,\frac{p_{m}}{p}\right)+qH\left(\frac{q_{1}}{q},\frac{q_{2}}{q},\ldots,\frac{q_{n}}{q}\right).$

For example, suppose $m=1$ so $p_{1}=p$ . Then we get

H(p_{1},q_{1},q_{2},\ldots,q_{n})=H(p,q)+qH\left(\frac{q_{1}}{q},\ldots,\frac{q_{n}}{q}\right).

This is because $p_{1}H\bigl{(}\frac{p_{1}}{p_{1}},0\bigr{)}=p_{1}H(1,0)=p_{1}(1\log 1)=p_{1}(0)=0$ .

Theorem 3.1.

Let $\mathbf{X},\mathbf{Y},\mathbf{Z}$ be random vectors such that $H(\mathbf{Z}\,|\,\mathbf{X},\mathbf{Y})=0$ . Then

(a)

$H(\mathbf{X}|\mathbf{Y})=H(\mathbf{X},\mathbf{Z}\,|\,\mathbf{Y})$ .
(b)

$H(\mathbf{X}|\mathbf{Y})=H(\mathbf{X}\,|\,\mathbf{Y},\mathbf{Z})+H(\mathbf{Y}\,|\,\mathbf{Z})$ .

Proof.

	$\displaystyle H(\mathbf{X}\,\|\,\mathbf{Y})$	$\displaystyle=H(\mathbf{X},\mathbf{Y})-H(\mathbf{Y})$
		$\displaystyle=H(\mathbf{X},\mathbf{Y},\mathbf{Z})-H(\mathbf{Z}\,\|\,\mathbf{X},\mathbf{Y})-H(\mathbf{Y})$
		$\displaystyle=H(\mathbf{X},\mathbf{Y},\mathbf{Z})-H(\mathbf{Y})\qquad\hbox{[\,since $H(\mathbf{Z}\,\|\,\mathbf{X},\mathbf{Y})=0$\,]}$
		$\displaystyle=H(\mathbf{Y})+H(\mathbf{X},\mathbf{Z}\,\|\,\mathbf{Y})-H(\mathbf{Y})$
		$\displaystyle=H(\mathbf{X},\mathbf{Z}\,\|\,\mathbf{Y}),\qquad\hbox{proving (a).}$

For (b),

	$\displaystyle H(\mathbf{X}\,\|\,\mathbf{Y})$	$\displaystyle=H(\mathbf{X},\mathbf{Z},\mathbf{Y})-H(\mathbf{Y})\hbox{ from (a)}$
		$\displaystyle=H(\mathbf{X},\mathbf{Z},\mathbf{Y})-H(\mathbf{Y},\mathbf{Z})+H(\mathbf{Y},\mathbf{Z})-H(\mathbf{Y})$
		$\displaystyle=H(\mathbf{X}\,\|\,\mathbf{Y},\mathbf{Z})+H(\mathbf{Z}\,\|\,\mathbf{Y}).$

$\blacksquare$

4 The New Identities

We will use the above identity, i.e.,

H(\mathbf{X}\,|\,\mathbf{Y})=H(\mathbf{X}\,|\,\mathbf{Y},\mathbf{Z})+H(\mathbf{Z}\,|\,\mathbf{Y})

(4.1)

which holds under the assumption that $H(\mathbf{Z}\,|\,\mathbf{X},\mathbf{Y})=0$ . We begin with arrays $A=\left(\begin{array}[]{c}a_{1}\\ a_{2}\\ \vdots\\ a_{n}\end{array}\right)$ , $B=\left(\begin{array}[]{c}b_{1}\\ b_{2}\\ \vdots\\ b_{n}\end{array}\right)$ , where $n$ is even. We assume that $A,B$ are random binary strings subject to the condition that, for each $i$ , we have $Pr(a_{i}=b_{i})=p$ . We also assume that the events $\{(a_{i}=b_{i})\}$ form an independent set. We divide $A,B$ into blocks of size 2.

To start, put $\mathbf{X}=\left(\begin{array}[]{c}x_{1}\\ x_{2}\end{array}\right)$ , $\mathbf{Y}=\left(\begin{array}[]{c}y_{1}\\ y_{2}\end{array}\right)$ , $\mathbf{Z}=x_{1}+x_{2}$ .

Lemma 4.2.

H(\mathbf{Z}\,|\,\mathbf{X},\mathbf{Y})=0.

Proof.

We want to calculate $\sum_{\mathbf{x},\mathbf{y}}H(\mathbf{Z}\,|\,\mathbf{x},\mathbf{y})Pr(X=\mathbf{x},Y=\mathbf{y})$ . Given $\mathbf{x},\mathbf{y}$ , say $\mathbf{x}=\left(\begin{array}[]{c}\alpha_{1}\\ \alpha_{2}\end{array}\right)$ , $\mathbf{y}=\left(\begin{array}[]{c}\beta_{1}\\ \beta_{2}\end{array}\right)$ the value of $\mathbf{Z}$ is $\alpha_{1}+\alpha_{2}$ . There is no uncertainty in the value of $\mathbf{Z}$ given $\mathbf{x},\mathbf{y}$ , i.e., each term in the above sum for $H$ is $H(1,0)=0$ . Therefore $H(\mathbf{Z}\,|\,\mathbf{X},\mathbf{Y})=0$ . $\blacksquare$

From this we can use formula (4.1) for this block of size two. We can think of a channel from $\mathbf{X}$ to $\mathbf{Y}$ (or from $\mathbf{Y}$ to $\mathbf{X}$ ) which is the second extension of a binary symmetric channel where $p$ is the probability of success. We have

H(\mathbf{X}\,|\,\mathbf{Y})=H(\mathbf{X}\,|\,\mathbf{Y},\mathbf{Z})+H(\mathbf{Z}\,|\,\mathbf{Y}).

From Theorem 2.1 part (c) the left side, i.e., $H(\mathbf{X}\,|\,\mathbf{Y})$ is equal to $2H(p,q)$ . Next we calculate the right side beginning with $H(\mathbf{Z}\,|\,\mathbf{Y})$ , i.e., $H\left(\mathbf{Z}\,\Big{|}\,\left(\begin{array}[]{c}y_{1}\\ y_{2}\end{array}\right)\right)$ . We have

	$\displaystyle H(\mathbf{Z}\,\|\,\mathbf{Y})$	$\displaystyle=H\bigl{(}\mathbf{Z}\,\|\,(y_{1}+y_{2}=x_{1}+x_{2})\bigr{)}Pr(y_{1}+y_{2}=x_{1}+x_{2})$
		$\displaystyle\qquad+H\bigl{(}\mathbf{Z}\,\|\,(y_{1}+y_{2}\neq x_{1}+x_{2})\bigr{)}Pr(y_{1}+y_{2}\neq x_{1}+x_{2}).$

We know that $Pr(x_{1}+x_{2}=y_{1}+y_{2})=p^{2}+q^{2}$ and $Pr(x_{1}+x_{2}\neq y_{1}+y_{2})=1-(p^{2}+q^{2})=2pq$ since $p+q=1$ . From the standard formula we have $H(\mathbf{Z}\,|\,\mathbf{Y})=(p^{2}+q^{2})\log\left(\frac{1}{p^{2}+q^{2}}\right)+2pq\log\left(\frac{1}{2pq}\right)$ since $H(\mathbf{Z}\,|\,\mathbf{Y})=H(p^{2}+q^{2},2pq)$ .

Next we calculate

H\left(\left(\begin{array}[]{cc}x_{1}\\ x_{2}\end{array}\right)\,\Big{|}\,\left(\begin{array}[]{c}y_{1}\\ y_{2}\end{array}\right),(x_{1}+x_{2})\right)=H(\mathbf{X}\mid\mathbf{Y},\mathbf{Z}).

Again we have two possibilities, i.e., $y_{1}+y_{2}=x_{1}+x_{2}$ and $y_{1}+y_{2}\neq x_{1}+x_{2}$ . The corresponding probabilities are $p^{2}+q^{2}$ and $2pq$ respectively. We obtain

H(\mathbf{X}|\mathbf{Y},\mathbf{Z})=(p^{2}+q^{2})H\left(\frac{p^{2}}{p^{2}+q^{2}},\frac{q^{2}}{p^{2}+q^{2}}\right)+2pqH\left(\frac{pq}{2pq},\frac{pq}{2pq}\right).

This comes about from the facts that

(a)

If $y_{1}+y_{2}=x_{1}+x_{2}$ then we either have $y_{1}=x_{1}$ and $y_{2}=x_{2}$ or $y_{1}=1-x_{1}$ , $y_{2}=1-x_{2}$ .
(b)

If $y_{1}+y_{2}\neq x_{1}+x_{2}$ then either $y_{1}=x_{1}$ and $y_{2}\neq x_{2}$ or $y_{1}\neq x_{1}$ and $y_{2}=x_{2}$ .
(c)

$H(\frac{1}{2},\frac{1}{2})=1$ .

Then from equation (4.1) we have our first identity as follows

2H(p,q)=(p^{2}+q^{2})H\left(\frac{p^{2}}{p^{2}+q^{2}},\frac{q^{2}}{p^{2}+q^{2}}\right)+2pq+H(p^{2}+q^{2},2pq).

(4.3)

Blocks of Size Three.

Here $\mathbf{X}=\left(\begin{array}[]{c}x_{1}\\ x_{2}\\ x_{3}\end{array}\right)$ , $\mathbf{Y}=\left(\begin{array}[]{c}y_{1}\\ y_{2}\\ y_{3}\end{array}\right)$ , $\mathbf{Z}=x_{1}+x_{2}+x_{3}$ . As in Lemma 4.2 we have $H(\mathbf{Z}\,|\,\mathbf{X},\mathbf{Y})=0$ so we can use formula (4.1) again, i.e.,

H(\mathbf{X}\,|\,\mathbf{Y})=H(\mathbf{X}\,|\,\mathbf{Y},\mathbf{Z})+H(\mathbf{Z}\,|\,\mathbf{Y}).

We have a channel from $\mathbf{X}$ to $\mathbf{Y}$ (or from $\mathbf{Y}$ to $\mathbf{X}$ ) which is the third extension $C^{(3)}$ of a binary symmetric channel $C$ , where $p$ is the probability that 0 (or 1) is transmitted to itself.

From Theorem 2.1 we have $H(\mathbf{X}|\mathbf{Y})=3H(p,q)$ .

Similar to the case of blocks of size 2, we have $H(\mathbf{Z}|\mathbf{Y})=H(p^{3}+3pq^{2},q^{3}+3qp^{2})$ . This is because the probabilities that $Z=y_{1}+y_{2}+y_{3}$ or $Z\neq y_{1}+y_{2}+y_{3}$ are, respectively, $p^{3}+3pq^{2}$ or $q^{3}+3qp^{2}$ , as follows.

If $Z=y_{1}+y_{2}+y_{3}$ , then either $x_{1}=y_{1}$ , $x_{2}=y_{2}$ , $x_{3}=y_{3}$ or else, for some $i$ , $1\leq i\leq 3$ (3 possibilities) $x_{i}=y_{i}$ and, for the other two indices $j,k$ , $x_{j}\neq y_{j}$ and $x_{k}\neq y_{k}$ .

A similar analysis can be carried out for the case where $Z\neq y_{1}+y_{2}+y_{3}$ . We then get $H(\mathbf{X}\,|\,\mathbf{Y},\mathbf{Z})=f(p,q)+f(q,p)$ where

f(p,q)=(p^{3}+3pq^{2})\left\{H\left(\frac{p^{3}}{p^{3}+3pq^{2}},\frac{pq^{2}}{p^{3}+3pq^{2}},\frac{pq^{2}}{p^{3}+3pq^{2}},\frac{pq^{2}}{p^{3}+3pq^{2}}\right)\right\}.

We now use the grouping axiom for $m=1$ . The $p$ in the formula refers to $\frac{p^{3}}{p^{3}+3pq^{2}}$ here and the $q$ there is now replaced by $\frac{3pq^{2}}{p^{3}+3pq^{2}}$ . Then

	$\displaystyle f(p,q)$	$\displaystyle=(p^{3}+3pq^{2})\left\{H\left(\frac{p^{3}}{p^{3}+3pq^{2}},\frac{3pq^{2}}{p^{3}+3pq^{2}}\right)+\frac{3pq^{2}}{p^{3}+3pq^{2}}H(\frac{1}{3},\frac{1}{3},\frac{1}{3})\right\}$
		$\displaystyle=(p^{3}+3pq^{2})H\left(\frac{p^{3}}{p^{3}+3pq^{2}},\frac{3pq^{2}}{p^{3}+3pq^{2}}\right)+3pq^{2}\log 3.$

$f(q,p)$ is obtained by interchanging $p$ with $q$ . We note that, since $p+q=1$ , $3pq^{2}\log 3+3qp^{2}\log 3=3pq\log 3$ .

From working with blocks of size 3 we get

	$\displaystyle 3H(p,q)$	$\displaystyle=H(p^{3}+3pq^{2},q^{3}+3qp^{2})+(p^{3}+3pq^{2})H\left(\frac{p^{3}}{p^{3}+3pq^{2}},\frac{3pq^{2}}{p^{3}+3pq^{2}}\right)$
		$\displaystyle\hphantom{=H(p^{3}+3pq^{2},}+(q^{3}+3qp^{2})H\left(\frac{q^{3}}{q^{3}+3qp^{2}},\frac{3qp^{2}}{q^{3}+3qp^{2}}\right)+3pq\log 3.$		(4.4)

For blocks of size 2 formula (4.3) can be put in a more compact form in terms of capacities, namely,

2\bigl{(}1-H(p,q)\bigr{)}=\biggl{[}1-H(p^{2}+q^{2},2pq)\biggr{]}+\left[(p^{2}+q^{2})\left(1-H\left(\frac{p^{2}}{p^{2}+q^{2}},\frac{q^{2}}{p^{2}+q^{2}}\right)\right)\right].

(4.5)

Using the same method we can find a formula analogous to formulae (4.3), (4) for obtaining $nH(p,q)$ as a linear combination of terms of the form $H(u,v)$ where $u,v$ involve terms in $p^{n},p^{n-2}q^{2},\ldots,q^{n},q^{n-2}p^{2},\ldots$ plus extra terms such as $3pq\log 3$ as in formula (4).

5 Generalizations, an Addition Formula

The result of Theorem 2.1 can be extended to the more general case where we take the product of $n$ binary symmetric channels even if the channel matrices can be different corresponding to differing $p$ -values.

As an example, suppose we use the product of 2 binary symmetric channels with channel matrices

\left(\begin{array}[]{cc}p_{1}&q_{1}\\ q_{1}&p_{1}\end{array}\right),\quad\left(\begin{array}[]{cc}p_{2}&q_{2}\\ q_{2}&p_{2}\end{array}\right).

Then the argument in Section 4 goes through. To avoid being overwhelmed by symbols we made a provisional notation change.

Notation 5.1.

We denote by $h(p)$ the quantity $H(p,q)=p\log\frac{1}{p}+q\log\frac{1}{q}$ .

Then we arrive at the following addition formula

$\displaystyle h(p_{1})+$	$\displaystyle h(p_{2})$
	$\displaystyle=h(p_{1}p_{2}+q_{1}q_{2})+(p_{1}p_{2}+q_{1}q_{2})\,h\left(\frac{p_{1}p_{2}}{p_{1}p_{2}+q_{1}q_{2}}\right)$
	$\displaystyle\qquad+(p_{1}q_{2}+p_{2}q_{1})\,h\left(\frac{p_{1}q_{2}}{p_{1}q_{2}+p_{2}q_{1}}\right).$	(5.2)

Similarly to the above we can derive a formula for $h(p_{1})+h(p_{2})+\cdots+h(p_{n})$ .

6 The Shannon Limit and Applications to Cryptography

The above method of using blocks of various sizes is reminiscent of the algorithm for the key exchange in [4] which relates to earlier work in [2], [6] and others. Indeed the identities above were informed by the details of the algorithm.

The algorithm starts with two arrays $(a_{1},\ldots,a_{n})$ and $(b_{1},\ldots,b_{n})$ . We assume that the set of events $\{a_{i}=b_{i}\}$ is an independent set with $p=Pr(a_{i}=b_{i})$ . We subdivide $A,B$ into corresponding sub-blocks of size $t$ , where $t$ divides $n$ . Exchanging parities by public discussion we end up with new shorter sub-arrays $A_{1},B_{1}$ , where the probabilities of corresponding entries being equal are independent with probability $p_{1}>p$ . Eventually after $m$ iterations we end up with a common secret key $A_{m}=B_{m}$ .

Let us take an example. Start with two binary arrays $(a_{1},\ldots,a_{n})$ and $(b_{1},\ldots,b_{n})$ of length $n$ with $n$ even, $n=2t$ , say. We subdivide the arrays into corresponding blocks of size 2. If the two blocks are $\left(\begin{array}[]{c}a_{1}\\ a_{2}\end{array}\right)$ and $\left(\begin{array}[]{c}b_{1}\\ b_{2}\end{array}\right)$ we discard those blocks if the parities disagree. If the parities agree, which happens with probability $p^{2}+q^{2}$ , we keep $a_{1}$ and $b_{1}$ , discarding $a_{2}$ and $b_{2}$ . Thus, on average, we keep $(p^{2}+q^{2})\frac{n}{2}$ partial blocks and discard $\bigl{[}1-(p^{2}+q^{2})\bigr{]}\frac{n}{2}$ blocks of size 2.

Let us suppose $n=100$ and $p=0.7$ . From Theorem 2.1 part (d), the information that $Y$ has about $X$ , i.e., that $B$ has about $A$ is $100[1-H(0.7,0.3)]\approx 100(1-0.8813)\approx 11.87$ Shannon bits.

We are seeking to find a sub-array of $A,B$ such that corresponding bits are equal. Our method is to publicly exchange parities. The length of this secret key will be at most 11.

Back to the algorithm. $A,B$ keep on average $(50)(p^{2}+q^{2})$ blocks of size 2, i.e., $(50)(0.58)=29$ blocks of size 2. $A$ and $B$ remove the bottom element of each block. We are left with 29 pairs of elements $(a_{1},b_{1})$ . The probability that $a_{1}=b_{1}$ given that $a_{1}+a_{2}=b_{1}+b_{2}$ is $\frac{p^{2}}{p^{2}+q^{2}}$ , i.e., $\frac{(0.7)^{2}}{(0.7)^{2}+(0.3)^{2}}=\frac{0.49}{0.58}\approx 0.845$ . Next, $1-H(0.845,0.155)\approx(1-0.6221)\approx 0.3779$ . To summarize, we started with 100 pairs $(a_{i},b_{i})$ with $Pr(a_{i}=b_{i})=0.7$ . The information revealed to $B$ by $A$ is $(100)(1-H(0.7,0.3))=11.87$ Shannon bits of information. After the first step of the algorithm we are left with 29 pairs $(a_{j},b_{j})$ with $Pr(a_{j}=b_{j})=0.845$ . The amount of information revealed to the remnant of $B$ by the remnant of $A$ is $29(0.3779)\approx 10.96$ Shannon bits of information. So we have “wasted” less than 1 bit, i.e. the wastage is about 8%. Mathematically we have $100\bigl{(}1-H(0.7,0.3)\bigr{)}=29\bigl{(}1-H(0.845,0.155)\bigr{)}+$ Wastage. In general we have

n[1-H(p,q)]=\frac{n}{2}(p^{2}+q^{2})\left[1-H\left(\frac{p^{2}}{p^{2}+q^{2}},\frac{q^{2}}{p^{2}+q^{2}}\right)\right]+W,

where $W$ denotes the wastage. Dividing by $\frac{n}{2}$ we get

2[1-H(p,q)]=(p^{2}+q^{2})\left[1-H\left(\frac{p^{2}}{p^{2}+q^{2}},\frac{q^{2}}{p^{2}+q^{2}}\right)\right]+\frac{2W}{n}.

Comparing with formula(4.5) we see that $W=\frac{n}{2}\bigl{[}1-H(p^{2}+q^{2},2pq)\bigr{]}$ . In this case $W=50\bigl{[}1-H(0.58,0.42)\bigr{]}\approx 50(1-0.9815)=(50)(0.0185)=0.925$ .

To sum up then the new identities tell us exactly how much information was wasted and not utilized. They also tell us, in conjunction with the algorithm, the optimum size of the sub-blocks at each stage.

One of the original motivations for work in coding theory was that the Shannon fundamental theorem showed how capacity was the bound for accurate communication but the problem was to construct linear codes or other codes such as turbo codes that came close to the bound.

Here we have an analogous situation. In the example just considered the maximum length of a cryptographic common secret key obtained as a common subset of $A,B$ is bounded by $n(1-H(p))$ . The problem is to find algorithms which produce such a common secret key coming close to this Shannon bound of $n\bigl{(}1-H(p)\bigr{)}$ .

This work nicely illustrates the inter-connections between codes, cryptography, and information theory. Information theory tells us the bound. The identities tell us the size of the sub-blocks for constructing a common secret key which attains, or gets close to, the information theory bound.

Coding theory is then used to ensure that the two communicating parties have a common secret key by using the hash function described in the algorithm using a code $C$ . Error correction can ensure that the common secret key can be obtained without using another round of the algorithm (thereby shortening the common key) if the difference between the keys of $A$ and $B$ is less than the minimum distance of the dual code of $C$ . This improves on the standard method of checking parities of random subsets of the keys $A_{m},B_{m}$ at the last stage.

7 Concluding Remarks.

1.

Please see Chapter 25 of the forthcoming Cryptography, Information Theory, and Error-Correction: A Handbook for the Twenty-First Century, by Bruen, Forcinito, and McQuillan, [3], for background information, additional details, and related material.
2.

Standard tables for entropy list values to two decimal places. When $p$ is close to 1 interpolation for three decimal places is difficult as $h(p)$ is very steeply sloped near $p=1$ . Formula 4.3 may help since $\frac{p^{2}}{p^{2}+q^{2}}$ is less than $p$ , and the formula can be re-iterated.
3.

In [4] the emphasis is on the situation where the eavesdropper has no initial information. The case where the eavesdropper has initial information is discussed in [6]. In Section 7 of [4] the quoted result uses Renyi entropy rather than Shannon entropy.
4.

The methods in this note suggest possible generalizations which we do not pursue here.

Acknowledgement: The author thanks Drs Mario Forcinito, James McQuillan and David Wehlau for their help and encouragement with this work.

References

[1] R. B. Ash. Information Theory. Dover, 1990.
[2] C. H. Bennett and G. Brassard. Quantum cryptography: Public key distribution and coin tossing. In International Conference on Computers, Systems & Signal Processing, December 1984.
[3] A. A. Bruen, M. A. Forcinito, and J. M. McQuillan. Cryptography, Information Theory, and Error-Correction: A Handbook for the Twenty-First Century. Wiley, second edition, 2021. to appear.
[4] A. A. Bruen, D. L. Wehlau, and M. Forcinito. Error correcting codes, block designs, perfect secrecy and finite fields. Acta Applicandae Mathematica, 93, September 2006.
[5] G. A. Jones and J. M. Jones. Information and Coding Theory. Springer, 2000.
[6] U. Maurer and S. Wolf. Privacy amplification secure against active adversaries. In Advances in Cryptology, Proceedings of Crypto’97, pages 307–321. Springer-Verlag, August 1997.
[7] R. J. McEliece. The Theory of Information and Coding. Addison-Wesley, 1978.
[8] A. Renyi. A Diary on Information Theory. Wiley, 1987.
[9] D. Welsh. Codes and Cryptography. Oxford, 1988.

	$\displaystyle H(\mathbf{X}\,\|\,\mathbf{Y})$	$\displaystyle=H(\mathbf{X},\mathbf{Y})-H(\mathbf{Y})$
		$\displaystyle=H(\mathbf{X},\mathbf{Y},\mathbf{Z})-H(\mathbf{Z}\,\|\,\mathbf{X},\mathbf{Y})-H(\mathbf{Y})$
		$\displaystyle=H(\mathbf{X},\mathbf{Y},\mathbf{Z})-H(\mathbf{Y})\qquad\hbox{[\,since $H(\mathbf{Z}\,\|\,\mathbf{X},\mathbf{Y})=0$\,]}$
		$\displaystyle=H(\mathbf{Y})+H(\mathbf{X},\mathbf{Z}\,\|\,\mathbf{Y})-H(\mathbf{Y})$
		$\displaystyle=H(\mathbf{X},\mathbf{Z}\,\|\,\mathbf{Y}),\qquad\hbox{proving (a).}$