
New identities for the Shannon-Wiener entropy function with applications

Aiden A. Bruen
Adjunct Research Professor, Carleton University
[email protected]
The author gratefully acknowledges the financial support of the Natural Sciences and Engineering Research Council of Canada.
Abstract

We show how the two-variable entropy function H(p,q)=p\log\frac{1}{p}+q\log\frac{1}{q} can be expressed as a linear combination of entropy functions symmetric in p and q involving quotients of polynomials in p,q of degree n for any n\geq 2.


Keywords: entropy, Binary Symmetric Channel, Shannon, Wiener, identities

2020 MSC: 94A15 Information theory, 94A17 Measures of information, entropy, 94A60 Cryptography

1 Introduction

In A Diary on Information Theory [8], Renyi points out that the entropy formula, i.e.,

H(X)=p_{1}\log\frac{1}{p_{1}}+p_{2}\log\frac{1}{p_{2}}+\cdots+p_{N}\log\frac{1}{p_{N}},

where logs are to the base 2, was arrived at, independently, by Claude Shannon and Norbert Wiener in 1948. This famous formula was the revolutionary precursor of the information age. Renyi goes on to say that,

  • this formula had already appeared in the work of Boltzmann which is why it is also called the Boltzmann-Shannon Formula. Boltzmann arrived at this formula in connection with a completely different problem. Almost half a century before Shannon he gave essentially the same formula to describe entropy in his investigations of statistical mechanics. He showed that if, in a gas containing a large number of molecules, the probabilities of the possible states of the individual molecules are p_{1},p_{2},\ldots,p_{N} then the entropy of the system is H=c(p_{1}\log\frac{1}{p_{1}}+p_{2}\log\frac{1}{p_{2}}+\cdots+p_{N}\log\frac{1}{p_{N}}) where c is a constant. (In statistical mechanics the natural logarithm is used and not base 2 …). The entropy of a physical system is the measure of its disorder …

Since 1948 there have been many advances in information theory and entropy. A well-known paper of Dembo, Cover and Thomas is devoted to inequalities in information theory. Here, we concentrate on equalities. We show how a Shannon function H(p,q) can be expanded in infinitely many ways in an infinite series of functions each of which is a linear combination of Shannon functions of the type H\bigl(f(p),g(q)\bigr), where f,g are quotients of polynomials of degree n for any n\geq 2. Apart from its intrinsic interest, this new result gives insight into the algorithm in Section 6 for constructing a common secret key between two communicating parties — see also [2], [3], [4], [6].

2 Extensions of a Binary Symmetric Channel

Recall that a binary symmetric channel has input and output symbols drawn from \{0,1\}. We say that there is a common probability q=1-p of any symbol being transmitted incorrectly, independently for each transmitted symbol, 0\leq p\leq 1.

We use the channel matrix P=\left(\begin{array}{cc}p&q\\ q&p\end{array}\right). Again, p is the probability of success, i.e., p denotes the probability that 0 is transmitted to 0 and also the probability that 1 gets transmitted to 1. The second extension P^{(2)} of P has alphabet \{00,01,10,11\} and channel matrix

P^{(2)}=\left(\begin{array}{cccc}p^{2}&pq&qp&q^{2}\\ pq&p^{2}&q^{2}&qp\\ qp&q^{2}&p^{2}&pq\\ q^{2}&qp&pq&p^{2}\end{array}\right)=\left(\begin{array}{cc}pP&qP\\ qP&pP\end{array}\right).
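As a side remark of ours (not in the original), the block form above is exactly the Kronecker product P \otimes P, which gives a quick numerical sanity check; the value p = 0.7 below is arbitrary.

import numpy as np

p = 0.7                      # illustrative probability of correct transmission
q = 1.0 - p

P = np.array([[p, q],
              [q, p]])

# Second extension over the alphabet {00, 01, 10, 11}, in the order used in the text.
P2 = np.array([[p*p, p*q, q*p, q*q],
               [p*q, p*p, q*q, q*p],
               [q*p, q*q, p*p, p*q],
               [q*q, q*p, p*q, p*p]])

# The block matrix ((pP, qP), (qP, pP)) coincides with the Kronecker product P ⊗ P.
assert np.allclose(P2, np.kron(P, P))
print(P2)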

An alternative way to think of the n^{th} extension of a channel C (see Welsh [9]) is to regard it as n copies of C acting independently and in parallel. See Figure 1.

Figure 1: n copies of C acting independently and in parallel.

Let us assume also that, for C, the input probability of 0 and the input probability of 1 are both equal to \frac{1}{2}.

Theorem 2.1.

Let \mathbf{X}=(X_{1},\ldots,X_{n}) and \mathbf{Y}=(Y_{1},\ldots,Y_{n}) denote an input-output pair for C^{(n)}. Then

  1. (a)

    H(\mathbf{X})=H(X_{1},\ldots,X_{n})=n

  2. (b)

    H(\mathbf{Y})=H(Y_{1},\ldots,Y_{n})=n

  3. (c)

    H(\mathbf{X}|\mathbf{Y}) is equal to nH(p,q).

  4. (d)

    The capacity of C^{(n)} is n\bigl(1-H(p,q)\bigr).

  • Proof.

    (a) Since, by definition, the X_{i} are independent,

    H(\mathbf{X})=H(X_{1})+H(X_{2})+\cdots+H(X_{n}).

    We have

    H(X_{i}) =-\bigl[Pr(X_{i}=0)\log Pr(X_{i}=0)+Pr(X_{i}=1)\log Pr(X_{i}=1)\bigr]
    =-\bigl[\frac{1}{2}\log\bigl(\frac{1}{2}\bigr)+\frac{1}{2}\log\bigl(\frac{1}{2}\bigr)\bigr]
    =-\log\bigl(\frac{1}{2}\bigr)
    =-[\log 1-\log 2]=\log 2=1.

    Thus H(\mathbf{X})=n.

    (b) The value of Y_{i} only depends on X_{i}. The X_{i} are independent. Thus Y_{1},Y_{2},\ldots,Y_{n} are independent. For any Y_{i} we have

    Pr(Y_{i}=0)=Pr(X_{i}=0)(p)+Pr(X_{i}=1)(q)=\frac{1}{2}(p+q)=\frac{1}{2}.

    Also Pr(Y_{i}=1)=\frac{1}{2}. Then, as for X_{i}, H(Y_{i})=1. Thus H(\mathbf{Y})=H(Y_{1},Y_{2},\ldots,Y_{n})=\sum_{i}H(Y_{i})=n.

(c) We have H(\mathbf{X})-H(\mathbf{X}|\mathbf{Y})=H(\mathbf{Y})-H(\mathbf{Y}|\mathbf{X}). Since H(\mathbf{X})=H(\mathbf{Y})=n we have H(\mathbf{X}|\mathbf{Y})=H(\mathbf{Y}|\mathbf{X}). Now

H(\mathbf{Y}\,|\,\mathbf{X})=\sum_{\mathbf{x}}Pr(\mathbf{x})H(\mathbf{Y}\,|\,\mathbf{X}=\mathbf{x}),

where \mathbf{x} denotes a given value of the random vector \mathbf{X}. Since the channel is memoryless,

H(\mathbf{Y}\,|\,\mathbf{X}=\mathbf{x})=\sum_{i}H(Y_{i}\,|\,\mathbf{X}=\mathbf{x})=\sum_{i}H(Y_{i}\,|\,X_{i}=x_{i}).

The last step needs a little work — see [5] Exercise 4.10 or [7] or [3] or the example below for details. Then

H(\mathbf{Y}\,|\,\mathbf{X}) =\sum_{\mathbf{x}}Pr(\mathbf{x})\sum_{i}H(Y_{i}\,|\,X_{i}=x_{i})
=\sum_{i}\sum_{u}H(Y_{i}\,|\,X_{i}=u)Pr(X_{i}=u).

Thus

H(\mathbf{Y}\,|\,\mathbf{X})=\sum_{i=1}^{n}H(Y_{i}\,|\,X_{i})=nH(p,q)=H(\mathbf{X}\,|\,\mathbf{Y}).

Example  Let n=2 and \left(\begin{array}{c}x_{1}\\ x_{2}\end{array}\right)=\left(\begin{array}{c}0\\ 1\end{array}\right). Then \left(\begin{array}{c}y_{1}\\ y_{2}\end{array}\right)=\left(\begin{array}{c}1\\ 0\end{array}\right) or \left(\begin{array}{c}0\\ 1\end{array}\right) or \left(\begin{array}{c}1\\ 1\end{array}\right) or \left(\begin{array}{c}0\\ 0\end{array}\right). Using independence we get that the corresponding entropy term is -[q^{2}\log q^{2}+p^{2}\log p^{2}+qp\log qp+pq\log pq]. This simplifies to -2[p\log p+q\log q]=2H(p,q). Note that the probability that \left(\begin{array}{c}x_{1}\\ x_{2}\end{array}\right)=\left(\begin{array}{c}0\\ 1\end{array}\right) is \frac{1}{4}.

(d) The capacity of C is the maximum value, over all input distributions, of H(X)-H(X\,|\,Y). Here the inputs are uniform, so the input probability of a 1 or a 0 is 0.5. This input distribution maximizes H(X)-H(X\,|\,Y) for C, the maximum value being 1-H(p,q). Then the capacity of C^{(n)} is n(1-H(p,q)). It represents the amount of information about \mathbf{X} conveyed by \mathbf{Y}, or the amount of information about \mathbf{Y} conveyed by \mathbf{X}. \blacksquare
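The following brute-force check of part (c) is our own illustration (not from the paper); the values of p and n are arbitrary, and the helper name H is an assumption.

import itertools
import math

def H(probs):
    """Shannon entropy (base 2), ignoring zero terms."""
    return -sum(t * math.log2(t) for t in probs if t > 0)

p, n = 0.7, 3                     # illustrative values
q = 1.0 - p

# Joint distribution of (X, Y) for C^(n) with uniform inputs:
# Pr(x) = 2^(-n) and Pr(y | x) = p^(#agreements) * q^(#disagreements).
joint = {}
for x in itertools.product([0, 1], repeat=n):
    for y in itertools.product([0, 1], repeat=n):
        agree = sum(xi == yi for xi, yi in zip(x, y))
        joint[(x, y)] = (0.5 ** n) * p ** agree * q ** (n - agree)

H_XY = H(joint.values())
H_Y = H([sum(v for (x, y), v in joint.items() if y == y0)
         for y0 in itertools.product([0, 1], repeat=n)])

# Theorem 2.1(c): H(X | Y) = H(X, Y) - H(Y) equals n * H(p, q).
assert abs((H_XY - H_Y) - n * H([p, q])) < 1e-9
print(H_XY - H_Y, n * H([p, q]))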

3 An Entropy Equality

First we need some additional discussion on entropy.


A. Extending a basic result.

A fundamental result for random variables X,Y is that H(X)+H(Y|X)=H(Y)+H(X|Y). A corresponding argument may be used to establish similar identities involving more than two random variables. For example,

H(X,Y,Z) =H(X)+H(Y|X)+H(Z\,|\,X,Y)
=H(X,Y)+H(Z\,|\,X,Y).

Also

H(X,Y,Z)=H(X)+H(Y,Z\,|\,X).

B. From Random Variables to Random Vectors.

For any random variable X taking only a finite number of values with probabilities p_{1},p_{2},\ldots,p_{n} such that

\sum p_{i}=1\qquad\text{and}\qquad p_{i}>0\quad(1\leq i\leq n),

we define the entropy of XX using the Shannon formula

H(X)=-\sum_{k=1}^{n}p_{k}\log p_{k}=\sum_{k=1}^{n}p_{k}\log\frac{1}{p_{k}}.

Analogously, if \mathbf{X} is a random vector which takes only a finite number of values \mathbf{u}_{1},\mathbf{u}_{2},\ldots,\mathbf{u}_{m}, we define its entropy by the formula

H(\mathbf{X})=-\sum_{k=1}^{m}Pr(\mathbf{u}_{k})\log Pr(\mathbf{u}_{k}).

For example, when \mathbf{X} is a two-dimensional random vector, say \mathbf{X}=(U,V) with p_{ij}=Pr(U=a_{i},V=b_{j}), then we can write

H(\mathbf{X})=H(U,V)=-\sum_{i,j}p_{ij}\log p_{ij}.

Note that \sum_{i,j}p_{ij}=1.

More generally, if X_{1},X_{2},\ldots,X_{m} is a collection of random variables each taking only a finite set of values, then we can regard \mathbf{X}=(X_{1},X_{2},\ldots,X_{m}) as a random vector taking a finite set of values and we define the joint entropy of X_{1},\ldots,X_{m} by

H(X_{1},X_{2},\ldots,X_{m})=H(\mathbf{X})
=-\sum Pr(X_{1}=x_{1},X_{2}=x_{2},\ldots,X_{m}=x_{m})\log Pr(X_{1}=x_{1},X_{2}=x_{2},\ldots,X_{m}=x_{m}).

Standard results for random variables then carry over to random vectors — see [1], [9].
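For concreteness, here is a small Python sketch of these definitions; the helper name H and the example distributions are our own choices, not the paper's.

import math

def H(probs):
    """Shannon entropy (base 2); zero-probability terms contribute nothing."""
    return -sum(t * math.log2(t) for t in probs if t > 0)

# Entropy of a random variable with distribution (1/2, 1/4, 1/4).
print(H([0.5, 0.25, 0.25]))      # 1.5

# Joint entropy of a two-dimensional random vector (U, V), where
# p_ij = Pr(U = a_i, V = b_j) and the p_ij sum to 1.
p = {("a1", "b1"): 0.25, ("a1", "b2"): 0.25,
     ("a2", "b1"): 0.10, ("a2", "b2"): 0.40}
print(H(p.values()))             # H(U, V)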


C. The Grouping Axiom for Entropy.

This axiom or identity can shorten calculations. It reads as follows ([9, p. 2], [1, p. 8], [3, Section 9.6]).

Let p=p_{1}+p_{2}+\cdots+p_{m} and q=q_{1}+q_{2}+\cdots+q_{n} where each p_{i} and q_{j} is non-negative. Assume that p,q are positive with p+q=1. Then

H(p_{1},p_{2},\ldots,p_{m},q_{1},q_{2},\ldots,q_{n})
=H(p,q)+pH\left(\frac{p_{1}}{p},\frac{p_{2}}{p},\ldots,\frac{p_{m}}{p}\right)+qH\left(\frac{q_{1}}{q},\frac{q_{2}}{q},\ldots,\frac{q_{n}}{q}\right).

For example, suppose m=1 so p_{1}=p. Then we get

H(p_{1},q_{1},q_{2},\ldots,q_{n})=H(p,q)+qH\left(\frac{q_{1}}{q},\ldots,\frac{q_{n}}{q}\right).

This is because p_{1}H\bigl(\frac{p_{1}}{p_{1}},0\bigr)=p_{1}H(1,0)=p_{1}(1\log 1)=p_{1}(0)=0.
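A minimal numerical check of the grouping axiom (the particular values of the p_i and q_j below are made up for illustration).

import math

def H(probs):
    """Shannon entropy (base 2), ignoring zero terms."""
    return -sum(t * math.log2(t) for t in probs if t > 0)

ps = [0.1, 0.2, 0.3]             # the p_i, summing to p = 0.6
qs = [0.15, 0.25]                # the q_j, summing to q = 0.4, so p + q = 1
p, q = sum(ps), sum(qs)

lhs = H(ps + qs)
rhs = H([p, q]) + p * H([t / p for t in ps]) + q * H([t / q for t in qs])
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)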


Theorem 3.1.

Let \mathbf{X},\mathbf{Y},\mathbf{Z} be random vectors such that H(\mathbf{Z}\,|\,\mathbf{X},\mathbf{Y})=0. Then

  1. (a)

    H(\mathbf{X}|\mathbf{Y})=H(\mathbf{X},\mathbf{Z}\,|\,\mathbf{Y}).

  2. (b)

    H(\mathbf{X}|\mathbf{Y})=H(\mathbf{X}\,|\,\mathbf{Y},\mathbf{Z})+H(\mathbf{Z}\,|\,\mathbf{Y}).

  • Proof.
    H(\mathbf{X}\,|\,\mathbf{Y}) =H(\mathbf{X},\mathbf{Y})-H(\mathbf{Y})
    =H(\mathbf{X},\mathbf{Y},\mathbf{Z})-H(\mathbf{Z}\,|\,\mathbf{X},\mathbf{Y})-H(\mathbf{Y})
    =H(\mathbf{X},\mathbf{Y},\mathbf{Z})-H(\mathbf{Y})\qquad\hbox{[ since $H(\mathbf{Z}\,|\,\mathbf{X},\mathbf{Y})=0$ ]}
    =H(\mathbf{Y})+H(\mathbf{X},\mathbf{Z}\,|\,\mathbf{Y})-H(\mathbf{Y})
    =H(\mathbf{X},\mathbf{Z}\,|\,\mathbf{Y}),\qquad\hbox{proving (a).}

    For (b),

    H(\mathbf{X}\,|\,\mathbf{Y}) =H(\mathbf{X},\mathbf{Z},\mathbf{Y})-H(\mathbf{Y})\hbox{ from (a)}
    =H(\mathbf{X},\mathbf{Z},\mathbf{Y})-H(\mathbf{Y},\mathbf{Z})+H(\mathbf{Y},\mathbf{Z})-H(\mathbf{Y})
    =H(\mathbf{X}\,|\,\mathbf{Y},\mathbf{Z})+H(\mathbf{Z}\,|\,\mathbf{Y}).

    \blacksquare
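A quick numerical sanity check of part (b) (our own illustration, not from the paper): we take a small joint distribution of (X,Y) and set Z = X XOR Y, so that H(Z\,|\,X,Y)=0 as required.

import math

def H(probs):
    """Shannon entropy (base 2), ignoring zero terms."""
    return -sum(t * math.log2(t) for t in probs if t > 0)

def Hcond(joint, cond_idx):
    """H(remaining coordinates | coordinates at positions cond_idx),
    for a joint distribution given as {tuple_of_values: probability}."""
    marg = {}
    for k, v in joint.items():
        key = tuple(k[i] for i in cond_idx)
        marg[key] = marg.get(key, 0.0) + v
    return H(joint.values()) - H(marg.values())   # H(A, B) - H(B)

# Illustrative joint distribution of (X, Y) on bits; Z = X XOR Y is determined by (X, Y).
pXY = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
joint = {(x, y, x ^ y): v for (x, y), v in pXY.items()}   # coordinates: 0 = X, 1 = Y, 2 = Z

lhs = Hcond({(x, y): v for (x, y, z), v in joint.items()}, cond_idx=(1,))      # H(X | Y)
rhs = (Hcond(joint, cond_idx=(1, 2))                                           # H(X | Y, Z)
       + Hcond({(z, y): v for (x, y, z), v in joint.items()}, cond_idx=(1,)))  # H(Z | Y)
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)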

4 The New Identities

We will use the above identity, i.e.,

H(\mathbf{X}\,|\,\mathbf{Y})=H(\mathbf{X}\,|\,\mathbf{Y},\mathbf{Z})+H(\mathbf{Z}\,|\,\mathbf{Y}) (4.1)

which holds under the assumption that H(\mathbf{Z}\,|\,\mathbf{X},\mathbf{Y})=0. We begin with arrays A=\left(\begin{array}{c}a_{1}\\ a_{2}\\ \vdots\\ a_{n}\end{array}\right), B=\left(\begin{array}{c}b_{1}\\ b_{2}\\ \vdots\\ b_{n}\end{array}\right), where n is even. We assume that A,B are random binary strings subject to the condition that, for each i, we have Pr(a_{i}=b_{i})=p. We also assume that the events \{(a_{i}=b_{i})\} form an independent set. We divide A,B into blocks of size 2.

To start, put \mathbf{X}=\left(\begin{array}{c}x_{1}\\ x_{2}\end{array}\right), \mathbf{Y}=\left(\begin{array}{c}y_{1}\\ y_{2}\end{array}\right), \mathbf{Z}=x_{1}+x_{2}, where the addition is mod 2.

Lemma 4.2.
H(\mathbf{Z}\,|\,\mathbf{X},\mathbf{Y})=0.
  • Proof.

    We want to calculate \sum_{\mathbf{x},\mathbf{y}}H(\mathbf{Z}\,|\,\mathbf{x},\mathbf{y})Pr(\mathbf{X}=\mathbf{x},\mathbf{Y}=\mathbf{y}). Given \mathbf{x},\mathbf{y}, say \mathbf{x}=\left(\begin{array}{c}\alpha_{1}\\ \alpha_{2}\end{array}\right), \mathbf{y}=\left(\begin{array}{c}\beta_{1}\\ \beta_{2}\end{array}\right), the value of \mathbf{Z} is \alpha_{1}+\alpha_{2}. There is no uncertainty in the value of \mathbf{Z} given \mathbf{x},\mathbf{y}, i.e., each term in the above sum for H is H(1,0)=0. Therefore H(\mathbf{Z}\,|\,\mathbf{X},\mathbf{Y})=0. \blacksquare

From this we can use formula (4.1) for this block of size two. We can think of a channel from \mathbf{X} to \mathbf{Y} (or from \mathbf{Y} to \mathbf{X}) which is the second extension of a binary symmetric channel where p is the probability of success. We have

H(\mathbf{X}\,|\,\mathbf{Y})=H(\mathbf{X}\,|\,\mathbf{Y},\mathbf{Z})+H(\mathbf{Z}\,|\,\mathbf{Y}).

From Theorem 2.1 part (c) the left side, i.e., H(\mathbf{X}\,|\,\mathbf{Y}), is equal to 2H(p,q). Next we calculate the right side beginning with H(\mathbf{Z}\,|\,\mathbf{Y}), i.e., H\left(\mathbf{Z}\,\Big|\,\left(\begin{array}{c}y_{1}\\ y_{2}\end{array}\right)\right). We have

H(\mathbf{Z}\,|\,\mathbf{Y}) =H\bigl(\mathbf{Z}\,|\,(y_{1}+y_{2}=x_{1}+x_{2})\bigr)Pr(y_{1}+y_{2}=x_{1}+x_{2})
\qquad+H\bigl(\mathbf{Z}\,|\,(y_{1}+y_{2}\neq x_{1}+x_{2})\bigr)Pr(y_{1}+y_{2}\neq x_{1}+x_{2}).

We know that Pr(x_{1}+x_{2}=y_{1}+y_{2})=p^{2}+q^{2} and Pr(x_{1}+x_{2}\neq y_{1}+y_{2})=1-(p^{2}+q^{2})=2pq since p+q=1. From the standard formula we have H(\mathbf{Z}\,|\,\mathbf{Y})=(p^{2}+q^{2})\log\left(\frac{1}{p^{2}+q^{2}}\right)+2pq\log\left(\frac{1}{2pq}\right) since H(\mathbf{Z}\,|\,\mathbf{Y})=H(p^{2}+q^{2},2pq).

Next we calculate

H\left(\left(\begin{array}{c}x_{1}\\ x_{2}\end{array}\right)\,\Big|\,\left(\begin{array}{c}y_{1}\\ y_{2}\end{array}\right),(x_{1}+x_{2})\right)=H(\mathbf{X}\mid\mathbf{Y},\mathbf{Z}).

Again we have two possibilities, i.e., y_{1}+y_{2}=x_{1}+x_{2} and y_{1}+y_{2}\neq x_{1}+x_{2}. The corresponding probabilities are p^{2}+q^{2} and 2pq respectively. We obtain

H(\mathbf{X}|\mathbf{Y},\mathbf{Z})=(p^{2}+q^{2})H\left(\frac{p^{2}}{p^{2}+q^{2}},\frac{q^{2}}{p^{2}+q^{2}}\right)+2pqH\left(\frac{pq}{2pq},\frac{pq}{2pq}\right).

This comes about from the facts that

  1. (a)

    If y_{1}+y_{2}=x_{1}+x_{2} then we either have y_{1}=x_{1} and y_{2}=x_{2} or y_{1}=1-x_{1}, y_{2}=1-x_{2}.

  2. (b)

    If y_{1}+y_{2}\neq x_{1}+x_{2} then either y_{1}=x_{1} and y_{2}\neq x_{2} or y_{1}\neq x_{1} and y_{2}=x_{2}.

  3. (c)

    H(\frac{1}{2},\frac{1}{2})=1.

Then from equation (4.1) we have our first identity as follows

2H(p,q)=(p^{2}+q^{2})H\left(\frac{p^{2}}{p^{2}+q^{2}},\frac{q^{2}}{p^{2}+q^{2}}\right)+2pq+H(p^{2}+q^{2},2pq). (4.3)
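A numerical check of identity (4.3); the value p = 0.7 and the helper H are our own choices for illustration.

import math

def H(probs):
    """Shannon entropy (base 2), ignoring zero terms."""
    return -sum(t * math.log2(t) for t in probs if t > 0)

p = 0.7
q = 1.0 - p
s = p * p + q * q                # Pr(parities agree) = p^2 + q^2

lhs = 2 * H([p, q])
rhs = s * H([p * p / s, q * q / s]) + 2 * p * q + H([s, 2 * p * q])
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)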

Blocks of Size Three.

Here \mathbf{X}=\left(\begin{array}{c}x_{1}\\ x_{2}\\ x_{3}\end{array}\right), \mathbf{Y}=\left(\begin{array}{c}y_{1}\\ y_{2}\\ y_{3}\end{array}\right), \mathbf{Z}=x_{1}+x_{2}+x_{3}. As in Lemma 4.2 we have H(\mathbf{Z}\,|\,\mathbf{X},\mathbf{Y})=0 so we can use formula (4.1) again, i.e.,

H(\mathbf{X}\,|\,\mathbf{Y})=H(\mathbf{X}\,|\,\mathbf{Y},\mathbf{Z})+H(\mathbf{Z}\,|\,\mathbf{Y}).

We have a channel from \mathbf{X} to \mathbf{Y} (or from \mathbf{Y} to \mathbf{X}) which is the third extension C^{(3)} of a binary symmetric channel C, where p is the probability that 0 (or 1) is transmitted to itself.

From Theorem 2.1 we have H(\mathbf{X}|\mathbf{Y})=3H(p,q).

Similar to the case of blocks of size 2, we have H(\mathbf{Z}|\mathbf{Y})=H(p^{3}+3pq^{2},q^{3}+3qp^{2}). This is because the probabilities that Z=y_{1}+y_{2}+y_{3} or Z\neq y_{1}+y_{2}+y_{3} are, respectively, p^{3}+3pq^{2} or q^{3}+3qp^{2}, as follows.

If Z=y_{1}+y_{2}+y_{3}, then either x_{1}=y_{1}, x_{2}=y_{2}, x_{3}=y_{3} or else, for some i, 1\leq i\leq 3 (3 possibilities), x_{i}=y_{i} and, for the other two indices j,k, x_{j}\neq y_{j} and x_{k}\neq y_{k}.

A similar analysis can be carried out for the case where Z\neq y_{1}+y_{2}+y_{3}. We then get H(\mathbf{X}\,|\,\mathbf{Y},\mathbf{Z})=f(p,q)+f(q,p) where

f(p,q)=(p^{3}+3pq^{2})\left\{H\left(\frac{p^{3}}{p^{3}+3pq^{2}},\frac{pq^{2}}{p^{3}+3pq^{2}},\frac{pq^{2}}{p^{3}+3pq^{2}},\frac{pq^{2}}{p^{3}+3pq^{2}}\right)\right\}.

We now use the grouping axiom for m=1. The p in the formula refers to \frac{p^{3}}{p^{3}+3pq^{2}} here and the q there is now replaced by \frac{3pq^{2}}{p^{3}+3pq^{2}}. Then

f(p,q) =(p^{3}+3pq^{2})\left\{H\left(\frac{p^{3}}{p^{3}+3pq^{2}},\frac{3pq^{2}}{p^{3}+3pq^{2}}\right)+\frac{3pq^{2}}{p^{3}+3pq^{2}}H\left(\frac{1}{3},\frac{1}{3},\frac{1}{3}\right)\right\}
=(p^{3}+3pq^{2})H\left(\frac{p^{3}}{p^{3}+3pq^{2}},\frac{3pq^{2}}{p^{3}+3pq^{2}}\right)+3pq^{2}\log 3.

f(q,p) is obtained by interchanging p with q. We note that, since p+q=1, 3pq^{2}\log 3+3qp^{2}\log 3=3pq\log 3.

From working with blocks of size 3 we get

3H(p,q) =H(p^{3}+3pq^{2},q^{3}+3qp^{2})+(p^{3}+3pq^{2})H\left(\frac{p^{3}}{p^{3}+3pq^{2}},\frac{3pq^{2}}{p^{3}+3pq^{2}}\right)
+(q^{3}+3qp^{2})H\left(\frac{q^{3}}{q^{3}+3qp^{2}},\frac{3qp^{2}}{q^{3}+3qp^{2}}\right)+3pq\log 3. (4.4)

For blocks of size 2 formula (4.3) can be put in a more compact form in terms of capacities, namely,

2\bigl(1-H(p,q)\bigr)=\biggl[1-H(p^{2}+q^{2},2pq)\biggr]+\left[(p^{2}+q^{2})\left(1-H\left(\frac{p^{2}}{p^{2}+q^{2}},\frac{q^{2}}{p^{2}+q^{2}}\right)\right)\right]. (4.5)

Using the same method we can find a formula analogous to formulae (4.3), (4.4) for obtaining nH(p,q) as a linear combination of terms of the form H(u,v) where u,v involve terms in p^{n},p^{n-2}q^{2},\ldots,q^{n},q^{n-2}p^{2},\ldots plus extra terms such as 3pq\log 3 as in formula (4.4).
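A numerical check of identities (4.4) and (4.5), in the same spirit as before; again the value p = 0.7 is an arbitrary example of ours.

import math

def H(probs):
    """Shannon entropy (base 2), ignoring zero terms."""
    return -sum(t * math.log2(t) for t in probs if t > 0)

p = 0.7
q = 1.0 - p

# Identity (4.4), blocks of size three.
a, b = p**3 + 3*p*q*q, q**3 + 3*q*p*p            # Pr(parities agree), Pr(they disagree)
rhs4 = (H([a, b])
        + a * H([p**3 / a, 3*p*q*q / a])
        + b * H([q**3 / b, 3*q*p*p / b])
        + 3 * p * q * math.log2(3))
assert abs(3 * H([p, q]) - rhs4) < 1e-12

# Identity (4.5), the capacity form of (4.3).
s = p*p + q*q
rhs5 = (1 - H([s, 2*p*q])) + s * (1 - H([p*p / s, q*q / s]))
assert abs(2 * (1 - H([p, q])) - rhs5) < 1e-12
print("identities (4.4) and (4.5) hold for p =", p)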

5 Generalizations, an Addition Formula

The result of Theorem 2.1 can be extended to the more general case where we take the product of n binary symmetric channels, even if the channel matrices are different, corresponding to differing p-values.

As an example, suppose we use the product of 2 binary symmetric channels with channel matrices

\left(\begin{array}{cc}p_{1}&q_{1}\\ q_{1}&p_{1}\end{array}\right),\quad\left(\begin{array}{cc}p_{2}&q_{2}\\ q_{2}&p_{2}\end{array}\right).

Then the argument in Section 4 goes through. To avoid being overwhelmed by symbols we make a provisional notation change.

Notation 5.1.

We denote by h(p) the quantity H(p,q)=p\log\frac{1}{p}+q\log\frac{1}{q}, where q=1-p.

Then we arrive at the following addition formula

h(p_{1})+h(p_{2})
=h(p_{1}p_{2}+q_{1}q_{2})+(p_{1}p_{2}+q_{1}q_{2})\,h\left(\frac{p_{1}p_{2}}{p_{1}p_{2}+q_{1}q_{2}}\right)
\qquad+(p_{1}q_{2}+p_{2}q_{1})\,h\left(\frac{p_{1}q_{2}}{p_{1}q_{2}+p_{2}q_{1}}\right). (5.2)

Similarly to the above we can derive a formula for h(p_{1})+h(p_{2})+\cdots+h(p_{n}).
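A numerical check of the addition formula (5.2), with illustrative values of p_1 and p_2 chosen by us.

import math

def h(p):
    """Binary entropy h(p) = H(p, 1 - p) in bits."""
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))

p1, p2 = 0.7, 0.6
q1, q2 = 1 - p1, 1 - p2

agree = p1 * p2 + q1 * q2        # Pr(both channels correct or both incorrect)
differ = p1 * q2 + p2 * q1

lhs = h(p1) + h(p2)
rhs = h(agree) + agree * h(p1 * p2 / agree) + differ * h(p1 * q2 / differ)
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)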

6 The Shannon Limit and Applications to Cryptography

The above method of using blocks of various sizes is reminiscent of the algorithm for the key exchange in [4], which relates to earlier work in [2], [6], and others. Indeed, the identities above were informed by the details of the algorithm.

The algorithm starts with two arrays A=(a_{1},\ldots,a_{n}) and B=(b_{1},\ldots,b_{n}). We assume that the set of events \{a_{i}=b_{i}\} is an independent set with p=Pr(a_{i}=b_{i}). We subdivide A,B into corresponding sub-blocks of size t, where t divides n. Exchanging parities by public discussion we end up with new shorter sub-arrays A_{1},B_{1}, where the events that corresponding entries are equal are independent, each with probability p_{1}>p. Eventually, after m iterations, we end up with a common secret key A_{m}=B_{m}.

Let us take an example. Start with two binary arrays (a_{1},\ldots,a_{n}) and (b_{1},\ldots,b_{n}) of length n with n even, n=2t, say. We subdivide the arrays into corresponding blocks of size 2. If the two blocks are \left(\begin{array}{c}a_{1}\\ a_{2}\end{array}\right) and \left(\begin{array}{c}b_{1}\\ b_{2}\end{array}\right) we discard those blocks if the parities disagree. If the parities agree, which happens with probability p^{2}+q^{2}, we keep a_{1} and b_{1}, discarding a_{2} and b_{2}. Thus, on average, we keep (p^{2}+q^{2})\frac{n}{2} partial blocks and discard \bigl[1-(p^{2}+q^{2})\bigr]\frac{n}{2} blocks of size 2.
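A small simulation sketch of this step (our own illustration; the function name reconcile_step and all parameters are arbitrary).

import random

def reconcile_step(a, b):
    """One round of the block-of-two step: keep the first bit of each block
    whose parities agree, discard everything else."""
    a1, b1 = [], []
    for i in range(0, len(a) - 1, 2):
        if (a[i] ^ a[i + 1]) == (b[i] ^ b[i + 1]):   # parities agree
            a1.append(a[i])
            b1.append(b[i])
    return a1, b1

random.seed(1)
n, p = 100_000, 0.7
a = [random.randint(0, 1) for _ in range(n)]
# b agrees with a in each position independently with probability p.
b = [ai if random.random() < p else 1 - ai for ai in a]

a1, b1 = reconcile_step(a, b)
kept = len(a1) / (n / 2)                                  # close to p^2 + q^2 = 0.58
agree = sum(x == y for x, y in zip(a1, b1)) / len(a1)     # close to p^2/(p^2+q^2), about 0.845
print(kept, agree)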

Let us suppose n=100 and p=0.7. From Theorem 2.1 part (d), the information that Y has about X, i.e., that B has about A, is 100[1-H(0.7,0.3)]\approx 100(1-0.8813)\approx 11.87 Shannon bits.

We are seeking to find a sub-array of A,B such that corresponding bits are equal. Our method is to publicly exchange parities. The length of this secret key will be at most 11 bits.

Back to the algorithm. A,B keep on average (50)(p^{2}+q^{2}) blocks of size 2, i.e., (50)(0.58)=29 blocks of size 2. A and B remove the bottom element of each block. We are left with 29 pairs of elements (a_{1},b_{1}). The probability that a_{1}=b_{1} given that a_{1}+a_{2}=b_{1}+b_{2} is \frac{p^{2}}{p^{2}+q^{2}}, i.e., \frac{(0.7)^{2}}{(0.7)^{2}+(0.3)^{2}}=\frac{0.49}{0.58}\approx 0.845. Next, 1-H(0.845,0.155)\approx 1-0.6221\approx 0.3779. To summarize, we started with 100 pairs (a_{i},b_{i}) with Pr(a_{i}=b_{i})=0.7. The information revealed to B by A is (100)(1-H(0.7,0.3))\approx 11.87 Shannon bits of information. After the first step of the algorithm we are left with 29 pairs (a_{j},b_{j}) with Pr(a_{j}=b_{j})\approx 0.845. The amount of information revealed to the remnant of B by the remnant of A is 29(0.3779)\approx 10.96 Shannon bits of information. So we have “wasted” less than 1 bit, i.e., the wastage is about 8%. Mathematically we have 100\bigl(1-H(0.7,0.3)\bigr)=29\bigl(1-H(0.845,0.155)\bigr)+ Wastage. In general we have

n[1-H(p,q)]=\frac{n}{2}(p^{2}+q^{2})\left[1-H\left(\frac{p^{2}}{p^{2}+q^{2}},\frac{q^{2}}{p^{2}+q^{2}}\right)\right]+W,

where W denotes the wastage. Dividing by \frac{n}{2} we get

2[1-H(p,q)]=(p^{2}+q^{2})\left[1-H\left(\frac{p^{2}}{p^{2}+q^{2}},\frac{q^{2}}{p^{2}+q^{2}}\right)\right]+\frac{2W}{n}.

Comparing with formula (4.5) we see that W=\frac{n}{2}\bigl[1-H(p^{2}+q^{2},2pq)\bigr]. In this case W=50\bigl[1-H(0.58,0.42)\bigr]\approx 50(1-0.9815)=(50)(0.0185)=0.925.
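This bookkeeping can be checked numerically; the sketch below is ours, and small differences from the figures quoted above come from the two-decimal rounding of the entropy values in the text.

import math

def H(probs):
    """Shannon entropy (base 2), ignoring zero terms."""
    return -sum(t * math.log2(t) for t in probs if t > 0)

n, p = 100, 0.7
q = 1 - p
s = p*p + q*q                                        # 0.58

total = n * (1 - H([p, q]))                          # about 11.87 bits
after = (n / 2) * s * (1 - H([p*p / s, q*q / s]))    # bits left after the first step
W = (n / 2) * (1 - H([s, 2*p*q]))                    # wastage, about 0.93 bits
print(total, after, W, total - (after + W))          # last value is 0 by identity (4.5)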

To sum up, then, the new identities tell us exactly how much information was wasted and not utilized. They also tell us, in conjunction with the algorithm, the optimum size of the sub-blocks at each stage.

One of the original motivations for work in coding theory was that the Shannon fundamental theorem showed how capacity was the bound for accurate communication but the problem was to construct linear codes or other codes such as turbo codes that came close to the bound.

Here we have an analogous situation. In the example just considered the maximum length of a cryptographic common secret key obtained as a common subset of A,B is bounded by n(1-h(p)). The problem is to find algorithms which produce such a common secret key coming close to this Shannon bound of n\bigl(1-h(p)\bigr).

This work nicely illustrates the inter-connections between codes, cryptography, and information theory. Information theory tells us the bound. The identities tell us the size of the sub-blocks for constructing a common secret key which attains, or gets close to, the information theory bound.

Coding theory is then used to ensure that the two communicating parties have a common secret key, by using the hash function described in the algorithm, based on a code C. Error correction can ensure that the common secret key can be obtained without using another round of the algorithm (thereby shortening the common key) if the difference between the keys of A and B is less than the minimum distance of the dual code of C. This improves on the standard method of checking parities of random subsets of the keys A_{m},B_{m} at the last stage.

7 Concluding Remarks.

  1.

    Please see Chapter 25 of the forthcoming Cryptography, Information Theory, and Error-Correction: A Handbook for the Twenty-First Century, by Bruen, Forcinito, and McQuillan, [3], for background information, additional details, and related material.

  2.

    Standard tables for entropy list values to two decimal places. When p is close to 1, interpolation for three decimal places is difficult as h(p) is very steeply sloped near p=1. Formula (4.3) may help since \frac{p^{2}}{p^{2}+q^{2}} is less than p, and the formula can be re-iterated.

  3.

    In [4] the emphasis is on the situation where the eavesdropper has no initial information. The case where the eavesdropper has initial information is discussed in [6]. In Section 7 of [4] the quoted result uses Renyi entropy rather than Shannon entropy.

  4.

    The methods in this note suggest possible generalizations which we do not pursue here.


Acknowledgement: The author thanks Drs Mario Forcinito, James McQuillan and David Wehlau for their help and encouragement with this work.


References

  • [1] R. B. Ash. Information Theory. Dover, 1990.
  • [2] C. H. Bennett and G. Brassard. Quantum cryptography: Public key distribution and coin tossing. In International Conference on Computers, Systems & Signal Processing, December 1984.
  • [3] A. A. Bruen, M. A. Forcinito, and J. M. McQuillan. Cryptography, Information Theory, and Error-Correction: A Handbook for the Twenty-First Century. Wiley, second edition, 2021 (to appear).
  • [4] A. A. Bruen, D. L. Wehlau, and M. Forcinito. Error correcting codes, block designs, perfect secrecy and finite fields. Acta Applicandae Mathematica, 93, September 2006.
  • [5] G. A. Jones and J. M. Jones. Information and Coding Theory. Springer, 2000.
  • [6] U. Maurer and S. Wolf. Privacy amplification secure against active adversaries. In Advances in Cryptology, Proceedings of Crypto’97, pages 307–321. Springer-Verlag, August 1997.
  • [7] R. J. McEliece. The Theory of Information and Coding. Addison-Wesley, 1978.
  • [8] A. Renyi. A Diary on Information Theory. Wiley, 1987.
  • [9] D. Welsh. Codes and Cryptography. Oxford, 1988.