
On Multi-Channel Huffman Codes for Asymmetric-Alphabet Channels

Hoover H. F. Yin, Xishi Wang, Ka Hei Ng, Russell W. F. Lai, Lucien K. L. Ng, and Jack P. K. Ma H. Yin is with the Institute of Network Coding, The Chinese University of Hong Kong. X. Wang, L. Ng and J. Ma are with the Department of Information Engineering, The Chinese University of Hong Kong. K. Ng is with the Department of Physics, The Chinese University of Hong Kong. R. Lai is with the Chair of Applied Cryptography, Friedrich-Alexander-Universität Erlangen-Nürnberg. He is supported by the State of Bavaria at the Nuremberg Campus of Technology (NCT).
Abstract

Zero-error single-channel source coding has been studied extensively over the past decades. Its natural multi-channel generalization, however, is not well investigated. While the special case with multiple symmetric-alphabet channels was studied a decade ago, codes in such a setting have no advantage over single-channel codes in data compression, making them of little use in most applications. In this paper, we break this stalemate by showing that it is possible to beat single-channel source codes in terms of compression when the channels have asymmetric alphabets. We present the multi-channel analogs of several classical results in single-channel source coding, such as that a multi-channel Huffman code is an optimal tree-decodable code. We also give evidence that finding an efficient construction of multi-channel Huffman codes may be hard. Nevertheless, we propose a suboptimal code construction whose redundancy is guaranteed to be no larger than that of an optimal single-channel source code.

I Introduction

Zero-error source coding is one of the oldest branches of information theory. In a traditional (single-channel) source coding problem, a transmitter wishes to send an information source to a receiver with zero error through an error-free channel. To better utilize the channel, it is desirable to encode the information source in a way that minimizes the number of symbols transmitted through the channel while ensuring correct decoding. In technical terms, the goal is to minimize the expected codeword length of a source code for an information source. Given an arbitrary information source, the well-known Huffman procedure [1] can produce an optimal code known as a Huffman code. (In this paper, optimality refers to symbol-by-symbol coding with known probability masses of the information source.) On the other hand, the entropy bound shows that the information-theoretic lower bound on the expected codeword length of a source code is the entropy of the information source. Despite its optimality, the Huffman code does not achieve the entropy bound in general.

As a natural generalization, zero-error multi-channel source coding was proposed in [2], where each source symbol is mapped to a codeword which spreads across multiple channels. This problem is more complicated than its single-channel counterpart, as the sequences of received symbols in the individual channels are not necessarily uniquely decodable, even though the overall source code is uniquely decodable. In [2], only the case where all channels use the same alphabet size, i.e., symmetric-alphabet channels, was investigated. Among many other results, [2] showed that a multi-channel code (for symmetric-alphabet channels) can be equated to a single-channel code, so that the existing tools of source coding can be applied to analyze the multi-channel code. However, this also implies that such a multi-channel code cannot improve the compression ability. It is thus natural to ask: What if at least one channel has an alphabet size different from that of another channel, i.e., we have asymmetric-alphabet channels? Are there any new opportunities and challenges under this extended setting?

I-A Motivating Examples

Figure 1: A 2-channel zero-error communication system where the two channels use a binary and a ternary alphabet respectively.

Our starting point is the observation that, in some cases, a multi-channel code with asymmetric alphabets achieves a better compression ability than an optimal single-channel code. Specifically, we consider the network in Fig. 1 and compare the expected codeword (description) length of an optimal (2,3)-ary tree-decodable code, the binary Huffman code, and the ternary Huffman code for the information sources with probability masses \{1/6, 1/6, 1/3, 1/3\} and \{1/6, 1/6, 1/6, 1/2\} respectively in the following examples. The description length of a codeword is the number of information units (e.g., nats) required to represent this codeword. A multi-channel prefix-free code might not have a decoding tree; we defer the detailed discussion regarding this issue to Section II-D. For convenience, we use the information unit nat (base e, Euler's number) throughout this paper.

Figure 2: Examples of optimal (2,3)-ary codes achieving the entropy bound. (a) Decoding tree for the probability masses \{1/6, 1/6, 1/3, 1/3\}. (b) Decoding tree for the probability masses \{1/6, 1/6, 1/6, 1/2\}.
Example 1.

In Fig. 2(a), the codeword lengths for the 1/6 and 1/3 masses are (\ln 2+\ln 3) nats and \ln 3 nats respectively. The expected codeword length of the (2,3)-ary code is (\frac{1}{3}\ln 2+\ln 3) nats, which is the entropy of the information source. The expected codeword lengths of the corresponding binary and ternary Huffman codes are 2\ln 2 nats and \frac{4}{3}\ln 3 nats respectively.

Example 2.

In Fig. 2(b), the codeword lengths for the 1/6 and 1/2 masses are (\ln 2+\ln 3) nats and \ln 2 nats respectively. The expected codeword length is (\ln 2+\frac{1}{2}\ln 3) nats, which is the entropy of the information source. The expected codeword lengths of the corresponding binary and ternary Huffman codes are \frac{11}{6}\ln 2 nats and \frac{4}{3}\ln 3 nats respectively.

TABLE I: Expected codeword lengths of optimal codes in nats
             (2,3)-ary        2-ary            3-ary
Fig. 2(a)    1.32966134885    1.38629436112    1.46481638489
Fig. 2(b)    1.24245332489    1.27076983103    1.46481638489

We summarize the above two examples and give the numerical values of the expected codeword lengths in Table I. We can see that the expected codeword length when both channels are in use is shorter than in the other two single-channel cases, i.e., the optimal multi-channel source code outperforms the optimal single-channel source codes.
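The numbers in Table I are easy to reproduce. The following Python sketch (the script and its variable names are ours, not part of the original development) recomputes the entropies and the expected codeword lengths of Examples 1 and 2 directly from the codeword lengths stated above.

import math

ln2, ln3 = math.log(2), math.log(3)

def entropy(masses):
    # Entropy of a probability distribution, in nats.
    return -sum(p * math.log(p) for p in masses if p > 0)

# Example 1: masses {1/6, 1/6, 1/3, 1/3}. In the tree of Fig. 2(a) the 1/6
# masses get codewords of length ln 2 + ln 3 and the 1/3 masses length ln 3.
L1 = 2 * (1/6) * (ln2 + ln3) + 2 * (1/3) * ln3
print(entropy([1/6, 1/6, 1/3, 1/3]), L1)      # both ~1.3297 = (1/3) ln 2 + ln 3
print(2 * ln2, (4/3) * ln3)                   # ~1.3863 and ~1.4648 (Huffman codes)

# Example 2: masses {1/6, 1/6, 1/6, 1/2}. In the tree of Fig. 2(b) the 1/6
# masses get length ln 2 + ln 3 and the 1/2 mass length ln 2.
L2 = 3 * (1/6) * (ln2 + ln3) + (1/2) * ln2
print(entropy([1/6, 1/6, 1/6, 1/2]), L2)      # both ~1.2425 = ln 2 + (1/2) ln 3
print((11/6) * ln2, (4/3) * ln3)              # ~1.2708 and ~1.4648 (Huffman codes)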

I-B Our Contributions

Given the above examples, it is natural to ask if a multi-channel generalization of the Huffman procedure systematically produces an optimal code. We give a partially affirmative answer to this question. In particular, we formulate the generalized Huffman procedure and show that it indeed produces an optimal tree-decodable code. The generalized Huffman procedure follows the same idea of iteratively merging the smallest probability masses into bigger ones, until reaching probability 1. However, unlike the single-channel case, there is more than one choice of the number of masses to be merged in each iteration. This results in an exponential number of potential merge sequences, and only choosing a “correct” merging step in every iteration results in an optimal code. It is therefore unclear how the generalized Huffman procedure can be computed in polynomial time.

Towards designing an algorithm which computes or approximates the generalized Huffman procedure in polynomial time, we devise and investigate several heuristics for pruning the merging steps. We first investigate pruning by some natural measures such as (our multi-channel generalization of) local redundancy, expected codeword length, and entropy. Unfortunately none of these simplistic strategies can guarantee the production of an optimal code.

One of our results is the following pruning strategy based on iterative executions of the (single-channel) Huffman procedure: For a multiset of probability masses either given initially or produced by the previous iteration, we run the Huffman procedure on these masses with respect to the alphabet size of each channel. Let q be the alphabet size of the resulting Huffman code with the smallest expected length. We merge the q smallest probability masses into one and proceed to the next round.

Since all single-channel Huffman codes are included as potential choices of the above minimizing procedure, the resulting code guarantees a compression ability no worse than any of the optimal single-channel codes. While we are unable to show that this heuristic produces an optimal tree-decodable code, we have yet to find a counterexample either. We leave it as an open problem to prove the (sub)optimality of the resulting code. We remark that even if the heuristic produces an optimal tree-decodable code, it is unclear if the code is also an optimal prefix code except for the 2-channel case. It is also unclear if an optimal prefix code is an optimal uniquely decodable code.

I-C Paper Organization

We first introduce the background of multi-channel source coding in Section II. Then, we present generalizations of several well-known results for single-channel source coding in Section III, and the multi-channel Huffman procedure for asymmetric-alphabet channels in Section IV. We show some evidence that an efficient construction of multi-channel Huffman codes is not trivial and propose a suboptimal code construction in Section V. Lastly, we conclude the paper and propose some future directions in Section VI.

II Background

Unless otherwise specified, we use the notation defined in this section throughout this paper. For n\in\mathbb{Z}^{+}, define [n]=\{1,2,\ldots,n\}.

II-A Multi-Channel Source Coding

The definition of a multi-channel source code was proposed in [2] for symmetric-alphabet channels and extended in [3] for asymmetric-alphabet channels. Consider an n-channel system where the ith channel uses an alphabet \mathcal{Z}_{i} of size q_{i}, i\in[n]. Without loss of generality, we assume q_{1}\leq q_{2}\leq\ldots\leq q_{n} in this paper. Let \mathcal{Z} be the alphabet of the information source Z. Denote by \epsilon the empty string. Define \mathcal{Z}_{i}^{0}=\{\epsilon\} and \mathcal{Z}_{i}^{j}=\{wv\colon w\in\mathcal{Z}_{i}^{j-1},v\in\mathcal{Z}_{i}\} for j\in\mathbb{Z}^{+}. That is, \mathcal{Z}_{i}^{j} contains all the strings of length j over the alphabet \mathcal{Z}_{i}. The set of all the strings of finite length over the alphabet \mathcal{Z}_{i} is denoted by \mathcal{Z}_{i}^{\ast}=\bigcup_{j=0}^{\infty}\mathcal{Z}_{i}^{j}. Every element in \prod_{i=1}^{n}\mathcal{Z}_{i}^{\ast} is called a word.

Definition 1 (Multi-Channel Source Codes).

A (q_{1},\ldots,q_{n})-ary source code \mathcal{Q} for the information source Z is a mapping from \mathcal{Z} to \prod_{i=1}^{n}\mathcal{Z}_{i}^{\ast}.

For any source symbol z\in\mathcal{Z}, the ordered n-tuple \mathcal{Q}(z) is the codeword for z. The ith component of the codeword is transmitted through the ith channel. When we transmit more than one codeword, the codewords are concatenated channel-wise, i.e., the boundaries of the codewords are not explicit.

Definition 2 (Uniquely Decodable Codes).

If the finite sequences of codewords of any two distinct finite sequences of source symbols are different, then the source code is called a uniquely decodable code.

Definition 3 (Multi-Channel Prefix-Free Codes).

Two words are prefix-free (of each other) if there exists at least one channel i such that the ith components of the two words are prefix-free of each other. A (q_{1},\ldots,q_{n})-ary prefix-free code is a (q_{1},\ldots,q_{n})-ary source code such that every pair of codewords is prefix-free.

By convention, a prefix-free code is also called a prefix code. Any codeword of a uniquely decodable multi-channel source code can be decoded without referring to the symbols of any future codewords if and only if the code is a multi-channel prefix code [2, 3].

II-B Single-Channel Representation

As a channel does not interfere with the symbols in another channel, we can express a multi-channel word as a single-channel word with a dynamic alphabet. For example, let \mathcal{Z}_{1}=\{0,1\} and \mathcal{Z}_{2}=\{a,b,c\}. The 2-channel word (01,bca) can be expressed as either 01bca, 0b1ca, 0bc1a, 0bca1, b01ca, b0c1a, b0ca1, bc01a, bc0a1, or bca01. Note that \mathcal{Z}_{1} and \mathcal{Z}_{2} have different letter costs.

However, this single-channel representation is different from a single-channel word with unequal letter costs, a concept proposed in [4]. For example, in a single-channel code with unequal letter costs over the alphabet \{0,1,a,b,c\}, the words 0 and a are prefix-free. However, the 2-channel words (0,\epsilon) and (\epsilon,a) are not prefix-free, although their single-channel representations are 0 and a respectively.

When the alphabet sizes of the channels are the same, we can see that the single-channel representation of a multi-channel code can be reduced to a single-channel code with equal letter cost. That is why the compression ability cannot be enhanced in the setting proposed in [2].

II-C Kraft Inequality and Entropy Bound

Let n and m be the number of channels and the number of codewords respectively. Denote by H(\cdot) the entropy measured in nats. All results can be trivially generalized to other bases of logarithm.

The length of the jth codeword in the ith channel is denoted by \ell_{i}^{j}. The length tuple of the jth codeword is (\ell_{1}^{j},\ldots,\ell_{n}^{j}). The (description) length of the codeword, defined by \ell^{j}:=\sum_{i}\ell_{i}^{j}\ln q_{i}, is the number of nats required to represent it. Define \ell^{\text{max}}_{i}:=\max_{j=1}^{m}\ell_{i}^{j}.

The multi-channel analogs of the Kraft inequality and the entropy bound were proposed in [2] for symmetric-alphabet channels. They were extended to asymmetric-alphabet channels in the conference paper [3], but the proofs only appeared in the preprint [5]. For the sake of completeness, we reproduce the proofs in this paper.

Theorem 1 (Kraft Inequality [3]).

If \mathcal{Q} is uniquely decodable, then the length tuples of its codewords satisfy

\sum_{j=1}^{m}\prod_{i=1}^{n}q_{i}^{-\ell_{i}^{j}}\leq 1. (1)
Proof:

We use a similar technique as in [6]. Let N be an arbitrary positive integer. Consider

\left(\sum_{j=1}^{m}\prod_{i=1}^{n}q_{i}^{-\ell_{i}^{j}}\right)^{N}=\sum_{j_{1}=1}^{m}\sum_{j_{2}=1}^{m}\ldots\sum_{j_{N}=1}^{m}\left(\prod_{i=1}^{n}q_{i}^{-\sum_{k=1}^{N}\ell_{i}^{j_{k}}}\right)=\sum_{k_{1}=1}^{N\ell^{\text{max}}_{1}}\sum_{k_{2}=1}^{N\ell^{\text{max}}_{2}}\ldots\sum_{k_{n}=1}^{N\ell^{\text{max}}_{n}}A_{k_{1},k_{2},\ldots,k_{n}}\prod_{i=1}^{n}q_{i}^{-k_{i}},

where A_{k_{1},k_{2},\ldots,k_{n}} is the coefficient of \prod_{i=1}^{n}q_{i}^{-k_{i}} in (\sum_{j=1}^{m}\prod_{i=1}^{n}q_{i}^{-\ell_{i}^{j}})^{N}.

Note that A_{k_{1},k_{2},\ldots,k_{n}} gives the total number of sequences of N codewords with a total length of k_{i} symbols in the ith channel for all i=1,2,\ldots,n. Since the code is uniquely decodable, these code sequences must be distinct. So, the number A_{k_{1},k_{2},\ldots,k_{n}} must be no more than the total number of distinct sequences with k_{i} symbols in the ith channel for all i=1,2,\ldots,n. That is, we have A_{k_{1},k_{2},\ldots,k_{n}}\leq\prod_{i=1}^{n}q_{i}^{k_{i}}. By substituting this inequality into the expansion above, we get

\sum_{j=1}^{m}\prod_{i=1}^{n}q_{i}^{-\ell_{i}^{j}}\leq\left(\sum_{k_{1}=1}^{N\ell^{\text{max}}_{1}}\sum_{k_{2}=1}^{N\ell^{\text{max}}_{2}}\ldots\sum_{k_{n}=1}^{N\ell^{\text{max}}_{n}}1\right)^{1/N}=\left(\prod_{i=1}^{n}N\ell^{\text{max}}_{i}\right)^{1/N}.

Since this inequality holds for any N, we let N\to\infty so that the right-hand side tends to 1, and obtain our desired Kraft inequality. ∎

Theorem 2 (Entropy Bound [3]).

Let \mathcal{Q} be a uniquely decodable code for a source random variable Z with probability masses \{p_{1},p_{2},\ldots,p_{m}\} and entropy H(Z). Then,

\sum_{j=1}^{m}p_{j}\sum_{i=1}^{n}\ell_{i}^{j}\ln q_{i}\geq H(Z), (2)

where the equality holds if and only if \sum_{i=1}^{n}\ell_{i}^{j}\ln q_{i}=-\ln p_{j} for all j.

Proof:

Here we use a standard technique. Consider

\sum_{j=1}^{m}p_{j}\sum_{i=1}^{n}\ell_{i}^{j}\ln q_{i}-H(Z)=\sum_{j=1}^{m}p_{j}\ln\left(p_{j}\prod_{i=1}^{n}q_{i}^{\ell_{i}^{j}}\right)\geq\sum_{j=1}^{m}p_{j}\left(1-\left(p_{j}\prod_{i=1}^{n}q_{i}^{\ell_{i}^{j}}\right)^{-1}\right)=1-\sum_{j=1}^{m}\prod_{i=1}^{n}q_{i}^{-\ell_{i}^{j}}\geq 0,

where the first inequality follows from \ln a\geq 1-1/a for any a>0, and the second inequality follows from the multi-channel Kraft inequality.

The first inequality holds with equality if and only if p_{j}\prod_{i=1}^{n}q_{i}^{\ell_{i}^{j}}=1 for all j, or equivalently, \sum_{i=1}^{n}\ell_{i}^{j}\ln q_{i}=-\ln p_{j} for all j. Under this condition, we have \sum_{j=1}^{m}\prod_{i=1}^{n}q_{i}^{-\ell_{i}^{j}}=\sum_{j=1}^{m}p_{j}=1, so the second inequality also holds with equality, which means that the bound (2) is tight. ∎

The single-channel Kraft inequality gives a necessary and sufficient condition for the existence of a single-channel uniquely decodable code. However in the multi-channel analog, the sufficiency does not hold in general. The reason can be easily seen from the rectangle packing formulation for prefix codes in [3]: The Kraft inequality only states that the sum of areas of blocks (codewords) is no larger than the area of the container (all possible words). The geometry of the blocks (the packability of the blocks in the container) is not captured. On the other hand, the interpretation of the entropy bound is consistent with the traditional version that the expected codeword (description) length is no less than the entropy.
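As a quick numerical illustration, the following sketch (the helper function and its name are ours) evaluates the left-hand side of (1) for the (2,3)-ary code of Fig. 2(a), whose length tuples are read off from Example 1, and for the binary 3-channel prefix code of Example 3 in Section II-D below.

import math

def kraft_sum(alphabet_sizes, length_tuples):
    # Left-hand side of the multi-channel Kraft inequality (1).
    return sum(
        math.prod(q ** (-l) for q, l in zip(alphabet_sizes, lengths))
        for lengths in length_tuples
    )

# (2,3)-ary code of Fig. 2(a): the two 1/6 masses use one binary and one
# ternary symbol each, and the two 1/3 masses use one ternary symbol only.
print(kraft_sum((2, 3), [(1, 1), (1, 1), (0, 1), (0, 1)]))        # ~1.0 (equality holds)

# Binary 3-channel prefix code of Example 3 in Section II-D.
print(kraft_sum((2, 2, 2), [(1, 1, 0), (1, 0, 1), (0, 1, 1)]))    # 0.75 <= 1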

II-D Multi-Channel Tree-Decodable Codes

A single-channel prefix code can be represented by a decoding tree. Each branch of an internal node (including the root node) corresponds to a symbol in the alphabet. When we decode a codeword, we traverse the tree from the root node. We read the symbols in the codeword one by one. For each symbol, we traverse through the corresponding branch. We can decode a codeword once we reach a leaf of the tree.

When it comes to the multi-channel case, each internal node is assigned to a channel. An internal node belongs to class i if it is assigned to the ith channel, and is only allowed to have at most q_{i} children. The generalized depth of a leaf is defined as the length of the codeword that the leaf corresponds to. See Figs. 2(a) and 2(b) for examples of multi-channel decoding trees. A codeword can be decoded by traversing the tree. If the current node being traversed is of class i, we read a symbol from the ith channel of the codeword and traverse through the corresponding branch. Clearly, decoding is sequential, i.e., it proceeds without referring to the symbols of any future codewords. A code which has a decoding tree is called a tree-decodable code. The decoding tree of an optimal tree-decodable code is called an optimal decoding tree. Note that a tree-decodable code can have more than one decoding tree.

A tree-decodable code is a prefix code [2]. This is because for any two distinct leaves in a decoding tree, the corresponding codewords are not prefixes of each other on the channel to which their lowest common ancestor is assigned. The converse is not true in general for n > 2 channels: if for every channel there is some codeword of a prefix code whose component in that channel is an empty string, then we cannot draw a decoding tree for the code. Below is an example given in [2].

Example 3.

Consider a binary 3-channel source code, i.e., q_{1}=q_{2}=q_{3}=2. The codewords \{(0,0,\epsilon),(1,\epsilon,0),(\epsilon,1,1)\} form a prefix code which is not tree-decodable.

The 2-channel setting is special in the sense that there cannot exist a codeword having an empty string in the first channel and another codeword having an empty string in the second channel, since these two codewords would not be prefix-free. With a more careful argument, we have the following theorem.

Theorem 3.

Any 2-channel prefix code is tree-decodable.

Proof:

We first argue that, for any 2-channel prefix code with at least two codewords, there exists at least one channel i\in\{1,2\} such that for all codewords the ith component is not an empty string. We call such a channel i always non-empty. Suppose not; then there exist codewords (c_{1},\epsilon) and (\epsilon,c_{2}) for some strings c_{1},c_{2}. Since \epsilon is a prefix of both c_{1} and c_{2}, these two codewords are not prefix-free.

Next, given a 2-channel prefix code, we construct a decoding tree as follows. If the code has only one codeword, then we construct a tree with a single node associated to this codeword. Otherwise, suppose there are at least 2 codewords. By the argument above, there exists an always non-empty channel i. Let the root node be of class i. We partition the codewords according to the first symbol of the codewords in the ith channel into the chunks P_{1},\ldots,P_{q_{i}}. For each chunk P_{j}, we remove the first symbol of the codewords in the ith channel. This results in a sub-code \tilde{P}_{j}.

We argue that \tilde{P}_{j} is prefix-free for all j. Suppose not; let (c_{1},c_{2}) and (c_{1}^{\prime},c_{2}^{\prime}) be codewords in \tilde{P}_{j} which are not prefix-free. Without loss of generality, suppose channel 1 was used for partitioning. Then before the truncation the codewords were (a\|c_{1},c_{2}) and (a\|c_{1}^{\prime},c_{2}^{\prime}) for some symbol a. Clearly, since (c_{1},c_{2}) and (c_{1}^{\prime},c_{2}^{\prime}) are not prefix-free, (a\|c_{1},c_{2}) and (a\|c_{1}^{\prime},c_{2}^{\prime}) are also not prefix-free. This contradicts the assumption that we start with a prefix code.

Since each \tilde{P}_{j} is prefix-free, we can run the above procedure on \tilde{P}_{j} recursively and link the resulting root node to the root node constructed above with the jth edge. This results in a decoding tree. ∎
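The construction in the proof is directly implementable. Below is a minimal Python sketch of the recursion, assuming the input is a valid 2-channel prefix code given as a list of string pairs; the tree representation (nested tuples and dictionaries) is our own choice and not part of the original formulation.

def build_tree(codewords):
    # Recursive construction from the proof of Theorem 3. A codeword is a
    # pair of strings; the code is assumed to be a valid 2-channel prefix code.
    # Returns ('leaf', codeword) or ('node', i, children), where i is the
    # channel assigned to the internal node.
    if len(codewords) == 1:
        return ('leaf', codewords[0])
    # An always non-empty channel exists for any prefix code with >= 2 codewords.
    i = 0 if all(c[0] != '' for c in codewords) else 1
    chunks = {}
    for c in codewords:
        head, tail = c[i][0], c[i][1:]          # split off the first symbol
        reduced = (tail, c[1]) if i == 0 else (c[0], tail)
        chunks.setdefault(head, []).append(reduced)
    return ('node', i, {a: build_tree(sub) for a, sub in chunks.items()})

# The (2,3)-ary code of Fig. 2(b), with the ternary alphabet written as {a, b, c}.
print(build_tree([('0', ''), ('1', 'a'), ('1', 'b'), ('1', 'c')]))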

III General Results on Multi-Channel Source Codes

Towards a better general understanding of multi-channel source codes, we state a collection of elementary results regarding multi-channel source and tree-decodable codes which were not explicitly stated before. These results could help in defining, analyzing, and approximating the multi-channel Huffman procedure in the remaining sections. They are not necessary for understanding the multi-channel Huffman procedure, and can be skipped if the reader desires.

III-A Multi-Channel is No Worse than Single-Channel

Definition 4.

The trivial n-channel extension of a single-channel source code expands each codeword into an n-tuple where all the new components are empty strings.

Example 4.

If the channel used by a single-channel source code is the 2nd channel of a 3-channel system, then the trivial 3-channel extension of the code substitutes every codeword c by (\epsilon,c,\epsilon).

We first state the following trivial but important result by viewing q_{i}-ary source codes as special cases of (q_{1},\ldots,q_{n})-ary source codes, which suggests the worthiness of studying multi-channel source codes.

Theorem 4.

The expected codeword length of an optimal (q_{1},\ldots,q_{n})-ary source code is no worse than that of an optimal q_{i}-ary source code for all i\in[n].

Proof:

The trivial multi-channel extension of an optimal single-channel source code is a multi-channel source code. Note that the length of every codeword is conserved, so the expected codeword length of an optimal multi-channel source code is no worse than that of the above trivial multi-channel extension. ∎

III-B Source Coding Theorem for Symbol Codes

Let L_{\text{Huff}} be the expected codeword length of an optimal q-ary single-channel uniquely decodable code. One of the classical results by Shannon [7], known as the source coding theorem for symbol codes, is that H(Z)\leq L_{\text{Huff}}<H(Z)+\ln q. That is, an optimal code needs fewer than \ln q nats beyond the entropy. The following theorem, which is a natural multi-channel analog, extends this idea: an optimal code needs fewer than \ln q_{1} nats (recall that q_{1}=\min_{i=1}^{n}q_{i}) beyond the entropy.

Theorem 5 (Source Coding Theorem for Symbol Codes).

Let L_{\text{opt}} be the expected codeword length of an optimal (q_{1},\ldots,q_{n})-ary uniquely decodable code. Then, H(Z)\leq L_{\text{opt}}<H(Z)+\ln q_{1}.

Proof:

The first inequality is the entropy bound. For the second inequality, we construct an optimal single-channel q_{i}-ary uniquely decodable code for the ith channel. Let L_{\text{Huff}_{i}} be its expected codeword length in nats. We have L_{\text{opt}}\leq L_{\text{Huff}_{i}}<H(Z)+\ln q_{i}, where the first inequality is by Theorem 4. As L_{\text{opt}}\leq L_{\text{Huff}_{i}} holds for all i, we have L_{\text{opt}}<H(Z)+\ln q_{i} for all i, i.e., L_{\text{opt}}<H(Z)+\ln q_{1}. ∎

It is also not hard to show that the above upper bound is tight, which is stated in the following theorem.

Theorem 6.

H(Z)+\ln q_{1} is the tightest upper bound on L_{\text{opt}} which depends only on H(Z).

Proof:

Consider a multiset of probability masses P=\{1-(q_{1}-1)/k\}\uplus(\biguplus_{i=1}^{q_{1}-1}\{1/k\}) where k\geq q_{1}. Note that |P|=q_{1} and \sum_{x\in P}x=1. The optimal n-channel code is the trivial n-channel extension of the optimal q_{1}-ary single-channel code. The codeword length is 1 q_{1}-it, i.e., \ln q_{1} nats. By taking k\to\infty, we have H(Z)\to 0 so that L_{\text{opt}}\to H(Z)+\ln q_{1}. ∎

III-C Local Redundancy of Tree-Decodable Codes

The concept of local redundancy was introduced in [8] as a tool for understanding the redundancy of a tree-decodable code. An internal node is a non-leaf node in a decoding tree. In the decoding tree of a single-channel q-ary source code, the local redundancy of an internal node k is defined by r_{k}=s_{k}(\ln q-h_{k}), where s_{k} and h_{k} are the reaching probability of node k and the entropy of its conditional branching probabilities respectively. An interpretation is that we use \ln q nats to represent the branches of node k due to the restriction of the alphabet size, i.e., there are at most q branches in a q-ary source code. However, minimally it can be done with h_{k} nats. So, \ln q-h_{k} is the number of nats wasted at node k.

In the decoding tree of a multi-channel tree-decodable code, an internal node corresponds to a channel. Let \alpha_{k} be the alphabet size of the channel to which node k corresponds. The number of branches of node k is \alpha_{k}. We use \ln\alpha_{k} nats to represent the \alpha_{k} branches at node k, but minimally it can be done with h_{k} nats. Therefore, we have the following straightforward extension of local redundancy.

Definition 5.

The local redundancy r_{k} of an internal node k is r_{k}:=s_{k}(\ln\alpha_{k}-h_{k}) nats.

Denote by \mathcal{I} the index set of the internal nodes. Let L be the expected codeword length of a multi-channel tree-decodable code measured in nats. We can show that H(Z)=\sum_{k\in\mathcal{I}}s_{k}h_{k} by a conditional entropy argument. On the other hand, we can show that L=\sum_{k\in\mathcal{I}}s_{k}\ln\alpha_{k} by a weighted bookkeeping argument. Then, by considering the difference L-H(Z), we arrive at a multi-channel generalization of the local redundancy theorem.

Lemma 1.

H(Z)=\sum_{k\in\mathcal{I}}s_{k}h_{k}.

Proof:

Let d be the height of the decoding tree. An outcome of Z can be represented by a path from the root node to a leaf node of the decoding tree. Let S_{i} be the random variable for the node having depth i which leads to the outcome of Z. If the outcome of Z is reached before depth i, then S_{i} is deterministic, i.e., it has zero conditional entropy. Then, Z=(S_{0},S_{1},\ldots,S_{d}) and S_{0}\to S_{1}\to\cdots\to S_{d} forms a Markov chain. As S_{0} must be the root node, we have H(S_{0})=0. Let \mathcal{I}_{i} be the index set of the internal nodes having depth i. We have H(S_{i}|S_{i-1})=\sum_{k\in\mathcal{I}_{i-1}}\Pr(S_{i-1}=k)H(S_{i}|S_{i-1}=k)=\sum_{k\in\mathcal{I}_{i-1}}s_{k}h_{k}. Hence, H(Z)=\sum_{i=1}^{d}H(S_{i}|S_{i-1})=\sum_{k\in\mathcal{I}}s_{k}h_{k}. ∎

Lemma 2.

L=\sum_{k\in\mathcal{I}}s_{k}\ln\alpha_{k}.

Proof:

Each internal node k uses \ln\alpha_{k} nats to represent its \alpha_{k} branches, which means that the node adds \ln\alpha_{k} nats to the codeword length of each leaf in its subtree. That is, the node increases the expected codeword length by s_{k}\ln\alpha_{k} nats. The proof is done by summing up all the increments made by the internal nodes. ∎

Theorem 7 (Local Redundancy Theorem).

The redundancy of a multi-channel tree-decodable code is L-H(Z)=\sum_{k\in\mathcal{I}}r_{k} nats.

Proof:

By Lemmas 1 and 2, we have L-H(Z)=\sum_{k\in\mathcal{I}}s_{k}(\ln\alpha_{k}-h_{k})=\sum_{k\in\mathcal{I}}r_{k}. ∎
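To illustrate Definition 5 and Theorem 7, the following sketch (the helper function is ours) sums the local redundancies of the decoding tree of Fig. 2(a) and, for comparison, of a binary Huffman tree for the same source from Example 1; the particular Huffman tree assumed here is the standard one obtained by merging the two 1/6 masses first.

import math

def local_redundancy(alphabet_size, child_probs):
    # r_k = s_k (ln(alpha_k) - h_k): s_k is the reaching probability of the node
    # (the sum of its children's unconditional probabilities) and h_k is the
    # entropy of the conditional branching probabilities.
    s = sum(child_probs)
    h = -sum((p / s) * math.log(p / s) for p in child_probs if p > 0)
    return s * (math.log(alphabet_size) - h)

# Decoding tree of Fig. 2(a): a ternary root whose children carry probabilities
# {1/6 + 1/6, 1/3, 1/3} and a binary node whose children carry {1/6, 1/6}.
print(local_redundancy(3, [1/3, 1/3, 1/3]) + local_redundancy(2, [1/6, 1/6]))
# 0.0: the code achieves the entropy bound.

# A binary Huffman tree for the same source {1/6, 1/6, 1/3, 1/3}.
print(local_redundancy(2, [2/3, 1/3]) + local_redundancy(2, [1/3, 1/3])
      + local_redundancy(2, [1/6, 1/6]))
# ~0.0566 nats = 2 ln 2 - H(Z), as predicted by Theorem 7.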

IV Multi-Channel Huffman Procedure

Let \{p_{1},p_{2},\ldots,p_{m}\} be the probability distribution of the source random variable Z. Without loss of generality, assume that p_{1}\leq p_{2}\leq\ldots\leq p_{m}. It is well known that the Huffman procedure [1] allows us to efficiently construct an optimal q-ary single-channel source code, called a q-ary Huffman code, for the information source Z [9]. Every iteration of the procedure selects the smallest q probability masses and merges them into one mass, i.e., the smallest q probability masses are removed from the multiset and then their sum is added back to the multiset. The codeword lengths for the selected masses are increased by \ln q nats. Every merge reduces the size of the multiset by q-1. In order to merge all the probability masses into one single mass, we need to inject 0\leq w<q-1 dummy symbols into the information source, i.e., inject w dummy masses with zero probability into the multiset, before we start the procedure. The value of w is the smallest non-negative integer such that m+w\equiv 1\pmod{q-1}, i.e.,

w=[q-1-((m-1)\bmod(q-1))]\bmod(q-1).

We call the leaves in the decoding tree which are assigned to dummy symbols the dummy leaves.
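The formula above for w can be evaluated directly; the small helper below (the function name is ours) reproduces the dummy counts of the examples in this paper.

def num_dummy_symbols(m, q):
    # Number w of zero-probability dummy masses needed so that m + w masses
    # can be reduced to a single mass by repeated q-ary merges.
    return (q - 1 - ((m - 1) % (q - 1))) % (q - 1)

print(num_dummy_symbols(4, 3))   # 1, as in the ternary Huffman codes of Examples 1 and 2
print(num_dummy_symbols(5, 3))   # 0: five masses need no ternary dummy mass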

The core idea of the Huffman procedure is that the smaller the probability mass the longer the codeword length, so we can assign codewords of shorter lengths to those source symbols which are more likely to appear. This idea is also valid in the multi-channel case. A traditional proof, e.g., [10, Lem. 4.15], can be used to prove the following lemma.

Lemma 3.

In an optimal multi-channel uniquely decodable code, codewords with shorter lengths are assigned to larger probabilities.

Proof:

Suppose in an optimal code we have codeword lengths \ell^{a}>\ell^{b} for the probabilities p_{a}>p_{b} respectively. By exchanging the codewords assigned to these two probabilities, the expected codeword length of the code is changed by (p_{a}\ell^{b}+p_{b}\ell^{a})-(p_{a}\ell^{a}+p_{b}\ell^{b})<0, which contradicts the optimality of the code. ∎

When we adapt the Huffman procedure to multi-channel source codes, we have to decide, in each iteration, how many probability masses to merge, under the constraint that the number of selected masses is q_{i} for some i\in[n]. Note that the procedure can only produce tree-decodable codes.

Definition 6.

A merge sequence is a finite sequence where the ith term is the number of probability masses (including dummy masses) to be merged by the Huffman procedure in the ith iteration.

When we want to merge a certain number of masses in an iteration but more than one channel can be used, i.e., several channels have the same alphabet size, we can choose any of them arbitrarily, as the expected length is not affected and the code is still a prefix code.

The merge sequences for Figs. 2(a) and 2(b) are (2,3) and (3,2) respectively. We can see that it is not always better to merge a smaller, or a larger, number of masses in an iteration. A wrong decision may produce a code which is not optimal.

If a merge sequence (m_{1},m_{2},\ldots) is given, then the number of dummy symbols w can be implicitly derived by w=\sum_{i}(m_{i}-1)+1-m. When the merge sequence is undetermined, however, it is unclear exactly how many dummy symbols are needed. Nevertheless, we are able to derive an upper bound on the number of required dummy symbols. For some specific values of (q_{1},\ldots,q_{n}), e.g., (q_{1},q_{2})=(2,3), the upper bound forces w to be 0.

We now state two intermediate lemmas regarding the dummy leaves in an optimal decoding tree, followed by the theorem about the upper bound on the number of required dummy symbols.

Lemma 4.

There exists an optimal decoding tree where the siblings of every dummy leaf are also leaves.

Proof:

Consider an optimal decoding tree. If the siblings of a dummy leaf are also leaves, then we are done. Otherwise, suppose there exists a dummy leaf such that one of its siblings is an internal node. Suppose there exists a non-dummy leaf in the sub-tree under this internal node. Then swapping the dummy leaf with this non-dummy leaf reduces the expected length of the corresponding code. Since we start with an optimal tree, this cannot happen. We can therefore assume that the leaves of the sub-tree under this internal node are all dummy leaves, then replacing the internal node with a dummy leaf also gives an optimal decoding tree. By repeating this procedure, we obtain an optimal decoding tree where the siblings of a dummy leaf are also leaves. ∎

Lemma 5.

There exists an optimal decoding tree where all the dummy leaves are siblings.

Proof:

Consider an optimal decoding tree. If all the dummy leaves are siblings of each other, then we are done. Otherwise, there exist two dummy leaves which are not siblings. By Lemma 4, we may assume that the siblings of each of these two dummy leaves are all leaves.

Consider one of these two dummy leaves. If all siblings of this dummy leaf are also dummy leaves, then we can construct another optimal decoding tree by replacing the parent of the dummy leaf by a dummy leaf. Therefore, without loss of generality, we can assume that some siblings of both dummy leaves are not dummy leaves.

We next argue that these two sets of siblings all have the same generalized depth. Suppose not, then swapping a non-dummy leaf of higher generalized depth with a dummy leaf of lower generalized depth reduces the expected length of the corresponding code. Since we start with an optimal tree, this cannot happen. We can therefore assume that the two sets of siblings all have the same generalized depth.

Since the two sets of siblings all have the same generalized depth, swapping leaves between the two sets does not affect the expected length of the corresponding code. We can therefore swap the leaves in such a way that we pack as many dummy leaves into one set as possible. There are two possible outcomes: either one set is rid of dummy leaves, or one set is full of dummy leaves. For the latter case, we can use the argument above and replace the parent of these dummy leaves with a dummy leaf, and repeat the subsequent arguments again. For the former case, we can repeat the above argument until all the dummy leaves are siblings. ∎

Lemma 6.

Let k\in\{2,3,\ldots,q_{n}\}\cap[m] be the number of non-dummy probability masses to be merged in the first round. Then the parent node of the k leaves associated to the k merged masses is assigned to class i^{*}, where q_{i^{*}} is the smallest among \{q_{1},\ldots,q_{n}\} such that q_{i^{*}}\geq k. Furthermore, the number of dummy leaves required is w=q_{i^{*}}-k.

Proof:

By Lemmas 4 and 5, we know that there exists an optimal decoding tree where all dummy leaves are siblings, and all other siblings are non-dummy leaves. Let w be the number of dummy leaves, and k be the number of non-dummy leaves in this set of siblings. We argue that the parent of these siblings is assigned to class i^{*}, where q_{i^{*}} is the smallest among \{q_{1},\ldots,q_{n}\} with q_{i^{*}}\geq k, and that w=q_{i^{*}}-k. Suppose not; then the parent is assigned to some class i with q_{i}>q_{i^{*}}, and w=q_{i}-k>q_{i^{*}}-k. By changing the class of the parent node from i to i^{*}, and reducing the number of dummy leaves from q_{i}-k to q_{i^{*}}-k, we obtain a decoding tree whose corresponding code has a lower expected codeword length. As we start with an optimal decoding tree, this is impossible. We therefore conclude that w=q_{i^{*}}-k. ∎

Theorem 8.

The number w of dummy symbols needed is bounded by 0\leq w<\max_{i\in[n]}\{q_{i}-q_{i-1}\}, where q_{0}:=1.

Proof:

Using the notation in Lemma 6, suppose i^{*}>1. Note that k>q_{i^{*}-1}, for otherwise q_{i^{*}-1} is smaller than q_{i^{*}} and yet k\leq q_{i^{*}-1}, violating the definition of i^{*}. Suppose otherwise that i^{*}=1. Note that k>1, for otherwise we can replace the parent of the k=1 node to be “merged” by the node itself. To summarize, we have w=q_{i^{*}}-k<q_{i^{*}}-q_{i^{*}-1}\leq\max_{i\in[n]}\{q_{i}-q_{i-1}\}, where q_{0}:=1. ∎

We now know the range of the number of dummy symbols we need to add, but not the exact number. On the other hand, we need to know how to merge these dummy symbols: should we merge all the dummy symbols in a single iteration, or should we merge a certain number of dummy symbols for a specific channel? The following lemma describes the structure of a specific optimal tree-decodable code. Note that there may be other optimal tree-decodable codes which do not have this structure. However, from the lemma, we know that we can merge all the dummy masses in the first iteration.

Lemma 7.

There exists an optimal decoding tree where all the dummy leaves and the codewords assigned to a certain number k\in\{2,3,\ldots,q_{n}\}\cap[m] of the smallest non-dummy probabilities are siblings.

Proof:

We continue with the optimal decoding tree stated in Lemma 5. Let i^{*} be the class the parent node of the codeword for p_{1} belongs to. This codeword must have at least one sibling assigned to a non-zero probability, for otherwise we can remove the last symbol of the codeword in the i^{*}th channel to reduce the expected codeword length. Let k-1 be the number of siblings of this codeword which are assigned to non-zero probabilities, where 2\leq k\leq q_{i^{*}}. Let one of these siblings be assigned to p_{j}, where p_{1}\leq p_{2}\leq p_{j}. By Lemma 3, we have \ell^{1}\geq\ell^{2}\geq\ell^{j}=\ell^{1}, i.e., \ell^{1}=\ell^{2}=\ell^{j}. If p_{j}\neq p_{2}, then we can swap their assigned codewords so that the codewords for p_{1} and p_{2} are siblings without changing the expected codeword length. We can repeat the above arguments to show that the k smallest non-zero probabilities are siblings.

The final step is to show that if there is a dummy leaf, then it is a sibling of the codeword of p_{1}. It can be proved by a similar argument: if the dummy leaves are not siblings of the codeword of p_{1}, then we can swap a dummy leaf with a non-dummy sibling of the codeword of p_{1} so that the expected codeword length is either unchanged or reduced. ∎

Finally, we can use the above lemma to argue by induction that one of the merge sequences produces an optimal decoding tree. That is, we know that the generalized Huffman procedure can indeed produce an optimal decoding tree, but we do not know the exact merge sequence in advance.

Theorem 9.

There exists a merge sequence which can produce an optimal multi-channel tree-decodable code.

Proof:

Fix k\in\{2,3,\ldots,q_{n}\}\cap[m]. By Lemma 6, we can determine i^{*}\in[n], where q_{i^{*}} is the smallest among \{q_{1},\ldots,q_{n}\} with q_{i^{*}}\geq k, and w=q_{i^{*}}-k. When we merge the k smallest probability masses, we obtain a reduced multiset of probabilities \{\sum_{j=1}^{k}p_{j},p_{k+1},\ldots,p_{m}\}. Let L^{\prime}_{k} be the expected codeword length of the optimal tree-decodable code for the reduced multiset of probabilities. Based on an optimal decoding tree for the reduced multiset, we can reverse the merge to obtain a tree for the original multiset of probabilities whose expected codeword length L_{k} is L_{k}=L^{\prime}_{k}+\sum_{j=1}^{k}p_{j}\ln q_{i^{*}}. Note that since L^{\prime}_{k} is the expected codeword length of the optimal tree-decodable code for the reduced multiset of probabilities, L_{k} is optimal for the original multiset of probabilities for this specific k. Since the method of creating the trees conforms to the structure in Lemma 7, there must exist one k which gives the overall optimal tree. By considering all possible k, the expected codeword length of the optimal tree-decodable code for the original multiset of probabilities is \min_{k=2}^{\min\{q_{n},m\}}L_{k}. We can apply the above arguments inductively (but fixing k\in\{q_{1},\ldots,q_{n}\}\cap[m^{\prime}] in subsequent rounds, where m^{\prime} is the number of remaining masses) to all the possible reduced multisets of probabilities to conclude that there exists a merge sequence which can produce an optimal multi-channel tree-decodable code. ∎

The importance of the above result is that we do not have to try arbitrary merges on the probabilities: we only combine the smallest probabilities in each merge. This way, the search space is greatly reduced.
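Theorem 9 suggests an exhaustive search: in every round, try each admissible number k of smallest masses to merge and recurse on the reduced multiset. The sketch below (an illustrative implementation of ours, exponential in the worst case even with memoization) carries out this search and reports the best expected codeword length found; it reproduces the optimal lengths of Examples 1 and 2.

import math
from functools import lru_cache

def optimal_tree_length(masses, alphabet_sizes):
    # Exhaustive search over merge sequences in the spirit of Theorem 9: in every
    # round, merge the k smallest masses for each admissible k and keep the best.
    # A k-way merge is charged to the cheapest channel with q >= k, which covers
    # dummy masses implicitly (unused branches carry zero probability).
    # Exponential time; for illustration only.
    qs = sorted(alphabet_sizes)

    @lru_cache(maxsize=None)
    def best(ms):
        if len(ms) == 1:
            return 0.0
        candidates = []
        for k in range(2, min(qs[-1], len(ms)) + 1):
            q = next(q for q in qs if q >= k)        # smallest adequate alphabet
            merged = sum(ms[:k])
            rest = tuple(sorted(ms[k:] + (merged,)))
            candidates.append(merged * math.log(q) + best(rest))
        return min(candidates)

    return best(tuple(sorted(masses)))

print(optimal_tree_length((1/6, 1/6, 1/3, 1/3), (2, 3)))   # ~1.3297, merge sequence (2,3)
print(optimal_tree_length((1/6, 1/6, 1/6, 1/2), (2, 3)))   # ~1.2425, merge sequence (3,2)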

V Approximate Multi-Channel Huffman Procedure

It is not known whether an optimal tree-decodable code is an optimal prefix code except for the 2-channel case. If we further restrict ourselves to tree-decodable codes, the multi-channel Huffman procedure can be applied. (A testing tree like the one discussed in [11] requires a source symbol to correspond to specific branches of the internal nodes. Also, the cost of an internal node there is independent of the number of branches. That is, constructing an optimal tree-decodable code is different from constructing an optimal testing tree.) However, the procedure raises another issue: in each iteration we have to decide how many masses are merged. Trying all possible merge sequences produces an exponential number of decoding trees, which is inefficient. Take (2,3)-ary codes as an example: the number of possible merge sequences for m probabilities is the mth Fibonacci number.
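The Fibonacci count can be checked with a two-line recursion: a merge of 2 masses removes one mass from the multiset and a merge of 3 masses removes two, so the count obeys the Fibonacci recurrence. The snippet below (ours, ignoring dummy masses as in the example of this section) verifies this.

from functools import lru_cache

@lru_cache(maxsize=None)
def num_merge_sequences(m):
    # For (2,3)-ary codes without dummy masses, each merge removes either
    # 1 or 2 masses from the multiset.
    if m <= 2:
        return 1
    return num_merge_sequences(m - 1) + num_merge_sequences(m - 2)

print([num_merge_sequences(m) for m in range(2, 11)])
# [1, 2, 3, 5, 8, 13, 21, 34, 55] -- the Fibonacci numbers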

One way to reduce the complexity is to prune the merge sequences according to some metric. Specifically, we start by trying all possible choices in each iteration and keeping track of the resulting reduced multiset sizes. Whenever we obtain two merge subsequences producing reduced multisets of the same size, we eliminate the under-performing one according to the metric and continue with the remaining one. This strategy avoids unfair comparisons between multisets of different sizes. For example, we should compare the reduced multisets generated by the merge sequences (2,2) and (3), as both of them reduce the number of masses by 2. In the following, we show that using this strategy with several natural metrics is suboptimal.

Given the probabilities \{0.13, 0.199, 0.212, 0.217, 0.242\}, we apply the Huffman procedure to generate a (2,3)-ary tree-decodable code. Note that we are worse off choosing the 2nd channel in the 1st iteration and merging a dummy mass with 2 non-dummy masses, since we can instead choose the 1st channel to merge the 2 non-dummy masses, which results in a shorter expected codeword length. Therefore, in this example, we do not need to consider dummy masses. The candidate optimal merge sequences are (2,2,2,2), (2,2,3), (2,3,2), (3,2,2), and (3,3), while the winner is (3,2,2).

TABLE II: Prune by Redundancy
merge        number of remaining masses
sequence     4               3               2               1
(2,2,2,2)    0.0072895611*   0.0073186993*   0.0139724304    0.024088589
(2,2,3)      0.0072895611*   0.0073186993*   -               0.0337666569
(2,3,2)      0.0072895611*   -               0.0084312615*   0.0681102540
(3,2,2)      -               0.0113528472    0.0120340121    0.0153997900*
(3,3)        -               0.0113528472    -               0.1027103422
TABLE III: Prune by Expected Codeword Description Length
merge        number of remaining masses
sequence     4               3               2               1
(2,2,2,2)    0.2280454224*   0.5254055629*   0.9211926030    1.6143397835
(2,2,3)      0.2280454224*   0.5254055629*   -               1.6240178515
(2,3,2)      0.2280454224*   -               0.9652142681    1.6583614487
(3,2,2)      -               0.5943492482    0.9125038040*   1.6056509846*
(3,3)        -               0.5943492482    -               1.6929615368
TABLE IV: Prune by Entropy
merge        number of remaining masses
sequence     4               3               2               1
(2,2,2,2)    1.3694953333*   1.0721643310    0.6830310220    0.0000000000*
(2,2,3)      1.3694953333*   1.0721643310    -               0.0000000000*
(2,3,2)      1.3694953333*   -               0.6334681881*   0.0000000000*
(3,2,2)      -               1.0072547937*   0.6897814027    0.0000000000*
(3,3)        -               1.0072547937*   -               0.0000000000*
TABLE V: Prune by Expected Resultant Codeword Length + Entropy
merge        number of remaining masses
sequence     4               3               2               1
(2,2,2,2)    1.5975407557*   1.5975698939*   1.6042236250    1.6143397835
(2,2,3)      1.5975407557*   1.5975698939*   -               1.6240178515
(2,3,2)      1.5975407557*   -               1.5986824562*   1.6583614487
(3,2,2)      -               1.6016040418    1.6022852068    1.6056509846*
(3,3)        -               1.6016040418    -               1.6929615368

During the Huffman procedure, we record the redundancy (sum of local redundancies) and the expected codeword length of the already constructed subtree(s) in Tables II and III. We also record the entropy of the reduced multisets of probabilities in Table IV. Table V shows the entry-wise sum of Tables III and IV. The values in Tables IV and V are the (lower and upper) bounds of the expected codeword length of the not-yet-constructed part and the expected codeword length of the resultant code respectively (by Theorems 2 and 5). All the values are measured in nats.

The numbers marked with an asterisk are the minimums of the corresponding columns. For each table, we compare the steps of different merge sequences when the numbers of remaining masses are the same. If a merge sequence does not attain the minimum during a comparison, we prune the merge sequence away. The merge sequence that remains unpruned after the comparison in the rightmost column is the output of the procedure for the specific metric of the table. We can see that none of these tables produces the optimal (3,2,2).

We now propose a straightforward suboptimal code construction which guarantees a redundancy no larger than that of a single-channel Huffman code. We follow a similar idea as in Table V except that, instead of adding the entropy of the reduced multiset of probabilities to the expected codeword length of the already constructed subtree(s), we add the smallest expected codeword length among the single-channel Huffman codes constructed for the reduced multiset of probabilities on the different channels. This sum is the expected codeword length of a code which can be constructed explicitly. Note that each single-channel Huffman code corresponds to one of the merge sequences, so we can ensure that the code produced this way has a redundancy no larger than that of a single-channel Huffman code.
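A possible implementation of this construction is sketched below (the function names and the dictionary-based bookkeeping are ours). It follows the layer-by-layer pruning of this section: for each number of remaining masses it keeps only the merge subsequence whose accumulated length plus the best single-channel Huffman length of the reduced multiset is smallest. On the probabilities of this section it returns about 1.6057 nats, matching Table VI, while the best single-channel Huffman code needs about 1.6143 nats.

import heapq
import math

def huffman_length(masses, q):
    # Expected codeword length (in nats) of a single-channel q-ary Huffman code.
    if len(masses) <= 1:
        return 0.0
    w = (q - 1 - ((len(masses) - 1) % (q - 1))) % (q - 1)    # dummy masses
    heap = list(masses) + [0.0] * w
    heapq.heapify(heap)
    total = 0.0
    while len(heap) > 1:
        s = sum(heapq.heappop(heap) for _ in range(q))
        total += s * math.log(q)
        heapq.heappush(heap, s)
    return total

def suboptimal_code_length(masses, alphabet_sizes):
    # Pruning heuristic of this section: for each number of remaining masses,
    # keep only the merge subsequence minimizing (accumulated length of the
    # constructed subtrees) + (best single-channel Huffman length of the
    # reduced multiset).
    qs = sorted(alphabet_sizes)
    metric = lambda acc, ms: acc + min(huffman_length(ms, q) for q in qs)
    best = {len(masses): (0.0, tuple(sorted(masses)))}       # size -> (acc, multiset)
    for size in range(len(masses), 1, -1):
        if size not in best:
            continue
        acc, ms = best[size]
        for k in range(2, min(qs[-1], size) + 1):
            q = next(q for q in qs if q >= k)                # cheapest adequate channel
            merged = sum(ms[:k])
            rest = tuple(sorted(ms[k:] + (merged,)))
            cand = (acc + merged * math.log(q), rest)
            if len(rest) not in best or metric(*cand) < metric(*best[len(rest)]):
                best[len(rest)] = cand
    return best[1][0]

probs = (0.13, 0.199, 0.212, 0.217, 0.242)
print(suboptimal_code_length(probs, (2, 3)))                      # ~1.6057 nats, cf. Table VI
print(min(huffman_length(probs, 2), huffman_length(probs, 3)))    # ~1.6143 nats (binary Huffman)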

TABLE VI: Suboptimal Code Construction
merge        number of remaining masses
sequence     4               3               2               1
(2,2,2,2)    1.6143397835    1.6143397835    1.6143397835    1.6143397835
(2,2,3)      1.6143397835    1.6143397835    -               1.6240178515
(2,3,2)      1.6143397835    -               1.6583614487    1.6583614487
(3,2,2)      -               1.6056509846    1.6056509846    1.6056509846
(3,3)        -               1.6056509846    -               1.6929615368

VI Conclusion and Future Directions

In this paper, we showed that it is possible for a multi-channel source code to achieve a better compression than an optimal single-channel source code. We also presented the multi-channel analog of some classical results in single-channel source coding. Some future research directions on multi-channel source coding are:

  1. Is an optimal multi-channel tree-decodable code an optimal prefix (or uniquely decodable) code?

  2. Is there a metric with which the pruning strategy results in an optimal multi-channel Huffman code?

  3. How to extend adaptive, canonical and run-length Huffman codes into their multi-channel versions?

  4. How to restrict the expected codeword length for each channel and to balance the usage of each channel?

References

  • [1] D. A. Huffman, “A method for the construction of minimum-redundancy codes,” Proceedings of the IRE, vol. 40, no. 9, pp. 1098–1101, 1952.
  • [2] H. Yao and R. W. Yeung, “Zero-error multichannel source coding,” in 2010 IEEE Information Theory Workshop on Information Theory (ITW), Jan. 2010, pp. 1–5.
  • [3] H. H. F. Yin, K. H. Ng, Y. T. Shing, R. W. F. Lai, and X. Wang, “Decision procedure for the existence of two-channel prefix-free codes,” in 2019 IEEE International Symposium on Information Theory (ISIT), Jul. 2019, pp. 1522–1526.
  • [4] M. J. Golin, C. Kenyon, and N. E. Young, “Huffman coding with unequal letter costs,” in Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing (STOC), May 2002, pp. 785–791.
  • [5] H. H. F. Yin, K. H. Ng, Y. T. Shing, R. W. F. Lai, and X. Wang, “Decision procedure for the existence of two-channel prefix-free codes,” arXiv preprint arXiv:1904.12112, Apr. 2019.
  • [6] J. Karush, “A simple proof of an inequality of McMillan,” IRE Transactions on Information Theory, vol. 7, no. 2, pp. 118–118, 1961.
  • [7] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, no. 3, pp. 379–423, 1948.
  • [8] R. W. Yeung, “Local redundancy and progressive bounds on the redundancy of a Huffman code,” IEEE Transactions on Information Theory, vol. 37, no. 3, pp. 687–691, May 1991.
  • [9] J. van Leeuwen, “On the construction of Huffman trees,” in The Third International Colloquium on Automata, Languages and Programming (ICALP), Jul. 1976, pp. 382–410.
  • [10] R. W. Yeung, Information Theory and Network Coding.   Springer, 2008.
  • [11] R. W. Yeung, “On noiseless diagnosis,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 24, no. 7, pp. 1074–1082, Jul. 1994.