Maxflow-Based Bounds for Low-Rate
Information Propagation over Noisy Networks
Abstract
We study error exponents for the problem of low-rate communication over a directed graph, where each edge in the graph represents a noisy communication channel, and there is a single source and destination. We derive maxflow-based achievability and converse bounds on the error exponent that match when there are two messages and all channels satisfy a symmetry condition called pairwise reversibility. More generally, we show that the upper and lower bounds match to within a factor of 4. We also show that with three messages there are cases where the maxflow-based error exponent is strictly suboptimal, thus showing that our tightness result cannot be extended beyond two messages without further assumptions.
I Introduction
Problems of low-rate (e.g., 1-bit) information propagation in noisy networks have recently gained increasing attention, with prominent examples including the following:
•
The transmission of a single bit over a tandem of channels (the 2-hop setting) was studied in [1, 2, 3], and the multi-bit variant was studied in [4]. The goal is to characterize the associated error exponent; as we further discuss below, this line of work is the most relevant to the present paper.
•
The transmission of a single bit over a long chain of noisy channels (connected by relays) dates back to the work of Schulman and Rajagopalan [5] and recently regained attention [1, 6, 7] after being posed by Polyanskiy (who termed it the information velocity problem) and connected to the 2-hop setting by Huleihel, Polyanskiy, and Shayevitz [1].
•
Another line of works studied broadcasting problems on various kinds of graphs, such as trees [8], random graphs [9], and grids [10]. Here the goal is to reconstruct the original bit from the information received at the leaves, and the focus has been on simple intermediate nodes performing memoryless operations. (In contrast, the problems described above allow the intermediate/relay nodes to perform complicated coding strategies.)
In this paper, we address a notable gap in this list by extending the 2-hop problem to transmission over a fixed but arbitrary directed graph, with a single source and a single destination. (General graphs also fall under the framework of [5], even without the constraint that the graph's size is fixed with respect to the block length, but the focus therein is only on scaling laws and not precise constants or error exponents.) An extensive range of channel capacity results are known for such settings [11, Ch. 15], but 1-bit and other low-rate settings have remained less explored.
Specifically, we generalize the 2-hop multi-bit setup to arbitrary networks and show a maxflow-based achievability bound (Theorem 3), which supersedes our main previous multi-bit result for the 2-hop setting [4]. We also provide a converse bound (Theorem 4) based on the cutset idea, which matches the achievability bound in the 1-bit case when every channel satisfies a symmetry condition called pairwise reversibility (see Definition 7 below). While we are unable to find the optimal error exponent more generally, our achievability and converse bounds provide a 4-approximation (Lemma 122).
We proceed by outlining the related work in more detail, and then formally describe the problem setup and state our main results.
I-A Related Work
Tandem of channels (2-hop). The problem of transmitting a single bit over a tandem of channels was introduced by Huleihel, Polyanskiy, and Shayevitz [1], as well as Jog and Loh using different motivation/terminology based on teaching and learning in multi-agent problems [2]. Both of these works gave various protocols with varying trade-offs between their simplicity and their associated error exponents. We then showed in [3] that the 1-hop and 2-hop error exponents coincide whenever the two channels have the same transition law. In [4], we generalized this problem to the multi-bit setting and showed that whenever both channels are pairwise reversible, the 2-hop error exponent is simply given by the bottleneck (i.e., the worse of the two 1-hop error exponents).
Information velocity (many-hop). As we outlined above, the information velocity problem seeks to establish the minimum possible ratio of the number of hops to the block length to reliably send a single bit over a long chain of relays connected by binary symmetric channels (or more generally, other channels). This problem also has interesting connections to the 2-hop setting [1].
A general result of Schulman and Rajagopalan [5] translates noiseless protocols (on arbitrary graphs) to noisy ones, and when specialized to the line graph, this proves that positive information velocity is attainable. We gave a simple and efficient protocol achieving positive information velocity in [6], though precisely characterizing the best possible information velocity remains very much open. The information velocity of erasure-type channels was studied in [7], and for this class of channels an exact characterization was attained.
Broadcasting problems. As mentioned above, problems of broadcasting [8, 9, 10] have a similar flavor to the information velocity problem, but with some substantial differences including (i) the consideration of more complicated graphs, (ii) the combination of many nodes’ information to form the final estimate, and (iii) the consideration of simple memoryless intermediate nodes without the use of coding.
Capacity results on graphs. Another notable line of works has sought to characterize the channel capacity of noisy networks, as summarized in [12, Sec. 15.10] and [11, Ch. 15]. The study of capacity is largely quite distinct from that of 1-bit (or other low-rate) communication, but we will see some similarities throughout the paper. In particular, our results are based on maxflow and mincut ideas, which are widely used in capacity results. Instead of using 1-hop channel capacities as edge weights for the maxflow problem, we will use 1-hop error exponents. Our protocols and analysis will be substantially different compared to capacity results and will require further assumptions such as pairwise reversibility, but we will use an existing idea of replacing individual edges by multiple parallel edges of smaller weight in order to form edge-disjoint paths (e.g., see [11, Sec. 15.3]).
II Problem Setup
We first describe the classical channel coding setup on a discrete memoryless channel (DMC) $W$, whose input and output alphabets are denoted by $\mathcal{X}$ and $\mathcal{Y}$ (or sometimes $\mathcal{X}_W$ and $\mathcal{Y}_W$). The source node is given a message $m \in \{1, \dots, M\}$, and has $n$ uses of the channel to send information across to the destination node, after which the destination node guesses the value of $m$. An error is said to have occurred if the destination's guess is different from $m$. The minimum probability of an error across all protocols with $n$ time steps is denoted by $P_e(W, M, n)$, and the error exponent of the channel with $M$ messages is defined as
$E(W, M) = \lim_{n \to \infty} -\tfrac{1}{n} \log P_e(W, M, n), \qquad (1)$
where the limit is as $n \to \infty$ while $M$ remains fixed. We will use $E(W)$, $E(M)$, or simply $E$ whenever the omitted parameters are clear from the context.
In this paper, we consider communication over a directed graph with edges representing noisy channels. Specifically, we are given a directed graph $G = (\mathcal{V}, \mathcal{E})$ (not necessarily acyclic), with two special nodes called the source and the destination. Without loss of generality, we assume that there exists at least one path from the source to the destination. For every edge $(u, v) \in \mathcal{E}$, there is a DMC $W_{uv}$. We note that for a given pair of nodes, say $(u, v)$, we allow for the possibility of multiple parallel edges connecting $u$ to $v$ (though this could equivalently be viewed as a single edge whose associated channel is a product channel).
At time 0, the source node receives a message $m \in \{1, \dots, M\}$. At each time step, every node gets one use of the channel associated with each of its outgoing edges. This process occurs for a total of $N$ time steps. After $N$ time steps, the destination node forms an estimate of the message; let $P_e(G, M, N)$ be the resulting error probability. (Our results are unaffected by whether this error probability is averaged over all $m$ or taken with respect to the worst-case $m$, as is the case for the vast majority of error exponent studies, e.g., [13, 14].) We are interested in the error exponent of the network, given by
$E_{\mathrm{net}}(G, M) = \lim_{N \to \infty} -\tfrac{1}{N} \log P_e(G, M, N). \qquad (2)$
Note that we are taking the limit as $N \to \infty$, while $G$ and $M$ remain fixed. When the context is clear, we will simply denote this as $E_{\mathrm{net}}$.
For an edge $(u, v)$, we let $E_{uv}(M)$ be short for $E(W_{uv}, M)$ (i.e., the 1-hop error exponent of the associated channel). For a fixed $(G, M)$ pair, we can create a network flow problem (see Section III-A) that assigns a capacity of $E_{uv}(M)$ on every edge $(u, v)$, and consider the maximum flow across this network. (Here "capacity" is meant in a generic sense, and not as the Shannon capacity.) We will refer to this as the maxflow for the $(G, M)$ pair, and denote it by $M_f(G, M)$. We will also shorten this to $M_f$ when the context is clear.
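As an illustration of this quantity, the maxflow under exponent-valued capacities can be computed with any standard maximum-flow routine; the sketch below uses Edmonds-Karp on a small hypothetical 4-node network whose edge weights (standing in for the exponents $E_{uv}$) are made up.

```python
from collections import deque

def maxflow(n, edges, s, t):
    """Edmonds-Karp maximum flow. `edges` maps (u, v) -> capacity;
    parallel edges can be merged beforehand by summing capacities."""
    cap = [[0.0] * n for _ in range(n)]
    for (u, v), c in edges.items():
        cap[u][v] += c
    flow = 0.0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow
        # find the bottleneck capacity along the path found
        bottleneck, v = float('inf'), t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v])
            v = u
        # update residual capacities
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical 4-node network: 0 = source, 3 = destination; the weights
# play the role of the 1-hop exponents E_uv (values made up).
E = {(0, 1): 0.5, (0, 2): 0.3, (1, 3): 0.4, (2, 3): 0.6, (1, 2): 0.2}
print(maxflow(4, E, 0, 3))  # bounded by the source-side cut of size 0.8
```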
Our main result is the following achievability result for channels that are pairwise reversible; this is a classical notion of symmetry [14] that notably includes the binary symmetric channel (BSC), the binary erasure channel (BEC), and their generalizations to non-binary inputs. Briefly, the property is that the quantity $\sum_y W(y|x)^{1-s} W(y|x')^{s}$ attains its minimum at $s = \frac{1}{2}$ for all $x, x'$, which implies that the Chernoff distance between two inputs simplifies to the Bhattacharyya distance. See Definition 7 below for further details, and [14] for a more comprehensive treatment with numerous examples.
Theorem 1.
If every channel in $G$ is pairwise reversible, then the error exponent $M_f(G, M)$ is achievable, i.e.,
$E_{\mathrm{net}}(G, M) \geq M_f(G, M). \qquad (3)$
Proof.
See Section V. ∎
We note that this is an information-theoretic achievability result, and we do not attempt to establish an associated computationally efficient protocol (though doing so would be of interest). By comparison, our related works on 2-hop settings contained various results based on both efficient and inefficient protocols [3, 4].
We also prove the following converse bound:
Theorem 2.
For any pair $(G, M)$, we have
(4) |
Proof.
See Section VI. ∎
Combining Theorems 3 and 4, we establish that equality holds whenever every channel is pairwise reversible and $M = 2$:
Corollary 3.
If every channel in $G$ is pairwise reversible, then
$E_{\mathrm{net}}(G, 2) = M_f(G, 2). \qquad (5)$
We are also interested in the zero-rate error exponent. Momentarily returning to classical 1-hop communication, for $R > 0$, we define the error exponent of the channel at rate $R$ by
$E(W, R) = \lim_{n \to \infty} -\tfrac{1}{n} \log P_e(W, \lceil e^{nR} \rceil, n), \qquad (6)$
and we extend this function to $R = 0$ via $E(W, 0^+) = \lim_{R \downarrow 0} E(W, R)$. Analogously, in the setting of the present paper, we let $E_{\mathrm{net}}(G, 0^+)$ (or simply $E_{\mathrm{net}}(0^+)$) denote the zero-rate error exponent of the network. We can again consider the network flow problem which assigns a capacity of $E_{uv}(0^+)$ on every edge $(u, v)$, which we will denote by $M_f^{0^+}(G)$ or sometimes simply $M_f^{0^+}$. The following result extends Theorem 3 to zero-rate error exponents, even without requiring the pairwise reversible assumption:
Theorem 4.
For any $G$, whose channels need not be pairwise reversible, we have
$E_{\mathrm{net}}(G, 0^+) \geq M_f^{0^+}(G). \qquad (7)$
Proof.
See Section VII-B. ∎
Some additional results are given in Sections VII and VIII, and are briefly outlined as follows:
•
Lemma 27 states an achievability result for channels that may not be pairwise reversible; the idea is to use a weaker “symmetrized” error exponent on each edge in the graph, rather than the optimal 1-hop exponent.
•
In Theorem 29, we provide conditions (albeit somewhat restrictive) under which a matching converse can be obtained for pairwise reversible channels with .
•
In Theorem 35, we show that in general there exist scenarios in which the maxflow-based error exponent is strictly suboptimal when $M = 3$, even under the constraint of an acyclic graph and pairwise reversible channels. This places a strong restriction on the extent to which our tightness result for $M = 2$ can be generalized.
III Preliminaries
In this section, we introduce some additional notation and definitions, and provide a number of useful auxiliary results that will be used for proving our main results.
III-A Flows and Cuts
We momentarily depart from the above setup and consider generic notions on graphs. Consider a directed graph $G = (\mathcal{V}, \mathcal{E})$, with two special nodes called the source $s$ and destination $t$. Suppose that on every edge $(u, v) \in \mathcal{E}$, there is a non-negative real-valued capacity, denoted $c(u, v)$.
Let $f$ be a function from $\mathcal{E}$ to $[0, \infty)$. We say that $f$ is a flow if the following conditions hold:
•
(Capacity constraints) For each $(u, v) \in \mathcal{E}$, $f(u, v) \leq c(u, v)$.
•
(Flow conservation) For each $v \in \mathcal{V}$, except for the source and destination, we have
$\sum_{u : (u, v) \in \mathcal{E}} f(u, v) = \sum_{w : (v, w) \in \mathcal{E}} f(v, w). \qquad (8)$
Define the size of a flow $f$ as
$\|f\| = \sum_{v : (s, v) \in \mathcal{E}} f(s, v) - \sum_{u : (u, s) \in \mathcal{E}} f(u, s). \qquad (9)$
A flow $f^*$ is maximal if $\|f^*\| \geq \|f\|$ for any other flow $f$.
Closely related to flows is the concept of cuts. A cut is a partition of $\mathcal{V}$ into two sets $(\mathcal{S}, \mathcal{T})$ such that $s \in \mathcal{S}$ and $t \in \mathcal{T}$. Define the size of a cut as
$\|(\mathcal{S}, \mathcal{T})\| = \sum_{(u, v) \in \mathcal{E} \,:\, u \in \mathcal{S},\, v \in \mathcal{T}} c(u, v), \qquad (10)$
where we emphasize that this only counts edges from $\mathcal{S}$ to $\mathcal{T}$ and not back-edges. A cut $(\mathcal{S}^*, \mathcal{T}^*)$ is minimal if $\|(\mathcal{S}^*, \mathcal{T}^*)\| \leq \|(\mathcal{S}, \mathcal{T})\|$ for any cut $(\mathcal{S}, \mathcal{T})$. We now state the well-known maxflow-mincut duality theorem:
Theorem 5.
(Maxflow-mincut duality theorem, e.g., [15]) The size of the maximum flow is equal to the size of the minimum cut.
The following result is also well known, but we provide a short proof for the interested reader.
Theorem 6.
(Flow decomposition theorem, e.g., [15]) Let $f$ be any flow. There exist finitely many simple paths (a path is simple if it never visits the same node more than once) from $s$ to $t$, denoted by $P_1, \dots, P_k$, with associated flows $f_1, \dots, f_k$, such that
•
Each $f_i$ flows only along path $P_i$. In other words, $f_i(e) = g_i$ for all $e$ on $P_i$ and $f_i(e) = 0$ otherwise.
•
The total sum of all $f_i$ is equal to $f$. In other words, we have for all $e \in \mathcal{E}$ that
$\sum_{i=1}^{k} f_i(e) = f(e). \qquad (11)$
Proof.
We describe a procedure for iteratively building up a list of $(P_i, f_i)$ pairs. Given $f$, let $P$ be any path from $s$ to $t$ that has non-zero flow across every edge. Let $g$ be the smallest among these (non-zero) values, and let $f_P$ be the flow formed by pushing $g$ units of flow along the path $P$. We then append $(P, f_P)$ to the list of pairs, subtract $g$ along every edge on $P$, and repeat the preceding steps until no flow remains. Since the number of active edges (i.e., those with non-zero flow) strictly decreases on every step of this process, it must terminate and produce a finite number of paths, as desired. ∎
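The iterative procedure described in this proof can be sketched directly in code; the flow values below are made up for illustration.

```python
def decompose(flow, s, t):
    """Decompose a flow (dict (u, v) -> value) with acyclic support into
    simple s-t paths, following the procedure above: repeatedly find a
    path with positive flow, push its bottleneck value, and subtract."""
    flow = {e: v for e, v in flow.items() if v > 1e-12}  # work on a copy
    paths = []
    while flow:
        # walk from s to t along positive-flow edges
        path, u = [], s
        while u != t:
            v = next(w for (x, w) in flow if x == u)
            path.append((u, v))
            u = v
        g = min(flow[e] for e in path)  # bottleneck value g_i
        paths.append(([s] + [v for (_, v) in path], g))
        for e in path:
            flow[e] -= g
            if flow[e] <= 1e-12:
                del flow[e]  # at least one edge becomes inactive per step
    return paths

# Hypothetical flow of size 0.8 on a 4-node graph (values made up).
f = {(0, 1): 0.5, (0, 2): 0.3, (1, 3): 0.4, (1, 2): 0.1, (2, 3): 0.4}
for p, g in decompose(f, 0, 3):
    print(p, g)
```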
III-B Chernoff Divergence
Let $W$ be a generic discrete memoryless channel. To lighten notation, we let $W_x$ denote the conditional distribution $W(\cdot \mid x)$. An $(n, M)$-codebook is defined to be a collection of $M$ codewords each having length $n$, and when utilizing such a codebook, we will use the notation $c_1, \dots, c_M$ for the associated codewords.
For two probability distributions $P, Q$ over some finite set $\mathcal{Y}$, the Bhattacharyya distance is defined as
$d_B(P, Q) = -\log \sum_{y \in \mathcal{Y}} \sqrt{P(y) Q(y)}. \qquad (12)$
For random variables $X_1, X_2$, we will also use $d_B(X_1, X_2)$ to refer to the Bhattacharyya distance between their distributions $P_{X_1}$ and $P_{X_2}$, i.e.
$d_B(X_1, X_2) = d_B(P_{X_1}, P_{X_2}), \qquad (13)$
and for $x, x' \in \mathcal{X}$, we define the Bhattacharyya distance associated with two channel inputs as
$d_B(x, x') = d_B(W_x, W_{x'}), \qquad (14)$
with a slight abuse of notation.
Generalizing the Bhattacharyya distance, the Chernoff divergence with parameter $s \in (0, 1)$ is given by
$d_{C,s}(P, Q) = -\log \sum_{y \in \mathcal{Y}} P(y)^{1-s} Q(y)^{s}, \qquad (15)$
and the Chernoff divergence (with optimized $s$) is given by
$d_C(P, Q) = \sup_{s \in (0, 1)} d_{C,s}(P, Q). \qquad (16)$
Analogous to (13)–(14), we also write
$d_{C,s}(X_1, X_2) = d_{C,s}(P_{X_1}, P_{X_2}) \quad \text{and} \quad d_C(X_1, X_2) = d_C(P_{X_1}, P_{X_2}), \qquad (17)$
$d_{C,s}(x, x') = d_{C,s}(W_x, W_{x'}) \quad \text{and} \quad d_C(x, x') = d_C(W_x, W_{x'}). \qquad (18)$
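These divergences are straightforward to compute numerically; the sketch below evaluates $d_{C,s}$, $d_B$, and (via a simple grid search standing in for the optimization over $s$) $d_C$ for two made-up distributions.

```python
import math

def chernoff_div(P, Q, s):
    """d_{C,s}(P, Q) = -log sum_y P(y)^(1-s) * Q(y)^s."""
    return -math.log(sum(p ** (1 - s) * q ** s for p, q in zip(P, Q)))

def bhattacharyya(P, Q):
    """d_B(P, Q) is the Chernoff divergence at s = 1/2."""
    return chernoff_div(P, Q, 0.5)

def chernoff(P, Q, grid=10001):
    """d_C(P, Q): supremum over s in (0, 1), approximated by a grid search."""
    return max(chernoff_div(P, Q, i / grid) for i in range(1, grid))

# Made-up example distributions over a 3-letter alphabet.
P = [0.7, 0.2, 0.1]
Q = [0.1, 0.3, 0.6]
print(bhattacharyya(P, Q), chernoff(P, Q))
```

Note that $d_C \geq d_B$ always holds, since the Bhattacharyya distance is the $s = 1/2$ point of the quantity being optimized.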
For a certain class of channels, the quantities $d_C(x, x')$ and $d_B(x, x')$ coincide, giving us the following definition:
Definition 7.
[14] A discrete memoryless channel $W$ is pairwise reversible if, for all $x, x' \in \mathcal{X}$,
$d_C(x, x') = d_B(x, x'), \qquad (19)$
which occurs when the quantity
$\sum_{y \in \mathcal{Y}} W(y|x)^{1-s} W(y|x')^{s} \qquad (20)$
attains its minimum at $s = \tfrac{1}{2}$.
The pairwise reversibility assumption allows for a number of useful results and constructions that are not available for general channels. In particular, Lemma 12 below gives an explicit expression for the 1-hop error exponent, and Lemma 29 below is based on an explicit construction that attains that exponent. We again refer the reader to [14] for a detailed discussion on pairwise reversibility with several examples.
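As a quick numerical illustration of Definition 7, one can check whether the quantity in (20) is minimized at $s = 1/2$ for every input pair; the channel matrices below (a BSC and a Z-channel) are made-up examples, and the grid search is only an approximation of the true minimization.

```python
def g(W, x, xp, s):
    """The quantity sum_y W(y|x)^(1-s) * W(y|x')^s from Definition 7."""
    return sum(W[x][y] ** (1 - s) * W[xp][y] ** s for y in range(len(W[x])))

def is_pairwise_reversible(W, grid=1000, tol=1e-9):
    """Check that for every input pair, the minimum over s is at s = 1/2."""
    for x in range(len(W)):
        for xp in range(len(W)):
            if x == xp:
                continue
            best = min(g(W, x, xp, i / grid) for i in range(1, grid))
            if g(W, x, xp, 0.5) > best + tol:
                return False
    return True

bsc = [[0.9, 0.1], [0.1, 0.9]]   # binary symmetric channel: reversible
zch = [[1.0, 0.0], [0.3, 0.7]]   # Z-channel: not pairwise reversible
print(is_pairwise_reversible(bsc), is_pairwise_reversible(zch))
```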
For any positive integer $t$, we let $W^t$ denote the $t$-fold product of $W$, with probability mass function
$W^t(y_1, \dots, y_t \mid x_1, \dots, x_t) = \prod_{i=1}^{t} W(y_i \mid x_i). \qquad (21)$
For two sequences $x, x'$ of length $t$, we also use the notation $d_B(x, x')$ and $d_C(x, x')$ similarly to (14) and (18), with the understanding that $x, x'$ are treated as inputs to $W^t$. We will use a well-known tensorization property of $d_{C,s}$ (and hence also $d_B$), stated as follows.
Lemma 8.
(Tensorization) For any distributions $P_1, P_2, Q_1, Q_2$ over finite alphabets and any $s \in (0, 1)$, we have $d_{C,s}(P_1 \times P_2, Q_1 \times Q_2) = d_{C,s}(P_1, Q_1) + d_{C,s}(P_2, Q_2)$.
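The tensorization property in question is the standard additivity of $d_{C,s}$ over independent components, which can be checked numerically (the distributions below are made up):

```python
import math

def dcs(P, Q, s):
    """Chernoff divergence d_{C,s} between two pmfs given as lists."""
    return -math.log(sum(p ** (1 - s) * q ** s for p, q in zip(P, Q)))

def product(P1, P2):
    """Joint pmf of two independent components, flattened in row order."""
    return [a * b for a in P1 for b in P2]

P1, Q1 = [0.6, 0.4], [0.2, 0.8]
P2, Q2 = [0.5, 0.3, 0.2], [0.1, 0.1, 0.8]
s = 0.3
lhs = dcs(product(P1, P2), product(Q1, Q2), s)
rhs = dcs(P1, Q1, s) + dcs(P2, Q2, s)
print(abs(lhs - rhs))  # additivity: the difference is numerically ~0
```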
III-C 1-hop Error Exponents
We will use several results from [14] on the 1-hop error exponents. We first state two closely-related results that are implicit in [14, Thm. 2] and its proof.
Lemma 9.
(Implicit in [14, Thm. 2]) Letting $Y$ be the received sequence from $n$ uses of the channel $W$, we have for any fixed pair of messages $m \neq m'$ and any sequence of codebooks (indexed by $n$) that
(24) |
Theorem 10.
(Implicit in [14, Thm. 2]) For any $\delta > 0$, it holds for all sufficiently large $n$ that there exists an $(n, M)$-codebook such that for all pairs of codewords $c_m, c_{m'}$ ($m \neq m'$) in the codebook, we have
(25) |
Next, we state explicit formulas/bounds for error exponents in the zero-rate and fixed- settings, respectively.
Theorem 11.
[14, Thm. 4] For any DMC $W$, the zero-rate error exponent is given by
(26) |
Lemma 12.
In accordance with Lemma 12, we will use the following definition:
Definition 13.
Define the Bhattacharyya-distance-based error exponent by
$E_B(W, M) = \lim_{n \to \infty} \max_{(n, M)\text{-codebooks}} \min_{m \neq m'} \frac{1}{n} d_B(c_m, c_{m'}), \qquad (28)$
and note that $E_B(W, M) = E(W, M)$ for pairwise reversible channels.
Lemma 14.
For any DMC $W$, there exists an $(n, M)$-codebook (for all sufficiently large $n$) such that for any pair $m \neq m'$, it holds that
(29) |
Since we will make use of codes over product channels, the following lemma is useful:
Lemma 15.
Let $t$ be a fixed integer and $x_1, \dots, x_M$ be length-$t$ codewords, and let $\widetilde{W}$ be the restriction of $W^t$ to these inputs. Then $E(W, M) \geq \frac{1}{t} E(\widetilde{W}, M)$. Furthermore, if $W$ is pairwise reversible, then $\widetilde{W}$ is also pairwise reversible.
Proof.
Let $\tilde{c}_1, \dots, \tilde{c}_M$ be codewords achieving an error exponent arbitrarily close to $E(\widetilde{W}, M)$. Since each character in $\widetilde{\mathcal{X}}$ is a string of $t$ characters over $\mathcal{X}$, we can expand the strings to form codewords over $\mathcal{X}$, achieving the same error probability when used as codewords for $W$. The scaling factor $\frac{1}{t}$ comes from the factor of $t$ difference in the block length.
For the second claim, we apply Lemma 23 to obtain
(30) |
Since $W$ is pairwise reversible, the right-hand side attains its minimum at $s = \tfrac{1}{2}$, which means that $\widetilde{W}$ is also pairwise reversible. ∎
The next lemma allows us to break down the error exponent of a product channel into its individual components when $M = 2$. (We note that this result does not extend to $M \geq 3$.)
Lemma 16.
Let $W_1, \dots, W_k$ be arbitrary channels, and let $W = W_1 \times \dots \times W_k$ be the product channel. Then
$E(W, 2) \leq \sum_{i=1}^{k} E(W_i, 2). \qquad (31)$
Proof.
For any DMC, the error exponent for $M = 2$ is given as follows [14, Eq. (1.19)]:
$E(W, 2) = \max_{x, x' \in \mathcal{X}} d_C(x, x'). \qquad (32)$
Accordingly, we choose $(x, x')$ attaining the maximum in
$E(W, 2) = \max_{x, x'} d_C(x, x'), \qquad (33)$
and let $x_i$ and $x'_i$ represent the $i$-th component of $x$ and $x'$ respectively. Then,
$E(W, 2) = d_C(x, x') = \sup_{s \in (0, 1)} \sum_{i=1}^{k} d_{C,s}(x_i, x'_i) \leq \sum_{i=1}^{k} \sup_{s \in (0, 1)} d_{C,s}(x_i, x'_i) = \sum_{i=1}^{k} d_C(x_i, x'_i) \leq \sum_{i=1}^{k} E(W_i, 2), \qquad \text{(34)–(37)}$
as required. ∎
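Assuming the characterization $E(W, 2) = \max_{x, x'} d_C(x, x')$ from [14, Eq. (1.19)], the subadditivity over product channels can be checked numerically; the BSC crossover probabilities below are made up, and for these (pairwise reversible) channels the bound in fact holds with near equality, consistent with the equality result that follows.

```python
import math

def dcs(P, Q, s):
    return -math.log(sum(p ** (1 - s) * q ** s for p, q in zip(P, Q)))

def chernoff(P, Q, grid=2000):
    """Chernoff distance via grid search over s in (0, 1)."""
    return max(dcs(P, Q, i / grid) for i in range(1, grid))

def pair_exponent(W):
    """E(W, 2) = max over input pairs of the Chernoff distance (cf. (32))."""
    return max(chernoff(W[x], W[xp])
               for x in range(len(W)) for xp in range(len(W)) if x != xp)

def product_channel(W1, W2):
    """Product channel: inputs and outputs are pairs, flattened row-wise."""
    return [[a * b for a in r1 for b in r2] for r1 in W1 for r2 in W2]

bsc1 = [[0.9, 0.1], [0.1, 0.9]]
bsc2 = [[0.8, 0.2], [0.2, 0.8]]
W = product_channel(bsc1, bsc2)
print(pair_exponent(W), pair_exponent(bsc1) + pair_exponent(bsc2))
```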
The following result shows that we have equality in Lemma 31 if we restrict all $W_i$ to be pairwise reversible:
Lemma 17.
Let $W_1, \dots, W_k$ be pairwise reversible channels and let $W = W_1 \times \dots \times W_k$ be the product channel. Then
$E(W, 2) = \sum_{i=1}^{k} E(W_i, 2). \qquad (38)$
Proof.
We first note from the second part of Lemma 15 that $W$ is pairwise reversible. As a result, by Lemma 12, there exists some input pair $(x, x')$ such that
$E(W, 2) = d_B(x, x'). \qquad (39)$
Moreover, by Lemma 23, we can break this down into the Chernoff exponents for the individual channels:
$d_B(x, x') = \sum_{i=1}^{k} d_B(x_i, x'_i) \leq \sum_{i=1}^{k} E(W_i, 2). \qquad (40)$
To prove the inequality in the other direction, we simply reverse the entire argument. For each channel $W_i$, choose $(x_i, x'_i)$ to attain equality in (27). Defining $x = (x_1, \dots, x_k)$ and $x' = (x'_1, \dots, x'_k)$ then gives
$E(W, 2) \geq d_B(x, x') = \sum_{i=1}^{k} d_B(x_i, x'_i) = \sum_{i=1}^{k} E(W_i, 2), \qquad (41)$
where both steps are based on (27) holding with equality (first for $W$, and then for the individual $W_i$). ∎
III-D Feedback Error Exponent
In the well-studied problem of 1-hop communication with feedback, the sender is given access to all of the previous channel outputs before sending the next channel input. For a discrete memoryless channel $W$, we define the minimum probability of error with feedback $P_e^{\mathrm{fb}}(W, M, n)$, and define the error exponent of the channel with $M$ messages via
$E^{\mathrm{fb}}(W, M) = \lim_{n \to \infty} -\tfrac{1}{n} \log P_e^{\mathrm{fb}}(W, M, n). \qquad (42)$
Clearly, $E^{\mathrm{fb}}(W, M) \geq E(W, M)$. In general, feedback error exponents may be strictly higher than non-feedback ones, even for the binary symmetric channel [16]. However, the two error exponents match for arbitrary DMCs when $M = 2$; that is, $E^{\mathrm{fb}}(W, 2) = E(W, 2)$.
III-E Properties of Chernoff Divergence
The following lemma concerns the Chernoff divergence of a composite channel:
Lemma 19.
[4, Lemma 7] Let $W_1$ and $W_2$ be channels such that the output alphabet of $W_1$ coincides with the input alphabet of $W_2$. For all $s \in (0, 1)$, we have
(44) |
The next result relates the error probability of a likelihood ratio test with the Chernoff divergence. We expect that this result is standard, but we provide a short proof for completeness.
Lemma 20.
Let $P$ and $Q$ be two distributions over the same alphabet $\mathcal{Y}$. For each $y \in \mathcal{Y}$, define the likelihood ratio $\ell(y) = \frac{Q(y)}{P(y)}$. Then
(45) |
Proof.
Consider the set
(46) |
For all , , and therefore
(47) |
which gives
(48) |
∎
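The connection between likelihood-ratio testing and the Chernoff divergence can be illustrated via the standard termwise inequality $\min(a, b) \leq a^{1-s} b^{s}$, which bounds $\sum_y \min(P(y), Q(y))$ (twice the Bayes error of the likelihood ratio test under equal priors) by $e^{-d_{C,s}(P, Q)}$; the following is a numeric sanity check on random distributions, not a reproduction of the exact statement of Lemma 20.

```python
import math
import random

def dcs(P, Q, s):
    return -math.log(sum(p ** (1 - s) * q ** s for p, q in zip(P, Q)))

def total_lrt_error(P, Q):
    """sum_y min(P(y), Q(y)): twice the Bayes error of the likelihood
    ratio test between P and Q under equal priors."""
    return sum(min(p, q) for p, q in zip(P, Q))

random.seed(0)
for _ in range(100):
    raw_p = [random.random() for _ in range(5)]
    raw_q = [random.random() for _ in range(5)]
    P = [x / sum(raw_p) for x in raw_p]
    Q = [x / sum(raw_q) for x in raw_q]
    for s in (0.25, 0.5, 0.75):
        # min(a, b) <= a^(1-s) * b^s termwise implies the bound below
        assert total_lrt_error(P, Q) <= math.exp(-dcs(P, Q, s)) + 1e-12
print("bound verified on 100 random distribution pairs")
```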
IV The Series Case for Theorem 3
In this section, we handle the special case in which the nodes are arranged in series (i.e., a line graph). That is, the nodes are labeled $0, 1, \dots, d$, and the edges are $(0, 1), (1, 2), \dots, (d-1, d)$. We refer to the corresponding channels as $W_1, \dots, W_d$, with corresponding 1-hop error exponents $E_1, \dots, E_d$. All channels are assumed to be pairwise reversible in this section.
This series setup directly generalizes previous work on the 2-hop setting in [1, 2, 3, 4], particularly our work on the multi-bit setting [4]. Specifically, Theorem 3 for the series setup generalizes the main result therein by extending it from 2 hops to any constant number of hops. The protocol that we adopt has some significant differences to [4], which we discuss in Section IV-D. On the other hand, a key similarity is that we still adopt a block-structured strategy.
IV-A Sequential Block Transmission
We momentarily turn our attention to protocols in which nodes transmit blocks to each other one after the other without any parallel transmission. This idea would be highly wasteful if used in a standalone manner, but the idea (in the next subsection) will be to chain many instances of this approach together in parallel so that a negligible fraction of the total block length is ultimately wasted.
For a series graph, define a sequential block transmission with block size $n$ as follows:
•
For each edge $(i, i+1)$, node $i$ transmits to node $i+1$ using $n$ uses of the channel $W_{i+1}$;
•
This process occurs sequentially; transmission over an earlier edge (i.e., closer to the source) must finish before transmission over the next edge commences.
In Section IV-C, we will prove the following key lemma for this setting:
Lemma 21.
Given a series graph, consider a sequential block transmission with block size $n$. Let $Y$ be the length-$n$ sequence received by the final node $d$, and let $M_f$ be the maximum flow (since we are in the series setup, this is equal to $\min_i E_i$). Then there exists a protocol such that for all $m \neq m'$,
$d_B\big( (Y \mid m), (Y \mid m') \big) \geq n (M_f - \eta), \qquad (49)$
where $\eta$ is a term that approaches zero as $n$ increases while $d$ and $M$ remain constant.
IV-B Implication for our Main Problem Setting
Returning to our main problem setting in which nodes are not required to transmit one-after-the-other, we obtain the following corollary of Lemma 21.
Corollary 22.
Given a series graph with $N$ time steps, there exists a protocol such that for any $m \neq m'$, the length-$N$ sequence $Y$ received at node $d$ satisfies
$d_B\big( (Y \mid m), (Y \mid m') \big) \geq N (M_f - \eta), \qquad (50)$
where $\eta$ is a term that approaches zero as $N$ increases while $d$ and $M$ remain constant (without any dependence on the message pair).
Proof.
Consider the process of chaining several sequential block transmissions in parallel, as depicted in Figure 1. If there are $k$ sequential block transmissions of block size $n$, then they can clearly be executed in $(k + d - 1) n$ time steps. In other words, with a block length of $N$, we can perform roughly $\frac{N}{n} - d + 1$ sequential block transmissions (up to asymptotically negligible rounding issues).
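The pipelining arithmetic in this step can be sketched as follows, where `makespan` is a hypothetical helper name: transmission $j$ crosses hop $i$ during time slot $j + i$, so $k$ transmissions over $d$ hops finish after $(k + d - 1)$ slots of $n$ channel uses each.

```python
def makespan(k, d, n):
    """Pipelined schedule: transmission j (0-indexed) crosses hop i during
    time slot j + i, with each slot lasting n channel uses."""
    slots = max(j + i for j in range(k) for i in range(d)) + 1
    return slots * n  # equals (k + d - 1) * n

# e.g. hypothetical numbers: 10 transmissions over 4 hops, blocks of length 50
print(makespan(10, 4, 50))  # (10 + 4 - 1) * 50 = 650
```

Only the first $d - 1$ slots are "wasted" filling the pipeline, so the overhead becomes a negligible fraction of the block length as $k$ grows.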
For each such sequential block transmission, we use the protocol that achieves Lemma 21. Since memory is not stored between the blocks, the blocks are conditionally independent given the source message. Let $Y^{(j)}$ represent the $j$-th block received by the final node, and let $Y = (Y^{(1)}, Y^{(2)}, \dots)$. Then, using tensorization (Lemma 23) and Lemma 21, we have for all $m \neq m'$ that
(51) | |||||
(52) | |||||
(53) |
as required. ∎
IV-C Proof of Lemma 21
We first show that it suffices to prove Lemma 21 under an additional assumption that will be convenient to work with:
Lemma 23.
Proof.
Consider the general setup of Lemma 21 with arbitrary . By Lemma 29, for each , there exist codewords of length such that
(58) |
Let be the restriction of to the codewords . We first show that the channels satisfy the extra requirement stated in Lemma 23.
By Lemma 15, is pairwise reversible. Moreover, for each , we have from (58) that
(59) |
Then, we deduce from Lemma 12 (with the inputs of being substituted for ) that . Since the maxflow is in the series setup, this gives . Then, let be the maximum flow for the sequence of channels . Since for all , the new maximum flow satisfies . On the other hand, by Lemma 15, we have , which gives
(60) |
Combining these upper and lower bounds on , we conclude that , and from (59) we deduce that the channels satisfy the additional requirement imposed in the statement of Lemma 23 (with renamed to ).
Now suppose that Lemma 21 holds for the new sequence of channels , and let denote the corresponding length- sequence received by the destination node. This means that there exists a sequential block transmission protocol of block size such that for any ,
(61) |
Any sequential block transmission with block size using the new channels can easily be converted to a sequential block transmission with block size over the original channels by simply expanding instances of the new letter by the codeword . Therefore, there exists a sequential block transmission of block size such that for any ,
(62) |
Since is a constant, we can appropriately rescale such that above is renamed to (the overall sequential block transmission length), and we conclude that Lemma 21 holds for the original channels . ∎
For the rest of this section, we work under the extra assumption stated in Lemma 23. We also assume that $n$ is even to avoid rounding issues.
IV-C1 Description of the Forwarding Protocol
We consider the following protocol for a single sequential block transmission of length $n$:
•
Every node $i$ ($0 \leq i \leq d$) is assigned a state consisting of two integers $(u_i, v_i)$ with $u_i \in \{1, \dots, M\}$ and $0 \leq v_i \leq n$, which we will refer to as $(u, v)$ when there is no ambiguity. A state of $(u, v)$ roughly indicates that the node believes $u$ to be the most likely value of $m$, where larger values of $v$ indicate higher confidence, whereas a smaller value of $v$ indicates that there is another competing value that is also relatively likely (we make this more precise below).
•
Node 0 is assigned $u_0 = m$ and $v_0 = n$, where $m$ is the actual message.
•
If a node is assigned a state $(u, v)$, then this node sends out $\frac{n+v}{2}$ copies of $u$ followed by $\frac{n-v}{2}$ copies of $u + 1$ (unless $u = M$, in which case we wrap around and replace $u + 1$ by $1$ here). We refer to the corresponding string as $c(u, v)$, or $c_i(u, v)$ when we want to emphasize which node is being considered.
•
Node $i$ ($i \geq 1$), upon receiving $Y_i$ from node $i - 1$, computes the likelihood of $Y_i$ for each possible message. Let $\hat{m}_1$ and $\hat{m}_2$ be the resulting most likely and second most likely values of $m$ respectively (with arbitrary tie-breaking).
•
Node is now assigned a state of
(63) These equations are chosen such that reflects the most likely source message, and appropriately reflects the confidence level such that the most likely message is roughly times more likely than the next value of .
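A minimal sketch of the sorted-string encoding is given below; since the exact constants in (63) are not reproduced here, the even split between the leading message and its wrap-around competitor is an assumption for illustration, and the paper's actual protocol may differ in its details.

```python
def encode_state(u, v, n, M):
    """Hypothetical rendering of the sorted transmission string for a node
    in state (u, v): (n + v) / 2 copies of the believed message u, and the
    remaining slots given to the wrap-around competitor (u + 1) mod M
    (messages labelled 0, ..., M - 1 here for convenience)."""
    assert 0 <= v <= n and v % 2 == n % 2  # so the split below is integral
    heavy = (n + v) // 2
    return [u] * heavy + [(u + 1) % M] * (n - heavy)

# A confident node (v large) sends an almost-constant block, while an
# unsure node (v = 0) splits its block evenly between the two candidates.
print(encode_state(2, 6, 8, 3))  # seven 2's followed by one 0
print(encode_state(2, 0, 8, 3))  # four 2's followed by four 0's
```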
IV-C2 Analysis of the Strategy
We start by defining a pseudometric on the states by
$\delta\big( (u, v), (u', v') \big) = \begin{cases} |v - v'| & u = u' \\ v + v' & u \neq u'. \end{cases} \qquad (64)$
To see that $\delta$ defined above is a pseudometric, note that $\delta$ is equivalent to the $\ell_1$-norm after applying the mapping $(u, v) \mapsto (0, \dots, 0, v, 0, \dots, 0)$ (with $v$ occurring at position $u$). The reason for $\delta$ being a pseudometric rather than a metric is that $\delta\big( (u, 0), (u', 0) \big) = 0$ even when $u \neq u'$.
Intuitively, $\delta$ is a measure of how far apart the beliefs $(u, v)$ and $(u', v')$ are. If $u = u'$, then these two beliefs are close when $v$ and $v'$ are close. If $u \neq u'$, the beliefs are close only when $u$ and $u'$ are both comparatively likely, which corresponds to the case where both $v$ and $v'$ are small.
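Assuming the reconstruction of $\delta$ as $|v - v'|$ when the beliefs agree and $v + v'$ otherwise (consistent with the $\ell_1$ description above), the pseudometric properties can be verified exhaustively on a small state space:

```python
import itertools

def d(a, b):
    """Pseudometric on states: |v - v'| when the believed messages agree,
    v + v' otherwise (the l1 distance between v*e_u and v'*e_u')."""
    (u, v), (up, vp) = a, b
    return abs(v - vp) if u == up else v + vp

states = [(u, v) for u in range(3) for v in range(5)]
for a, b, c in itertools.product(states, repeat=3):
    assert d(a, c) <= d(a, b) + d(b, c)          # triangle inequality
    assert d(a, b) == d(b, a) and d(a, a) == 0   # symmetry and d(a, a) = 0
# not a metric: distinct zero-confidence states are at distance 0
assert d((0, 0), (1, 0)) == 0
print("pseudometric properties hold on all", len(states) ** 3, "triples")
```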
We use $\delta$ to bound the Bhattacharyya distance associated with two codewords:
(65) | |||||
(66) | |||||
(67) |
where we use the structure of the sorted strings $c(u, v)$ and the definition of the pseudometric $\delta$.
Observe that the sequence of states taken by the nodes forms a Markov process. We establish a bound on the probability of reaching any intermediate state given the initial state; intuitively, the following result states that given the original state $(u_0, v_0)$, the probability of seeing another state $(u', v')$ is exponentially small in the distance between the two states as defined in (64).
Lemma 24.
For all $i \in \{0, \dots, d\}$, all $(u, v)$, and all $(u', v')$, we have
(70) |
and
(71) |
Proof.
We use an induction argument, with the base case being that (70) trivially holds when $i = 0$: either the left-hand side is infinite due to a probability of zero, or the left-hand side is zero due to a probability of one (i.e., $(u', v') = (u_0, v_0)$), in which case the first term on the right-hand side is zero.
We proceed by induction on the node index, and break the inductive argument into the following two claims:
•
If (70) holds at a given node, then (71) holds at the next node.
•
If (71) holds at a node, then (70) also holds at that node.
Start by assuming that (70) holds for some . Observe that forms a Markov chain. Let be the channel and be the channel ; applying Lemma 44 with then gives for any that
(72) | |||||
Since (70) holds for (and for all and ), we can combine it with (67) to simplify the expression inside the minimization clause:
(74) | |||||
(75) |
where (74) follows from non-negativity (first term) and the triangle inequality (second and third terms) for the pseudometric $\delta$, and the last step uses the definition of $\delta$. By substituting (75) into (72), we conclude that (71) holds for the next node.
We now turn to the second dot point mentioned at the start of the proof. We will split this into two cases. We note that in (70) we are interested in the event that the state equals $(u', v')$, and for this to happen, $u'$ must be the value assigned to $u$ in (63), i.e., $u'$ must be the most likely message given the received block. We will use this fact in both cases.
Case 1 (). If , then , so that the RHS of (70) is upper bounded by 0, and therefore (70) trivially holds. Therefore, we assume that .
For each and each , define
(76) |
By (63) and the assumption , there exists (which can be chosen to be the second most likely value of given , noting that the most likely value is ) such that
(77) |
Combining this with a union bound over all , we have
(78) | |||||
(79) |
We can now conclude that
(80) | |||||
(81) | |||||
(82) | |||||
(83) | |||||
(84) |
where (81) follows from Lemma 45, (82) follows from the inductive assumption (71), and (84) follows from the definition of with . Thus, (70) holds for as desired.
Case 2 (). For each , define
(85) |
Let (which may depend on ) be the second most likely value of (noting that is the most likely value). We then have
(86) |
where the first inequality uses the fact that the likelihood of the second most likely value is at most that of the most likely value, and the second inequality follows from (63) after lower bounding the minimum therein by the second term.
IV-D Comparison to Previous Work
In our previous paper [4], we proved the 2-hop version of Lemma 21. In [4], the problem is reduced to the case $|\mathcal{X}| = M$ (similarly to Lemma 23) and then handled as follows:
•
The relay node calculates the KL-divergence between the empirical distribution of the received symbols against the output distribution associated with each message.
•
The relay then sends a sorted sequence with more occurrences of the symbol whose associated KL-divergence is smallest, and an equal number of occurrences of the rest.
Our approach in the present paper is instead based on a likelihood ratio (see (63)), and upon calculating this, a sorted sequence is sent containing at most two distinct values.
At first glance, one may have expected that using likelihood ratios would be difficult, as there can be up to $\binom{M}{2}$ likelihood ratios of interest. However, our approach here shows that using the likelihood ratio between only the two most likely values is sufficient. Overall, our approach here is more similar to the one used in [1] (for general binary-input channels rather than the BSC), which directly computes the likelihood ratio but only handles the 2-hop setting and requires $M = 2$.
V Proof of Theorem 3 (Main Achievability Bound)
In this section, we prove Theorem 3 in the general case, using the series case from Section IV as a stepping stone. We crucially make use of the flow decomposition theorem (Theorem 6) together with its associated paths $P_1, \dots, P_k$ and flows $f_1, \dots, f_k$. The overall idea is to send conditionally independent messages along different paths and only accumulate them at the end.
V-A Edge-Disjoint Paths
The protocol is simpler when the paths are edge-disjoint, so we will consider this case before turning to the general case.
The key idea is to transmit information across each path independently and aggregate them only at the end. Although we assume that paths are edge-disjoint, they may still share nodes. However, each node is designed to act independently on every path it is involved in, so that information entering a node in one path does not affect other outgoing paths.
Let $Y_1, \dots, Y_k$ be the sequences received by the destination node along the $k$ different paths after $N$ time steps, and let $Y = (Y_1, \dots, Y_k)$. By Corollary 22, for all $i$ and all $m \neq m'$, there exists a (single-path) protocol such that
(89)
where the term comes from the fact that a simple path cannot have more than edges. Therefore, again letting , we have
(90)
(91)
Note that here is treated as a constant, because is a property of the graph so does not depend on .
Applying Lemma 9, we deduce that the total error exponent is bounded below by
(92)
which shows that error exponents arbitrarily close to are achievable.
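To make the flow-decomposition step (Theorem 6) concrete, here is a minimal numerical sketch: it computes a maxflow via Edmonds–Karp and peels the resulting flow into source–destination paths. The graph and its capacities are toy placeholders standing in for per-edge error exponents.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting paths.
    `cap` maps (u, v) -> capacity; returns (flow value, flow assignment)."""
    flow = defaultdict(float)
    adj = defaultdict(set)
    for u, v in cap:
        adj[u].add(v)
        adj[v].add(u)          # reverse direction needed for residual edges
    def residual(u, v):
        return cap.get((u, v), 0.0) - flow[(u, v)] + flow[(v, u)]
    total = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:        # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual(u, v) > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total, flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual(u, v) for u, v in path)
        for u, v in path:                       # cancel reverse flow first
            back = min(flow[(v, u)], push)
            flow[(v, u)] -= back
            flow[(u, v)] += push - back
        total += push

def decompose(flow, s, t):
    """Flow decomposition: repeatedly peel off an s-t path of positive flow.
    Assumes the flow contains no cycles (true for this example)."""
    paths = []
    while True:
        path, u = [], s
        while u != t:
            nxt = next((v for (x, v), f in flow.items() if x == u and f > 1e-12), None)
            if nxt is None:
                return paths
            path.append((u, nxt))
            u = nxt
        push = min(flow[e] for e in path)
        for e in path:
            flow[e] -= push
        paths.append((push, path))

# Toy graph: capacities play the role of per-edge error exponents.
cap = {('s', 'a'): 1.0, ('s', 'b'): 2.0, ('a', 't'): 2.0,
      ('b', 't'): 1.0, ('b', 'a'): 1.0}
total, flow = max_flow(cap, 's', 't')
paths = decompose(dict(flow), 's', 't')
print(total, paths)
```

The decomposition returns paths whose flow values sum to the maxflow, matching the role of the paths and flows used throughout this section.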
V-B The General Case
For the general case, we will still rely on the edge decomposition as before. Our approach is to reduce to the case where paths are edge-disjoint by considering the use of parallel edges, which are allowed to exist in all of our analysis and results. This idea has also been used previously when studying channel capacity results for general networks; see, e.g., [11, Sec. 15.3].
We first make two observations:
-
•
Suppose that there is some edge such that is a product channel, say . We can delete and replace it by two parallel edges and , placing the channels and on them respectively; this preserves the error exponent, because any symbol that can be sent across can be separately sent across the two channels in an identical manner.
-
•
For any graph , let be the resultant graph formed by replacing each with its -fold product . Then . To see this, observe that any protocol with time steps over can be executed with time steps over .
These two observations allow us to do the following:
-
•
Pick a large block size , and replace every channel with the -fold product channel.
-
•
For each edge , let
(93)
be the indices of the paths that pass through . For each , define
(94)
Since there are only paths, we have . Moreover, we have
(95)
For each edge with corresponding channel , replace this by parallel edges, where the corresponding channels are for each . We will refer to each of these new edges by . This is possible since . If , then we simply add a dummy (unused) channel to make up for the difference. Let denote this new graph.
Observe that after the transformation, the capacities of the corresponding parallel edges are
(96)
noting that by the flow constraints.
Let be the path in formed by joining up edges of the form for each . Since is a path and joins the same pair of vertices as , is also a path. By (96), each of these edges has error exponent at least , so that the capacity on path is at least . Thus, there exist edge-disjoint paths on such that the capacity on each path is at least .
We have now reduced the problem to the edge-disjoint case studied in the previous subsection, and we conclude that we can achieve error exponents arbitrarily close to . Since time was scaled by a factor of , we conclude that and therefore
(97)
Taking , we conclude that .
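The parallel-edge reduction above can be sanity-checked numerically. The sketch below (with toy capacities and hard-coded paths, not the paper's notation) gives each path its own copy of every edge it uses, with capacity equal to the path's flow share, and verifies that the shares on each edge fit within the original capacity, as guaranteed by the flow constraints.

```python
from collections import defaultdict

def split_into_parallel_edges(cap, paths):
    """For each path i with flow f_i, create a parallel copy of every edge it
    uses with capacity f_i; leftover capacity becomes a dummy unused edge."""
    new_edges, used = [], defaultdict(float)
    for i, (f, path) in enumerate(paths):
        for e in path:
            new_edges.append((e, i, f))        # copy of edge e reserved for path i
            used[e] += f
    for e, c in cap.items():
        if c - used[e] > 1e-12:
            new_edges.append((e, None, c - used[e]))   # dummy (unused) edge
    return new_edges, used

# Toy data: a flow decomposition of value 3 on a 4-node graph.
cap = {('s', 'a'): 1.0, ('s', 'b'): 2.0, ('a', 't'): 2.0,
      ('b', 't'): 1.0, ('b', 'a'): 1.0}
paths = [(1.0, [('s', 'a'), ('a', 't')]),
         (1.0, [('s', 'b'), ('b', 't')]),
         (1.0, [('s', 'b'), ('b', 'a'), ('a', 't')])]
new_edges, used = split_into_parallel_edges(cap, paths)
# Flow feasibility: per-path shares on each edge fit within its capacity, so
# after splitting, path i is edge-disjoint with bottleneck exactly f_i.
print(all(used[e] <= cap[e] + 1e-9 for e in cap))
```

After the split, each path sees only its own parallel copies, so the paths become edge-disjoint and each path's bottleneck capacity equals its flow value, exactly as in the reduction above.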
VI Proof of Theorem 4 (Main Converse Bound)
The proof of our converse bound consists of using the max-flow min-cut duality theorem (Theorem 5) to construct a cut with the same value as . We will additionally use the following lemma, where we recall that denotes the 1-hop error exponent of a channel with full feedback:
Lemma 25.
Consider any partition of the nodes in with the source node in and the destination node in . Let be the channels associated with edges from to , and let be the product channel. Then for all ,
(98)
Proof.
We compress into a single node by allowing infinite information transfer between different nodes inside , and do the same for . Clearly, this modification cannot decrease the overall error exponent.
At first glance, it may appear that the net result of doing this is simply allowing to send information to via , each a total of times, which is the same as using for times. However, we must also account for the possibility of back-edges, which amounts to having a certain degree of feedback. In view of this, since any protocol that can be executed under the original setup can also be executed under the new setup with uses of with full feedback, we conclude that . ∎
Since is a decreasing function of , . Furthermore, feedback does not improve the error exponent when (Theorem 43), and thus Lemma 25 immediately gives the following corollary:
Corollary 26.
Under the definition of in Lemma 25, we have
(99)
VII Achievability Bounds for General Channels
In this section, we study achievability bounds in the case that the channels need not be pairwise reversible.
VII-A Constant Number of Messages
When the pairwise reversibility assumption is dropped, Theorem 3 can fail to hold even in the 2-hop setting corresponding to a graph with 3 nodes and 2 edges [4, Thm. 3]. However, it is possible to prove some weaker achievability bounds. Recalling from Definition 13, we state the following achievability bound for general channels:
Lemma 27.
Given an arbitrary graph , let be the maxflow formed when the capacity of every edge is set to . Then
(103)
Proof.
We use the same proof as the pairwise reversible case. The only use of the pairwise reversibility condition is in Lemma 23 when constructing the codewords of length such that . However, such codewords of length with (i.e., with in place of ) are still guaranteed to exist by Lemma 29 even when the pairwise reversibility condition is dropped. Outside of Lemma 23, the proof follows with replaced by where appropriate. ∎
VII-B Zero-Rate Error Exponents
In this subsection, we prove Theorem 7. We start by relating to the zero-rate error exponent:
Lemma 28.
For any , we have
(104)
By taking on both sides, we also have
(105)
Proof.
We use the expression for in Theorem 26. Let maximize the expression in (26). Draw independently according to distribution , and observe that the definition of in Definition 13 gives
(106)
To show the second inequality in the lemma statement, let be the empirical distribution (type) of achieving (28). Substituting into Theorem 26, we have
(107)
(108)
(109)
(110)
∎
We now prove Theorem 7 by devising a protocol whose error exponent is arbitrarily close to , where the edge weights in are the respective 1-hop zero-rate error exponents (i.e., for edge ).
Let be a (large) constant, and let be the maxflow formed by setting the capacity of each edge to . Since for all , we have .
Building on Sections IV and V, choose a (large) block size , and generate a sequential block transmission protocol of length with messages. We can treat a single sequential block transmission as a discrete memoryless channel . More precisely, the input to is the message that the source node receives, and the output of is what the destination receives from a single block. The analysis in Sections IV and V shows that given any , there is some sufficiently large such that
(111)
for all . Substituting into (13), it follows that
(112)
We now proceed to bound the zero-rate error exponent of . By Lemma 28,
(113)
Similar to the proof of Corollary 22, we can run conditionally independent copies of the block protocol in time steps, which is equivalent to using channel for a total of times. By adopting any coding strategy for that attains its zero-rate error exponent (such a strategy may be very complex, but we recall that our focus is on information-theoretic limits and not practical protocols), we deduce that
(114)
Since is arbitrarily small and is arbitrarily large, we conclude that
(115)
as required.
VIII Further Results
In this section, we provide some additional results that address when it is (or is not) possible to relax the assumptions made in our main results.
VIII-A Converse Bound for
Recall that denotes the maxflow for the graph with messages. One might hope that the proof of Theorem 4 can be modified to prove a converse bound of (instead of ). However, there are at least two problems in doing so:
-
•
We cannot guarantee that the 1-hop feedback and non-feedback error exponents are equal when ;
-
•
The required analog of Lemma 31 may not hold when .
In this subsection, we partially address these issues by imposing additional assumptions. Specifically, we prove the following:
Theorem 29.
Suppose that the following conditions are satisfied:
-
•
has a mincut with no back-edges (i.e., all edges are from the source side to the destination side);
-
•
Every channel in is pairwise reversible.
Then, we have .
We proceed with the proof. Although the bound in Lemma 25 uses the feedback error exponent, we do not need to consider feedback if the partition has no back-edges. We immediately deduce the following:
Lemma 30.
Consider any partition of the nodes in with the source node in and the destination node in . Let be the channels associated with edges from to , and let be the product channel. If there are no back-edges, then for all , we have
(116)
Starting from a mincut with no back-edges, let be the channels going across the cut. Due to our pairwise reversible assumption, we may apply Lemma 38 to obtain
(117)
where the last step holds by maxflow-mincut duality. This completes the converse part of Theorem 29. The achievability part is an immediate consequence of Theorem 3.
Next, for zero-rate error exponents, we show that the pairwise reversibility condition can be dropped:
Lemma 31.
Suppose that has a mincut with no back-edges. Then, we have where is the maxflow formed by setting the capacity of every edge to .
Proof.
VIII-B Approximation Guarantees
Lemma 27 and Theorem 4 reveal that for any network and any , we have
(121)
where we recall that uses (28) for the edge weights in the maxflow calculation. We will proceed to show that this is in fact a 4-approximation:
Lemma 32.
For any and , we have
(122)
Proof.
It is sufficient to show that , as this means that and are defined with respect to edge weights that differ by at most a factor of 4. We have
(123)
(124)
(125)
(126)
where (123) follows from (32), (125) follows by using the concavity of in [13, Thm. 5] and considering an equal combination of the values at and (along with for the latter), and (126) follows from (28).
The next lemma gives conditions under which a tighter approximation is possible:
Lemma 33.
Suppose that either of the following conditions hold:
-
•
; or
-
•
Every channel in is pairwise reversible.
Then, we have a 2-approximation in (121), i.e., .
Proof.
We also identify another scenario where our bounds are exactly tight. Consider the -ary symmetric channel with inputs , outputs , and
(128)
for some .
Lemma 34.
Suppose that every channel is a -ary symmetric channel for some (different channels can have different values of and different noise levels). Then , and the protocol given in Section V provides the achievability part.
Proof.
Similarly to the earlier results in this subsection, we simply need to show that for each . Recall the definition of :
(129)
For the -ary symmetric channel, is either 0 (if ) or a fixed positive value (if ). Since , we can achieve the maximum in (129) by letting be distinct, yielding
(130)
The computation for is also straightforward; since is pairwise reversible, we can simply choose any two distinct inputs to obtain
(131)
Since for all edges, it follows that . ∎
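The computation in Lemma 34 can be checked numerically. The sketch below assumes the standard parametrization of the -ary symmetric channel (stay with probability 1 − δ, move to each other symbol with probability δ/(q − 1)) and evaluates the Bhattacharyya-style pairwise quantity −log Σ_y √(P(y|x) P(y|x′)); by symmetry it takes the same value for every pair of distinct inputs.

```python
import math

def qsc_row(q, delta, x):
    """Row of the q-ary symmetric channel under the assumed parametrization:
    output x with prob 1 - delta, each other symbol with prob delta/(q-1)."""
    return [(1 - delta) if y == x else delta / (q - 1) for y in range(q)]

def pairwise_exponent(p, r):
    """Bhattacharyya-style pairwise quantity -log sum_y sqrt(p(y) r(y)),
    the relevant 1-hop exponent for pairwise reversible channels."""
    return -math.log(sum(math.sqrt(a * b) for a, b in zip(p, r)))

q, delta = 3, 0.1
vals = [pairwise_exponent(qsc_row(q, delta, x), qsc_row(q, delta, xp))
        for x in range(q) for xp in range(q) if x != xp]
# Every distinct input pair gives the same exponent, as used in Lemma 34.
print(max(vals) - min(vals) < 1e-12, round(vals[0], 4))
```

Because all input pairs are equivalent, taking any two distinct inputs already attains the maximum over input pairs, which is the key simplification used in the proof.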
VIII-C Counterexample to a Natural Conjecture
In Theorem 29, we assumed that has a mincut with no back-edges. It might be tempting to believe that this is the case whenever contains no directed cycles. However, this is false, as we can see in Figure 2, with the mincut described in the figure caption.
Using the example in Figure 2, we show that even when every channel is pairwise reversible and the graph is acyclic, it is possible to achieve an error exponent greater than when :
Theorem 35.
Even if we assume that is acyclic and every channel is pairwise reversible, there exist scenarios in which when . Moreover, the same statement holds even when the 1-hop error exponents defining are replaced by their counterparts with full feedback.
Proof.
Consider the graph in Figure 2. Set , and consider the setup described as follows for some :
-
•
On the solid edges, infinite noiseless communication is allowed.
-
•
On the edge , we consider a ternary symmetric channel with inputs , outputs , and with if , and otherwise.
-
•
On the edge , we consider a binary symmetric channel with noise parameter .
We will prove that as becomes sufficiently small, it is possible to beat the maxflow rate. For the rest of this section, asymptotic notation refers to the limit as .
Let us first compute the error exponents of all the channels. Since all channels are pairwise reversible, we may compute the error exponents using as per (28). For the ternary symmetric channel, (28) is maximized when are all distinct, so the error exponent is given by
(132)
For the binary symmetric channel, only two of can be distinct, so the error exponent is given by
(133)
Hence, the maxflow is simply the sum of the two error exponents, i.e., .
We will prove that is an achievable error exponent, thus disproving the natural conjecture that the maxflow is an upper bound on the error exponent. Similar to Section IV, define a single “sequential transmission” as follows:
-
•
Node 1 inputs the true message into its channel connecting to node 2, and also sends to node 3;
-
•
Node 2 then tells both nodes 3 and 4 what it received;
-
•
Node 3 then looks at both the ground truth from node 1 and the forwarded value from node 2. If they are the same, message 0 is sent to node 4, and otherwise, message 1 is sent to node 4.
A total of such transmissions can occur in the time steps by chaining them one after the other (similar to Figure 1). Moreover, we can view a single sequential transmission as a discrete memoryless channel , where the input is , and the output consists of what node 4 receives from both nodes 2 and 3.
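A single sequential transmission can be sketched as a small Monte Carlo simulation. The channel parametrizations below (ternary symmetric channel moving to each wrong symbol with probability δ, and a BSC(δ)), the value of δ, and all names are illustrative assumptions for this sketch.

```python
import random

def sequential_transmission(m, delta, rng):
    """One sequential transmission over the graph of Figure 2, under assumed
    parametrizations: the node-1 -> node-2 edge is a ternary symmetric channel
    moving to each wrong symbol w.p. delta, and the node-3 -> node-4 edge is a
    BSC(delta); solid edges are noiseless."""
    others = [x for x in range(3) if x != m]
    r = rng.random()
    if r < delta:
        received = others[0]
    elif r < 2 * delta:
        received = others[1]
    else:
        received = m
    # Node 3 compares the ground truth with node 2's forwarded value and
    # sends the mismatch indicator bit over the BSC.
    bit = (0 if received == m else 1) ^ (1 if rng.random() < delta else 0)
    return received, bit

rng = random.Random(0)
delta, trials = 0.05, 200_000
counts = {"flag": 0, "silent_error": 0, "ok": 0}
for _ in range(trials):
    received, bit = sequential_transmission(0, delta, rng)
    if bit == 1:
        counts["flag"] += 1          # node 4 sees the warning flag: O(delta)
    elif received != 0:
        counts["silent_error"] += 1  # undetected error: both channels err, O(delta^2)
    else:
        counts["ok"] += 1
print(counts)
```

The simulation reflects the scaling discussed next: the "flag" outcome occurs with probability on the order of δ, while an undetected error requires both noisy channels to err and so occurs with probability on the order of δ².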
We can classify the result of each sequential transmission into three possible outcomes:
-
(i)
Node 4 receives message 1 from node 3. This means that either node 2 received the wrong message (and therefore node 3 sent 1), or the message from node 3 to node 4 got flipped. Each event happens with probability , so the total probability of this is .
-
(ii)
Node 4 receives message 0 from node 3, and receives the wrong message from node 2. In this case, both the message from node 1 to node 2 and the message from node 3 to node 4 must be corrupted, so this happens with probability
-
(iii)
Node 4 receives message 0 from node 3, and receives the correct message from node 2. This occurs with probability .
∎
We summarize the end-to-end transition probabilities in Figure 3 (rows are inputs, columns are outputs):

| input \ output | (0,0) | (1,0) | (2,0) | (*,1) |
|---|---|---|---|---|
| 0 | | | | |
| 1 | | | | |
| 2 | | | | |
Since we are allowed uses of in time steps, we have . We now proceed to bound ; we have
(134)
and similarly when is replaced by any two distinct symbols. Using Lemma 12, this gives
(135)
showing that an error exponent of is achievable.
This same counterexample in fact also establishes the second part of Theorem 35 in which the mincut is defined with respect to edge capacities instead of . To see this, first note that for the ternary symmetric channel, we have ; this follows from the proof of Lemma 34, with the ternary symmetric channel being pairwise reversible. On the other hand, (Theorem 43), (error exponent is decreasing in ) and (feedback cannot decrease error exponents). Therefore, the ternary symmetric channel has ; this means that all the inequalities must hold with equality, yielding
(136)
For the binary symmetric channel, the feedback error exponent for is given as follows [16, p. 64]:
(137)
which gives a maxflow of . Once again, since this is strictly below for small enough , we obtain the desired result.
References
- [1] W. Huleihel, Y. Polyanskiy, and O. Shayevitz, “Relaying one bit across a tandem of binary-symmetric channels,” IEEE International Symposium on Information Theory (ISIT), 2019.
- [2] V. Jog and P. L. Loh, “Teaching and learning in uncertainty,” IEEE Transactions on Information Theory, vol. 67, no. 1, pp. 598–615, 2021.
- [3] Y. H. Ling and J. Scarlett, “Optimal rates of teaching and learning under uncertainty,” IEEE Transactions on Information Theory, vol. 67, no. 11, pp. 7067–7080, 2021.
- [4] ——, “Multi-bit relaying over a tandem of channels,” IEEE Transactions on Information Theory (to appear), 2023.
- [5] S. Rajagopalan and L. Schulman, “A coding theorem for distributed computation,” ACM Symposium on Theory of Computing, 1994.
- [6] Y. H. Ling and J. Scarlett, “Simple coding techniques for many-hop relaying,” IEEE Transactions on Information Theory, vol. 68, no. 11, pp. 7043–7053, 2022.
- [7] E. Domanovitz, T. Philosof, and A. Khina, “The information velocity of packet-erasure links,” in IEEE Conference on Computer Communications, 2022.
- [8] W. Evans, C. Kenyon, Y. Peres, and L. J. Schulman, “Broadcasting on trees and the Ising model,” Annals of Applied Probability, vol. 10, no. 2, pp. 410–433, 2000.
- [9] A. Makur, E. Mossel, and Y. Polyanskiy, “Broadcasting on random directed acyclic graphs,” IEEE Transactions on Information Theory, vol. 66, no. 2, pp. 780–812, 2020.
- [10] ——, “Broadcasting on two-dimensional regular grids,” IEEE Transactions on Information Theory, vol. 68, no. 10, pp. 6297–6334, 2022.
- [11] A. El Gamal and Y.-H. Kim, Network information theory. Cambridge University Press, 2011.
- [12] T. M. Cover and J. A. Thomas, Elements of information theory. John Wiley & Sons, Inc., 2006.
- [13] C. Shannon, R. Gallager, and E. Berlekamp, “Lower bounds to error probability for coding on discrete memoryless channels, I,” Information and Control, vol. 10, no. 1, pp. 65–103, 1967.
- [14] ——, “Lower bounds to error probability for coding on discrete memoryless channels, II,” Information and Control, vol. 10, no. 5, pp. 522–552, 1967.
- [15] L. Trevisan, “Stanford CS261 lecture notes,” 2011, https://theory.stanford.edu/~trevisan/cs261/lecture11.pdf.
- [16] E. R. Berlekamp, “Block coding with noiseless feedback,” Ph.D. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering, 1964.
- [17] B. Nakiboglu, “Exponential bounds on error probability with feedback,” Ph.D. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering, 2011.