
Maxflow-Based Bounds for Low-Rate
Information Propagation over Noisy Networks

Yan Hao Ling and Jonathan Scarlett
Abstract

We study error exponents for the problem of low-rate communication over a directed graph, where each edge in the graph represents a noisy communication channel, and there is a single source and destination. We derive maxflow-based achievability and converse bounds on the error exponent that match when there are two messages and all channels satisfy a symmetry condition called pairwise reversibility. More generally, we show that the upper and lower bounds match to within a factor of 4. We also show that with three messages there are cases where the maxflow-based error exponent is strictly suboptimal, thus showing that our tightness result cannot be extended beyond two messages without further assumptions.

The authors are with the Department of Computer Science and Department of Mathematics, School of Computing, National University of Singapore (NUS). Jonathan Scarlett is also with the Institute of Data Science, NUS. Emails: [email protected]; [email protected]. This work was supported by the Singapore National Research Foundation (NRF) under grant number R-252-000-A74-281.

I Introduction

Problems of low-rate (e.g., 1-bit) information propagation in noisy networks have recently gained increasing attention, with prominent examples including the following:

  • The transmission of a single bit over a tandem of channels (the 2-hop setting) was studied in [1, 2, 3], and the multi-bit variant was studied in [4]. The goal is to characterize the associated error exponent; as we further discuss below, this line of works is the one most relevant to the present paper.

  • The transmission of a single bit over a long chain of noisy channels (connected by relays) dates back to the work of Schulman and Rajagopalan [5] and recently regained attention [1, 6, 7] after being posed by Polyanskiy (who termed it the information velocity problem) and connected to the 2-hop setting by Huleihel, Polyanskiy, and Shayevitz [1].

  • Another line of works studied broadcasting problems on various kinds of graphs, such as trees [8], random graphs [9], and grids [10]. Here the goal is to reconstruct the original bit from the information received at the leaves, and the focus has been on simple intermediate nodes performing memoryless operations. (In contrast, the problems described above allow the intermediate/relay nodes to perform complicated coding strategies.)

In this paper, we address a notable gap in this list by extending the 2-hop problem to transmission over a fixed but arbitrary directed graph, with a single source and a single destination. (General graphs also fall under the framework of [5], even without the constraint that the graph size is fixed with respect to the block length, but the focus therein is only on scaling laws and not precise constants or error exponents.) An extensive range of channel capacity results are known for such settings [11, Ch. 15], but 1-bit and other low-rate settings have remained less explored.

Specifically, we generalize the 2-hop multi-bit setup to arbitrary networks and show a maxflow-based achievability bound (Theorem 3), which supersedes our main previous multi-bit result for the 2-hop setting [4]. We also provide a converse bound (Theorem 4) based on the cutset idea, which matches the achievability bound in the 1-bit case when every channel satisfies a symmetry condition called pairwise reversibility (see Definition 7 below). While we are unable to find the optimal error exponent more generally, our achievability and converse bounds provide a 4-approximation (Lemma 122).

We proceed by outlining the related work in more detail, and then formally describe the problem setup and state our main results.

I-A Related Work

Tandem of channels (2-hop). The problem of transmitting a single bit over a tandem of channels was introduced by Huleihel, Polyanskiy, and Shayevitz [1], as well as Jog and Loh using different motivation/terminology based on teaching and learning in multi-agent problems [2]. Both of these works gave various protocols with varying trade-offs between their simplicity and their associated error exponents. We then showed in [3] that the 1-hop and 2-hop error exponents coincide whenever the two channels have the same transition law. In [4], we generalized this problem to the multi-bit setting and showed that whenever both channels are pairwise reversible, the 2-hop error exponent is simply given by the bottleneck (i.e., the worse of the two 1-hop error exponents).

Information velocity (many-hop). As we outlined above, the information velocity problem seeks to establish the minimum possible ratio of the number of hops to the block length to reliably send a single bit over a long chain of relays connected by binary symmetric channels (or more generally, other channels). This problem also has interesting connections to the 2-hop setting [1].

A general result of Schulman and Rajagopalan [5] translates noiseless protocols (on arbitrary graphs) to noisy ones, and when specialized to the line graph, this proves that positive information velocity is attainable. We gave a simple and efficient protocol achieving positive information velocity in [6], though precisely characterizing the best possible information velocity remains very much open. The information velocity of erasure-type channels was studied in [7], and for this class of channels an exact characterization was attained.

Broadcasting problems. As mentioned above, problems of broadcasting [8, 9, 10] have a similar flavor to the information velocity problem, but with some substantial differences including (i) the consideration of more complicated graphs, (ii) the combination of many nodes’ information to form the final estimate, and (iii) the consideration of simple memoryless intermediate nodes without the use of coding.

Capacity results on graphs. Another notable line of works has sought to characterize the channel capacity of noisy networks, as summarized in [12, Sec. 15.10] and [11, Ch. 15]. The study of capacity is largely quite distinct from that of 1-bit (or other low-rate) communication, but we will see some similarities throughout the paper. In particular, our results are based on maxflow and mincut ideas, which are widely used in capacity results. Instead of using 1-hop channel capacities as edge weights for the maxflow problem, we will use 1-hop error exponents. Our protocols and analysis will be substantially different compared to capacity results and will require further assumptions such as pairwise reversibility, but we will use an existing idea of replacing individual edges by multiple parallel edges of smaller weight in order to form edge-disjoint paths (e.g., see [11, Sec. 15.3]).

II Problem Setup

We first describe the classical channel coding setup on a discrete memoryless channel (DMC) $P$, whose alphabets are denoted by $\mathcal{X}$ and $\mathcal{Y}$ (or sometimes $\mathcal{X}_{P}$ and $\mathcal{Y}_{P}$). The source node is given a message $m\in\{1,2,\ldots,M\}$, and has $n$ uses of the channel $P$ to send information across to the destination node, after which the destination node guesses the value of $m$. An error is said to have occurred if the destination's guess is different from $m$. The minimum probability of error across all protocols with $n$ time steps is denoted by $p_{\rm err}(n,M,P)$, and the error exponent of the channel with $M$ messages is defined as

\mathcal{E}_{M,P}=\liminf_{n\rightarrow\infty}-\frac{1}{n}\log p_{\rm err}(n,M,P), (1)

where the limit is as $n\rightarrow\infty$ while $M$ remains fixed. We will use $\mathcal{E}$, $\mathcal{E}_{M}$ or $\mathcal{E}_{P}$ whenever the omitted parameters are clear from the context.

In this paper, we consider communication over a directed graph with edges representing noisy channels. Specifically, we are given a directed graph $G$ (not necessarily acyclic), with two special nodes called the source and the destination. Without loss of generality, we assume that there exists at least one path from the source to the destination. For every edge $e$, there is a DMC $P_{e}$. We note that for a given pair of nodes, say $(i,j)$, we allow for the possibility of multiple parallel edges connecting $i$ to $j$ (though this could equivalently be viewed as a single edge whose associated channel is a product channel).

At time $0$, the source node receives a message $m\in\{1,2,\ldots,M\}$. In each time step, every node gets one use of the channel associated with each of its outgoing edges. This process occurs for a total of $n$ time steps. After $n$ time steps, the destination node forms an estimate of the message; let $p_{\rm err}(n,M,G)$ be the resulting error probability. (Our results are unaffected by whether this error probability is averaged over all $m$ or taken with respect to the worst-case $m$, as is the case for the vast majority of error exponent studies, e.g., [13, 14].) We are interested in the error exponent of the network, given by

\mathcal{E}_{M,G}=\liminf_{n\to\infty}-\frac{1}{n}\log p_{\rm err}(n,M,G). (2)

Note that we are taking the limit as $n\rightarrow\infty$, while $M$ and $G$ remain fixed. When the context is clear, we will simply denote this by $\mathcal{E}_{G}$.

For an edge $e$, we let $\mathcal{E}_{e}$ be short for $\mathcal{E}_{P_{e}}$ (i.e., the 1-hop error exponent of the associated channel). For a fixed $(M,G)$ pair, we can create a network flow problem (see Section III-A) that assigns a capacity of $\mathcal{E}_{e}$ to every edge $e$ (here "capacity" is meant in a generic sense, and not the Shannon capacity), and consider the maximum flow across this network. We will refer to this as the maxflow for the $(M,G)$ pair, and denote it by $|f_{M,G}|$. We will also shorten this to $|f|$ when the context is clear.
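As a concrete illustration of this maxflow quantity, the following is a minimal sketch of how $|f_{M,G}|$ could be computed by feeding the 1-hop error exponents into an off-the-shelf maxflow solver; the use of the networkx library and the numerical exponent values are illustrative assumptions rather than anything prescribed by the paper.

import networkx as nx

# Build a toy network whose edge "capacities" are the 1-hop error exponents E_e.
G = nx.DiGraph()
G.add_edge("s", "a", capacity=0.3)   # hypothetical exponent values
G.add_edge("s", "b", capacity=0.5)
G.add_edge("a", "b", capacity=0.1)
G.add_edge("a", "t", capacity=0.4)
G.add_edge("b", "t", capacity=0.2)

flow_value, flow_dict = nx.maximum_flow(G, "s", "t")
print(flow_value)   # |f| for this toy network
print(flow_dict)    # one flow attaining it, edge by edge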

Our main result is the following achievability result for channels that are pairwise reversible; this is a classical notion of symmetry [14] that notably includes the binary symmetric channel (BSC), the binary erasure channel (BEC), and their generalizations to non-binary inputs. Briefly, the property is that the quantity $\sum_{y}P(y|x)^{1-s}P(y|x^{\prime})^{s}$ attains its minimum at $s=\frac{1}{2}$ for all $(x,x^{\prime})$, which implies that the Chernoff distance between two inputs simplifies to the Bhattacharyya distance. See Definition 7 below for further details, and [14] for a more comprehensive treatment with numerous examples.

Theorem 1.

If every channel in $G$ is pairwise reversible, then the error exponent $|f_{M,G}|$ is achievable, i.e.,

\mathcal{E}_{M,G}\geq|f_{M,G}|. (3)
Proof.

We provide the proof for the series case (i.e., a line graph) in Section IV, which we then use as a stepping stone for the general case in Section V. ∎

We note that this is an information-theoretic achievability result, and we do not attempt to establish an associated computationally efficient protocol (though doing so would be of interest). By comparison, our related works on 2-hop settings contained various results based on both efficient and inefficient protocols [3, 4].

We also prove the following converse bound:

Theorem 2.

For any pair $(M,G)$, we have

\mathcal{E}_{M,G}\leq|f_{2,G}|. (4)
Proof.

See Section VI. ∎

Combining Theorems 3 and 4, we establish that equality holds whenever every channel is pairwise reversible and $M=2$:

Corollary 3.

If every channel in $G$ is pairwise reversible, then

\mathcal{E}_{2,G}=|f_{2,G}|. (5)

We are also interested in the zero-rate error exponent. Momentarily returning to classical 1-hop communication, for $R>0$, we define the error exponent of the channel $P$ at rate $R$ by

E_{P}(R)=\liminf_{n\rightarrow\infty}\left\{-\frac{1}{n}\log p_{\rm err}(n,e^{nR},P)\right\}, (6)

and we extend this function to $R=0$ via $E_{P}(0)=\lim_{R\to 0^{+}}E_{P}(R)$. Analogously, in the setting of the present paper, we let $E_{G}(R)$ and $E_{G}(0)$ denote the rate-$R$ and zero-rate error exponents of the network. We can again consider the network flow problem that assigns a capacity of $E_{e}(0)$ to every edge $e$, and we denote the resulting maxflow by $|f_{G}(0)|$ or sometimes simply $|f|$. The following result extends Theorem 3 to zero-rate error exponents, even without requiring the pairwise reversibility assumption:

Theorem 4.

For any $G$, whose channels need not be pairwise reversible, we have

E_{G}(0)\geq|f_{G}(0)|. (7)
Proof.

See Section VII-B. ∎

Some additional results are given in Sections VII and VIII, and are briefly outlined as follows:

  • Lemma 27 states an achievability result for channels that may not be pairwise reversible; the idea is to use a weaker “symmetrized” error exponent on each edge in the graph, rather than the optimal 1-hop exponent.

  • In Theorem 29, we provide conditions (albeit somewhat restrictive) under which a matching converse can be obtained for pairwise reversible channels with M>2M>2.

  • In Lemma 122 we establish the optimal exponent to within a factor of $4$ for general $M$ and $G$, and in Lemma 33 we provide conditions under which this can be improved to a factor of $2$.

  • In Theorem 35, we show that in general there exist scenarios in which the maxflow-based error exponent is strictly suboptimal when $M=3$, even under the constraint of an acyclic graph and pairwise reversible channels. Thus, we place a strong restriction on the extent to which our tightness result for $M=2$ can be generalized.

III Preliminaries

In this section, we introduce some additional notation and definitions, and provide a number of useful auxiliary results that will be used for proving our main results.

III-A Flows and Cuts

We momentarily depart from the above setup and consider generic notions on graphs. Consider a directed graph $G=(V,E)$, with two special nodes called the source $v_{s}$ and the destination $v_{t}$. Suppose that on every edge $e$, there is a non-negative real-valued capacity, denoted $c_{e}$.

Let $f=\{f_{e}\}_{e\in E}$ be a function from $E$ to $\mathbb{R}_{\geq 0}$. We say that $f$ is a flow if the following conditions hold:

  • (Capacity constraints) For each $e$, $0\leq f_{e}\leq c_{e}$.

  • (Flow conservation) For each $v\in V$, except for the source and destination, we have

    \sum_{u:(u,v)\in E}f_{u,v}=\sum_{w:(v,w)\in E}f_{v,w}. (8)

Define the size of a flow $f$ as

|f|=\sum_{v:(v_{s},v)\in E}f_{v_{s},v}. (9)

A flow $f$ is maximal if for any other flow $f^{\prime}$, $|f|\geq|f^{\prime}|$.

Closely related to flows is the concept of cuts. A cut $c$ is a partition of $V$ into two sets $(A,B)$ such that $v_{s}\in A$ and $v_{t}\in B$. Define the size of a cut $c$ as

|c|=\sum_{u\in A,\,v\in B\,:\,(u,v)\in E}c_{u,v}, (10)

where we emphasize that this only counts edges from $A$ to $B$ and not back-edges. A cut $c$ is minimal if for any cut $c^{\prime}$, $|c|\leq|c^{\prime}|$. We now state the well-known maxflow-mincut duality theorem:

Theorem 5.

(Maxflow-mincut duality theorem, e.g., [15]) The size of the maximum flow is equal to the size of the minimum cut.

The following result is also well known, but we provide a short proof for the interested reader.

Theorem 6.

(Flow decomposition theorem, e.g., [15]) Let $f$ be any flow. There exist finitely many simple paths (a path is simple if it never visits the same node more than once) from $v_{s}$ to $v_{t}$, denoted by $p_{1},p_{2},\ldots,p_{k}$, with associated flows $f_{1},f_{2},\ldots,f_{k}$, such that

  • Each $f_{i}$ flows only along path $p_{i}$. In other words, $(f_{i})_{e}=|f_{i}|$ for all $e\in p_{i}$ and $(f_{i})_{e}=0$ otherwise.

  • The total sum of all $f_{i}$ is equal to $f$. In other words, we have for all $e$ that

    \sum_{i}(f_{i})_{e}=f_{e}. (11)
Proof.

We describe a procedure for iteratively building up a list of $(p_{i},f_{i})$ pairs. Given the current flow, let $p_{i}$ be any path from $v_{s}$ to $v_{t}$ that has non-zero flow across every edge. Let $|f_{i}|$ be the smallest among these (non-zero) values, and let $f_{i}$ be the flow formed by pushing $|f_{i}|$ units of flow along the path $p_{i}$. We then append $(p_{i},f_{i})$ to the list of pairs, subtract $|f_{i}|$ from the flow along every edge on $p_{i}$, and repeat the preceding steps until no flow remains. Since the number of active edges (i.e., those with non-zero flow) strictly decreases on every step of this process, it must terminate and produce a finite number of paths, as desired. ∎
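The peeling argument in this proof is easy to implement directly; the following is a minimal sketch (the dictionary representation of the flow, the simple depth-first path search, and the toy flow at the end are our own illustrative choices).

# Sketch of the flow-decomposition procedure: repeatedly find an s-t path with
# positive flow on every edge, peel off the bottleneck value along it, and
# record the (path, value) pair.
def decompose_flow(flow, s, t):
    flow = dict(flow)                      # edge (u, v) -> flow value
    paths = []
    def find_path(u, visited):
        if u == t:
            return [t]
        for (a, b), v in flow.items():
            if a == u and v > 0 and b not in visited:
                rest = find_path(b, visited | {b})
                if rest is not None:
                    return [u] + rest
        return None
    while True:
        p = find_path(s, {s})
        if p is None:
            break
        edges = list(zip(p, p[1:]))
        amount = min(flow[e] for e in edges)   # bottleneck value |f_i|
        for e in edges:
            flow[e] -= amount
        paths.append((p, amount))
    return paths

# Example: a flow of size 0.5 on a small 4-node network.
f = {("s", "a"): 0.3, ("s", "b"): 0.2, ("a", "t"): 0.3, ("b", "t"): 0.2}
print(decompose_flow(f, "s", "t"))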

III-B Chernoff Divergence

Let $P(y|x)$ be a generic discrete memoryless channel. To lighten notation, we let $P_{x}(\cdot)$ denote the conditional distribution $P(\cdot|x)$. An $(M,\ell)$-codebook is defined to be a collection of $M$ codewords each having length $\ell$, and when utilizing such a codebook, we will use the notation $(x^{(1)},\ldots,x^{(M)})$ for the associated codewords.

For two probability distributions $Q,Q^{\prime}$ over some finite set $\mathcal{X}$, the Bhattacharyya distance is defined as

d_{\rm B}(Q,Q^{\prime})=-\log\sum_{x\in\mathcal{X}}\sqrt{Q(x)Q^{\prime}(x)}. (12)

For random variables $X_{1},X_{2}$, we will also use $d_{\rm B}$ to refer to the Bhattacharyya distance between their distributions $P_{X_{1}}$ and $P_{X_{2}}$, i.e.,

d_{\rm B}(X_{1},X_{2})=d_{\rm B}(P_{X_{1}},P_{X_{2}}), (13)

and for $x,x^{\prime}\in\mathcal{X}_{P}$, we define the Bhattacharyya distance associated with two channel inputs as

d_{\rm B}(x,x^{\prime},P)=d_{\rm B}(P_{x},P_{x^{\prime}}) (14)

with a slight abuse of notation.

Generalizing the Bhattacharyya distance, the Chernoff divergence with parameter $s$ is given by

d_{\rm C}(Q,Q^{\prime},s)=-\log\sum_{x\in\mathcal{X}}Q(x)^{1-s}Q^{\prime}(x)^{s}, (15)

and the Chernoff divergence (with optimized $s$) is given by

d_{\rm C}(Q,Q^{\prime})=\max_{0\leq s\leq 1}d_{\rm C}(Q,Q^{\prime},s). (16)

Analogous to (13)–(14), we also write

d_{\rm C}(X_{1},X_{2})=d_{\rm C}(P_{X_{1}},P_{X_{2}}), (17)
d_{\rm C}(x,x^{\prime},P)=d_{\rm C}(P_{x},P_{x^{\prime}}). (18)
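As a numerical illustration of (12) and (15)–(16), the following minimal sketch evaluates $d_{\rm B}$ and $d_{\rm C}$ for two toy distributions; the grid search over $s$ is a simple (slightly lossy) way of carrying out the maximization in (16), and the distributions themselves are arbitrary.

import numpy as np

def d_chernoff_s(Q1, Q2, s):
    # Chernoff divergence with parameter s, cf. (15)
    return -np.log(np.sum(Q1 ** (1 - s) * Q2 ** s))

def d_bhattacharyya(Q1, Q2):
    # Bhattacharyya distance (12) is the s = 1/2 case
    return d_chernoff_s(Q1, Q2, 0.5)

def d_chernoff(Q1, Q2, grid=1001):
    # Chernoff divergence (16): optimize s over a fine grid
    return max(d_chernoff_s(Q1, Q2, s) for s in np.linspace(0.0, 1.0, grid))

Q1 = np.array([0.7, 0.2, 0.1])
Q2 = np.array([0.1, 0.3, 0.6])
print(d_bhattacharyya(Q1, Q2), d_chernoff(Q1, Q2))   # d_B <= d_C always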

For a certain class of channels, the quantities $d_{\rm C}$ and $d_{\rm B}$ coincide, giving us the following definition:

Definition 7.

[14] A discrete memoryless channel $P$ is pairwise reversible if, for all $x,x^{\prime}\in\mathcal{X}_{P}$,

d_{\rm B}(x,x^{\prime},P)=d_{\rm C}(x,x^{\prime},P), (19)

which occurs when the quantity

\sum_{y\in\mathcal{Y}_{P}}P(y|x)^{1-s}P(y|x^{\prime})^{s} (20)

attains its minimum at $s=\frac{1}{2}$.
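As a quick numerical sanity check of Definition 7, the following sketch verifies that the quantity (20) is minimized at $s=\frac{1}{2}$ for every input pair of a BSC; the grid search and tolerance are illustrative choices, and the same test returns False for channels that are not pairwise reversible.

import numpy as np

def is_pairwise_reversible(channel, grid=2001, tol=1e-9):
    # channel: list of rows P(.|x); check that (20) is minimized at s = 1/2
    P = [np.array(r, dtype=float) for r in channel]
    ss = np.linspace(0.0, 1.0, grid)
    for x in range(len(P)):
        for xp in range(len(P)):
            vals = [np.sum(P[x] ** (1 - s) * P[xp] ** s) for s in ss]
            if min(vals) < np.sum(np.sqrt(P[x] * P[xp])) - tol:
                return False        # minimum of (20) not attained at s = 1/2
    return True

bsc = [[0.9, 0.1], [0.1, 0.9]]      # binary symmetric channel, crossover 0.1
print(is_pairwise_reversible(bsc))  # expected: True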

The pairwise reversibility assumption allows for a number of useful results and constructions that are not available for general channels. In particular, Lemma 12 below gives an explicit expression for the 1-hop error exponent, and Lemma 29 below is based on an explicit construction that attains that exponent. We again refer the reader to [14] for a detailed discussion on pairwise reversibility with several examples.

For any positive integer $k$, we let $P^{k}$ denote the $k$-fold product of $P$, with probability mass function

P^{k}(\vec{y}|\vec{x})=\prod_{i=1}^{k}P(y_{i}|x_{i}). (21)

For two sequences $\vec{x},\vec{x}^{\prime}$ of length $k$, we also use the notation $d_{\rm B}(\vec{x},\vec{x}^{\prime},P^{k})$ and $d_{\rm C}(\vec{x},\vec{x}^{\prime},P^{k})$ similarly to (14) and (18), with the understanding that $\vec{x},\vec{x}^{\prime}$ are treated as inputs to $P^{k}$. We will use a well-known tensorization property of $d_{\rm C}$ (and hence also $d_{\rm B}$), stated as follows.

Lemma 8.

(e.g., see [4, Lemma 8]) For any sequences $\vec{x}=(x_{1},\dotsc,x_{k})$ and $\vec{x}^{\prime}=(x^{\prime}_{1},\dotsc,x^{\prime}_{k})$, we have

d_{\rm C}(\vec{x},\vec{x}^{\prime},P^{k},s)=\sum_{i=1}^{k}d_{\rm C}(x_{i},x^{\prime}_{i},P,s), (22)

and

d_{\rm C}(\vec{x},\vec{x}^{\prime},P^{k})=\max_{0\leq s\leq 1}\sum_{i=1}^{k}d_{\rm C}(x_{i},x^{\prime}_{i},P,s). (23)

III-C 1-hop Error Exponents

We will use several results from [14] on the 1-hop error exponents. We first state two closely-related results that are implicit in [14, Thm. 2] and its proof.

Lemma 9.

(Implicit in [14, Thm. 2]) Letting $\vec{y}\in\mathcal{Y}^{n}$ be the received sequence from $n$ uses of the channel $P$, we have for any fixed $M$ and any sequence of codebooks (indexed by $n$) that

\liminf_{n\to\infty}-\frac{1}{n}\log p_{\rm err}\geq\liminf_{n\to\infty}\frac{1}{n}\min_{m_{1}\neq m_{2}}d_{\rm C}\big{(}(\vec{y}|m=m_{1}),(\vec{y}|m=m_{2})\big{)}. (24)
Theorem 10.

(Implicit in [14, Thm. 2]) For any $\mathcal{E}^{\dagger}<\mathcal{E}^{(1)}_{M}$, it holds for all sufficiently large $\ell$ that there exists an $(M,\ell)$-codebook $\mathcal{C}$ such that for all pairs of codewords $(\vec{x},\vec{x}^{\prime})$ in $\mathcal{C}$, we have

d_{\rm C}(\vec{x},\vec{x}^{\prime},P^{\ell})\geq\ell\cdot\mathcal{E}^{\dagger}. (25)

Next, we state explicit formulas/bounds for error exponents in the zero-rate and fixed-MM settings, respectively.

Theorem 11.

[14, Thm. 4] For any $P$, the zero-rate error exponent is given by

E_{P}(0)=\max_{q\in\mathcal{P}(\mathcal{X})}\sum_{x,x^{\prime}\in\mathcal{X}}q_{x}q_{x^{\prime}}d_{\rm B}(x,x^{\prime},P). (26)
Lemma 12.

[14, Thm. 3] For any $(M,P)$, we have

\mathcal{E}_{M,P}\geq\frac{2}{M(M-1)}\max_{(x_{1},\ldots,x_{M})\in\mathcal{X}^{M}}\sum_{1\leq m_{1}<m_{2}\leq M}d_{\rm B}(x_{m_{1}},x_{m_{2}},P), (27)

with equality when $P$ is pairwise reversible.

In accordance with Lemma 12, we will use the following definition:

Definition 13.

Define the Bhattacharyya distance based error exponent by

\tilde{\mathcal{E}}_{M,P}=\frac{2}{M(M-1)}\max_{(x_{1},\ldots,x_{M})\in\mathcal{X}^{M}}\sum_{1\leq m_{1}<m_{2}\leq M}d_{\rm B}(x_{m_{1}},x_{m_{2}},P), (28)

and note that $\tilde{\mathcal{E}}_{M,P}=\mathcal{E}_{M,P}$ for pairwise reversible channels.
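For small alphabets and small $M$, the maximization in (28) can be evaluated by brute force; the following minimal sketch does so for a BSC (the exhaustive enumeration over input tuples is purely illustrative and is clearly infeasible for large alphabets or large $M$).

import itertools
import numpy as np

def bhattacharyya(p, q):
    return -np.log(np.sum(np.sqrt(p * q)))

def E_tilde(channel, M):
    # brute-force evaluation of (28): enumerate all M-tuples of inputs
    rows = [np.array(r, dtype=float) for r in channel]
    best = 0.0
    for tup in itertools.product(range(len(rows)), repeat=M):
        total = sum(bhattacharyya(rows[a], rows[b])
                    for a, b in itertools.combinations(tup, 2))
        best = max(best, 2.0 * total / (M * (M - 1)))
    return best

bsc = [[0.9, 0.1], [0.1, 0.9]]
print(E_tilde(bsc, 2))   # equals d_B(0, 1) = -log(2 * sqrt(0.9 * 0.1))
print(E_tilde(bsc, 3))   # for M = 3 on a binary alphabet, some input repeats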

In the proof of Lemma 12, [14] also showed the following:

Lemma 14.

For any DMC $P$, there exists an $(M,\ell)$-codebook $(\vec{x}_{1},\ldots,\vec{x}_{M})$ with $\ell=M!$ such that for any pair $m_{1}\neq m_{2}$, it holds that

d_{\rm B}(\vec{x}_{m_{1}},\vec{x}_{m_{2}},P^{\ell})=\ell\cdot\tilde{\mathcal{E}}_{M,P}. (29)
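One explicit construction attaining (29), which we believe reflects the permutation-based construction in [14] (the reader should consult [14] for the precise details), indexes the $\ell=M!$ positions by the permutations $\sigma$ of $\{1,\ldots,M\}$ and places the symbol $x_{\sigma(m)}$ in codeword $m$ at position $\sigma$, where $(x_{1},\ldots,x_{M})$ attains the maximum in (28); each ordered pair of tuple indices then appears equally often across any two codewords. A minimal sketch:

import itertools
import numpy as np

def bhattacharyya(p, q):
    return -np.log(np.sum(np.sqrt(p * q)))

def permutation_codebook(x_tuple):
    # codeword for message m: one symbol per permutation sigma, namely x_{sigma(m)}
    M = len(x_tuple)
    perms = list(itertools.permutations(range(M)))
    return [[x_tuple[sigma[m]] for sigma in perms] for m in range(M)]

bsc = [np.array([0.9, 0.1]), np.array([0.1, 0.9])]
codebook = permutation_codebook((0, 0, 1))     # an optimizing tuple for M = 3
ell = len(codebook[0])                         # ell = 3! = 6
for m1, m2 in itertools.combinations(range(3), 2):
    d = sum(bhattacharyya(bsc[a], bsc[b])
            for a, b in zip(codebook[m1], codebook[m2]))
    print(m1, m2, d / ell)                     # the same value for every pair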

Since we will make use of codes over product channels, the following lemma is useful:

Lemma 15.

Let $\ell$ be a fixed integer and $\vec{x}_{1},\vec{x}_{2},\ldots,\vec{x}_{M}$ be length-$\ell$ codewords, and let $Q$ be the restriction of $P^{\ell}$ to these inputs. Then $\mathcal{E}_{P}\geq\frac{1}{\ell}\mathcal{E}_{Q}$. Furthermore, if $P$ is pairwise reversible, then $Q$ is also pairwise reversible.

Proof.

Let $\vec{w}_{1},\ldots,\vec{w}_{M}$ be codewords achieving an error exponent arbitrarily close to $\mathcal{E}_{Q}$. Since each character in $\vec{w}_{i}$ is a string of $\ell$ characters over $\mathcal{X}$, we can expand the strings $\vec{w}_{1},\ldots,\vec{w}_{M}$ to form $\vec{w}_{1}^{\prime},\ldots,\vec{w}_{M}^{\prime}$, achieving the same error probability when used as codewords for $P$. The $\frac{1}{\ell}$ scaling factor comes from the factor-of-$\ell$ difference in the block length.

For the second claim, we apply Lemma 23 to obtain

d_{\rm C}(\vec{x}_{m_{1}},\vec{x}_{m_{2}},Q,s)=\sum_{i=1}^{\ell}d_{\rm C}(\vec{x}_{m_{1}}(i),\vec{x}_{m_{2}}(i),P,s). (30)

Since $P$ is pairwise reversible, each term on the right-hand side corresponds to a quantity of the form (20) that attains its minimum at $s=\frac{1}{2}$, and hence the same is true on the left-hand side, which means that $Q$ is also pairwise reversible. ∎

The next lemma allows us to break down the error exponent of a product channel into its individual components when $M=2$. (We note that this result does not extend to $M>2$.)

Lemma 16.

Let $P^{(1)},P^{(2)},\ldots,P^{(k)}$ be arbitrary channels, and let $\hat{P}=P^{(1)}\times P^{(2)}\times\ldots\times P^{(k)}$ be the product channel. Then

\mathcal{E}_{2,\hat{P}}\leq\sum_{i}\mathcal{E}_{2,P^{(i)}}. (31)
Proof.

For any DMC, the error exponent for $M=2$ is given as follows [14, Eq. (1.19)]:

\mathcal{E}_{2,P}=\max_{x_{1},x_{2}\in\mathcal{X}_{P}}d_{\rm C}(x_{1},x_{2},P). (32)

Accordingly, we choose $\vec{x}_{1},\vec{x}_{2}\in\prod_{i}\mathcal{X}_{P^{(i)}}$ attaining the maximum in

\mathcal{E}_{2,\hat{P}}=\max_{\vec{x}_{1},\vec{x}_{2}}d_{\rm C}(\vec{x}_{1},\vec{x}_{2},\hat{P}), (33)

and let $\vec{x}_{1}(i)$ and $\vec{x}_{2}(i)$ denote the $i$-th components of $\vec{x}_{1}$ and $\vec{x}_{2}$ respectively. Then,

d_{\rm C}(\vec{x}_{1},\vec{x}_{2},\hat{P}) = \max_{0\leq s\leq 1}d_{\rm C}(\vec{x}_{1},\vec{x}_{2},\hat{P},s) (34)
= \max_{0\leq s\leq 1}\sum_{i}d_{\rm C}(\vec{x}_{1}(i),\vec{x}_{2}(i),P^{(i)},s) \quad\text{(by Lemma 23)} (35)
\leq \sum_{i}\max_{0\leq s\leq 1}d_{\rm C}(\vec{x}_{1}(i),\vec{x}_{2}(i),P^{(i)},s) (36)
\leq \sum_{i}\mathcal{E}_{2,P^{(i)}} \quad\text{(by (32))} (37)

as required. ∎

The following result shows that we have equality in Lemma 31 if we restrict all $P^{(i)}$ to be pairwise reversible:

Lemma 17.

Let $P^{(1)},P^{(2)},\ldots,P^{(k)}$ be pairwise reversible channels and let $\hat{P}=P^{(1)}\times P^{(2)}\times\ldots\times P^{(k)}$ be the product channel. Then

\mathcal{E}_{M,\hat{P}}=\sum_{i}\mathcal{E}_{M,P^{(i)}}. (38)
Proof.

We first note from the second part of Lemma 15 that $\hat{P}$ is pairwise reversible. As a result, by Lemma 12, there exists some $(\vec{x}_{1},\vec{x}_{2},\ldots,\vec{x}_{M})$ such that

\mathcal{E}_{\hat{P}}=\frac{2}{M(M-1)}\sum_{1\leq m_{1}<m_{2}\leq M}d_{\rm B}(\vec{x}_{m_{1}},\vec{x}_{m_{2}},\hat{P}). (39)

Moreover, by Lemma 23, we can break this down into the corresponding distances for the individual channels:

\mathcal{E}_{\hat{P}}=\frac{2}{M(M-1)}\sum_{1\leq m_{1}<m_{2}\leq M}\sum_{i}d_{\rm B}(\vec{x}_{m_{1}}(i),\vec{x}_{m_{2}}(i),P^{(i)})\;\leq\;\sum_{i}\mathcal{E}_{P^{(i)}}, (40)

where the inequality uses Lemma 12. To prove the inequality in the other direction, we simply reverse the entire argument. For each channel $P^{(i)}$, choose $(x_{1}^{(i)},\ldots,x_{M}^{(i)})$ to attain equality in (27). Defining $\vec{x}_{m}=(x_{m}^{(i)})_{i=1}^{k}$ then gives

\mathcal{E}_{\hat{P}}\geq\frac{2}{M(M-1)}\sum_{1\leq m_{1}<m_{2}\leq M}d_{\rm B}(\vec{x}_{m_{1}},\vec{x}_{m_{2}},\hat{P})=\sum_{i}\mathcal{E}_{P^{(i)}}, (41)

where both steps are based on (27) holding with equality (first for $\hat{P}$, and then for the individual $P^{(i)}$). ∎
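As a small numerical check of the additivity in (38), restricted to $M=2$ where the characterization (32) applies, the following sketch verifies it for the product of two BSCs; the grid over $s$ and the crossover probabilities are illustrative.

import itertools
import numpy as np

def d_chernoff(p, q, grid=2001):
    ss = np.linspace(0.0, 1.0, grid)
    return max(-np.log(np.sum(p ** (1 - s) * q ** s)) for s in ss)

def E2(channel):
    # E_{2,P} = max over input pairs of the Chernoff divergence, cf. (32)
    return max(d_chernoff(p, q) for p, q in itertools.product(channel, repeat=2))

bsc1 = [np.array([0.9, 0.1]), np.array([0.1, 0.9])]
bsc2 = [np.array([0.8, 0.2]), np.array([0.2, 0.8])]
# product channel: inputs are pairs, outputs are pairs, and the law factorizes
product = [np.outer(p1, p2).ravel() for p1, p2 in itertools.product(bsc1, bsc2)]
print(E2(product), E2(bsc1) + E2(bsc2))   # the two numbers should agree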

III-D Feedback Error Exponent

In the well-studied problem of 1-hop communication with feedback, the sender is given access to all of the previous channel outputs before sending the next channel input. For a discrete memoryless channel $P$, we let $p_{\rm err}^{f}(n,M,P)$ denote the minimum probability of error with feedback, and define the corresponding error exponent of the channel with $M$ messages via

\mathcal{E}^{f}_{M,P}=\liminf_{n\rightarrow\infty}-\frac{1}{n}\log p_{\rm err}^{f}(n,M,P). (42)

Clearly, $\mathcal{E}^{f}_{M,P}\geq\mathcal{E}_{M,P}$. In general, feedback error exponents may be strictly higher than non-feedback ones, even for the binary symmetric channel [16]. However, the two error exponents match for arbitrary DMCs when $M=2$:

Theorem 18.

[16, 17] For any DMC $P$, feedback does not improve the error exponent when $M=2$, i.e.,

\mathcal{E}^{f}_{2,P}=\mathcal{E}_{2,P}. (43)

III-E Properties of Chernoff Divergence

The following lemma concerns the Chernoff divergence of a composite channel:

Lemma 19.

[4, Lemma 7] Let $P_{1},P_{2}$ be channels such that $\mathcal{Y}_{P_{1}}\subseteq\mathcal{X}_{P_{2}}$. For all $x,x^{\prime}\in\mathcal{X}_{P_{1}}$, we have

d_{\rm C}(x,x^{\prime},P_{2}\circ P_{1},s)\geq\min_{y,y^{\prime}\in\mathcal{Y}_{P_{1}}}\big{\{}d_{\rm C}(y,y^{\prime},P_{2},s)-(1-s)\log P_{1}(y|x)-s\log P_{1}(y^{\prime}|x^{\prime})\big{\}}-2\log|\mathcal{Y}_{P_{1}}|. (44)

The next result relates the error probability of a likelihood ratio test with the Chernoff divergence. We expect that this result is standard, but we provide a short proof for completeness.

Lemma 20.

Let $P$ and $Q$ be two distributions over the same alphabet $\mathcal{X}$. For each $x\in\mathcal{X}$, define the likelihood ratio $L(x)=\frac{P(x)}{Q(x)}$. Then

-\log\mathbb{P}_{x\sim P}(L(x)\leq L_{0})\geq d_{\rm B}(P,Q)-\frac{1}{2}\log L_{0}. (45)
Proof.

Consider the set

S=\{x\,|\,L(x)\leq L_{0}\}. (46)

For all $x\in S$, we have $P(x)\leq L_{0}\cdot Q(x)$, and therefore

P(x)=\sqrt{P(x)}\cdot\sqrt{P(x)}\leq\sqrt{P(x)}\cdot\sqrt{L_{0}\cdot Q(x)}, (47)

which gives

\mathbb{P}_{x\sim P}(L(x)\leq L_{0})=\sum_{x\in S}P(x)\leq\sum_{x\in S}\sqrt{L_{0}\cdot P(x)\cdot Q(x)}\leq\exp(-d_{\rm B}(P,Q))\cdot\sqrt{L_{0}}. (48)

IV The Series Case for Theorem 3

In this section, we handle the special case in which the nodes are arranged in series (i.e., a line graph). That is, the nodes are labeled $\{0,1,\ldots,|E|\}$, and the edges are $\{(j,j+1):0\leq j<|E|\}$. We refer to the corresponding channels as $P^{(1)},P^{(2)},\ldots,P^{(|E|)}$, with corresponding 1-hop error exponents $\mathcal{E}^{(1)},\mathcal{E}^{(2)},\ldots,\mathcal{E}^{(|E|)}$. All channels are assumed to be pairwise reversible in this section.

This series setup directly generalizes previous work on the 2-hop setting in [1, 2, 3, 4], particularly our work on the multi-bit setting [4]. Specifically, Theorem 3 for the series setup generalizes the main result therein by extending it from 2 hops to any constant number of hops. The protocol that we adopt has some significant differences from [4], which we discuss in Section IV-D. On the other hand, a key similarity is that we still adopt a block-structured strategy.

IV-A Sequential Block Transmission

We momentarily turn our attention to protocols in which nodes transmit blocks to each other one after the other without any parallel transmission. This idea would be highly wasteful if used in a standalone manner, but the idea (in the next subsection) will be to chain many instances of this approach together in parallel so that a negligible fraction of the total block length $n$ is ultimately wasted.

For a series graph, define a sequential block transmission with block size $B$ as follows:

  • For each edge $e=(u,v)$, node $u$ transmits to node $v$ using $B$ uses of the channel on $e$;

  • This process occurs sequentially; transmission over an earlier edge (i.e., closer to the source) must finish before transmission over the next edge commences.

In Section IV-C, we will prove the following key lemma for this setting:

Lemma 21.

Given a series graph, consider a sequential block transmission with block size $B$. Let $\vec{y}$ be the length-$B$ sequence received by the final node $|E|$, and let $|f|$ be the maximum flow (since we are in the series setup, this is equal to $\min_{j}\mathcal{E}^{(j)}$). Then there exists a protocol such that for all $m_{1}\neq m_{2}$,

d_{\rm B}\big{(}(\vec{y}|m=m_{1}),(\vec{y}|m=m_{2})\big{)}\geq B(|f|-o_{B}(1)), (49)

where $o_{B}(1)$ is a term that approaches zero as $B$ increases while $G$ and $M$ remain constant.

IV-B Implication for our Main Problem Setting

Returning to our main problem setting in which nodes are not required to transmit one-after-the-other, we obtain the following corollary of Lemma 21.

[Figure 1 (diagram omitted): a time axis of length $n=5B$ showing blocks of length $B$ forwarded in a pipelined manner from node 1 (source) through node 2 to node 3 (destination).]
Figure 1: An illustration of how a total of $n$ time steps can be broken down into $n/B-|E|$ sequential block transmissions. In this diagram, $n=5B$ and $|E|=2$.
Corollary 22.

Given a series graph with $n$ time steps, there exists a protocol such that for any $m_{1}\neq m_{2}$, the length-$n$ sequence $\vec{y}_{\rm all}$ received at node $|E|$ satisfies

d_{\rm B}\big{(}(\vec{y}_{\rm all}|m=m_{1}),(\vec{y}_{\rm all}|m=m_{2})\big{)}\geq(n-|E|B)\cdot(|f|-o_{B}(1)), (50)

where $o_{B}(1)$ is a term that approaches zero as $B$ increases while $G$ and $M$ remain constant (without any dependence on $n$).

Proof.

Consider the process of chaining several sequential block transmissions in parallel, as depicted in Figure 1. If there are $t$ sequential block transmissions of length $B$, then they can clearly be executed in $(t+|E|)\cdot B$ time steps. In other words, with a block length of $n$, we can perform $n/B-|E|$ sequential block transmissions (up to asymptotically negligible rounding issues).

For each such sequential block transmission, we use the protocol that achieves Lemma 21. Since no memory is retained between the blocks, the blocks are conditionally independent given the source message. Let $\vec{y}(t)$ represent the $t$-th block received by the final node, and let $\vec{y}_{\rm all}=(\vec{y}(1),\vec{y}(2),\ldots,\vec{y}(n/B-|E|))$. Then, using tensorization (Lemma 23) and Lemma 21, we have for all $m_{1}\neq m_{2}$ that

d_{\rm B}\big{(}(\vec{y}_{\rm all}|m=m_{1}),(\vec{y}_{\rm all}|m=m_{2})\big{)} (51)
= \sum_{t=1}^{n/B-|E|}d_{\rm B}\big{(}(\vec{y}(t)|m=m_{1}),(\vec{y}(t)|m=m_{2})\big{)}
\geq (n/B-|E|)\,B\,(|f|-o_{B}(1)) (52)
= (n-|E|B)\cdot(|f|-o_{B}(1)), (53)

as required. ∎

Then, the error exponent is bounded by

\liminf_{n\to\infty}-\frac{1}{n}\log p_{\rm err} (54)
\geq \liminf_{n\to\infty}\frac{1}{n}\min_{m_{1}\neq m_{2}}d_{\rm B}\big{(}(\vec{y}_{\rm all}|m=m_{1}),(\vec{y}_{\rm all}|m=m_{2})\big{)} (55)
\geq \liminf_{n\to\infty}\frac{n-|E|B}{n}\,(|f|-o_{B}(1)) (56)
= |f|-o_{B}(1), (57)

where (55) follows from Lemma 9, and (56) follows from Corollary 22. This proves that error exponents arbitrarily close to $|f|$ are achievable, which proves Theorem 3 for the series case.
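To visualize the pipelining in Figure 1 and the proof of Corollary 22, the following minimal sketch prints which block is carried over which edge in each length-$B$ time slot; the 1-indexed slot convention is our own illustrative choice.

def schedule(num_blocks, num_edges):
    # Block t is forwarded over edge j (1-indexed) during time slot t + j - 1,
    # where each slot consists of B channel uses; all blocks are delivered
    # within (num_blocks + num_edges) slots, matching the count in the proof.
    slots = {}
    for t in range(1, num_blocks + 1):
        for j in range(1, num_edges + 1):
            slots.setdefault(t + j - 1, []).append((t, j))   # (block, edge)
    return slots

# Matching Figure 1: n = 5B and |E| = 2, so n/B - |E| = 3 blocks are sent.
for slot, jobs in sorted(schedule(num_blocks=3, num_edges=2).items()):
    print("slot", slot, ":", ", ".join(f"block {t} on edge {j}" for t, j in jobs))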

IV-C Proof of Lemma 21

We first show that it suffices to prove Lemma 21 under an additional assumption that will be convenient to work with:

Lemma 23.

Suppose that Lemma 21 holds under the additional restriction that each channel $P^{(j)}$ has $M$ inputs $\{1,2,\ldots,M\}$ such that $d_{\rm B}(m_{1},m_{2},P^{(j)})\geq|f|$ for all $m_{1}\neq m_{2}$. Then the general form of Lemma 21 as stated also holds.

Proof.

Consider the general setup of Lemma 21 with arbitrary $P^{(j)}$. By Lemma 29, for each $j$, there exist codewords $\vec{x}^{(j)}_{1},\vec{x}^{(j)}_{2},\ldots,\vec{x}^{(j)}_{M}$ of length $\ell=M!$ such that

d_{\rm B}(\vec{x}^{(j)}_{m_{1}},\vec{x}^{(j)}_{m_{2}},(P^{(j)})^{\ell})\geq\ell\cdot\mathcal{E}_{M,P^{(j)}}. (58)

Let $Q^{(j)}$ be the restriction of $(P^{(j)})^{\ell}$ to the codewords $\vec{x}^{(j)}_{1},\vec{x}^{(j)}_{2},\ldots,\vec{x}^{(j)}_{M}$. We first show that the channels $Q^{(j)}$ satisfy the extra requirement stated in Lemma 23.

By Lemma 15, $Q^{(j)}$ is pairwise reversible. Moreover, for each $m_{1}\neq m_{2}$, we have from (58) that

d_{\rm B}(\vec{x}^{(j)}_{m_{1}},\vec{x}^{(j)}_{m_{2}},Q^{(j)})=d_{\rm B}(\vec{x}^{(j)}_{m_{1}},\vec{x}^{(j)}_{m_{2}},(P^{(j)})^{\ell})\geq\ell\cdot\mathcal{E}_{M,P^{(j)}}. (59)

Then, we deduce from Lemma 12 (with the $M$ inputs of $Q^{(j)}$ being substituted for $(x_{1},x_{2},\ldots,x_{M})$) that $\mathcal{E}_{M,Q^{(j)}}\geq\ell\cdot\mathcal{E}_{M,P^{(j)}}$. Since the maxflow is $|f|=\min_{j}\mathcal{E}_{M,P^{(j)}}$ in the series setup, this gives $\mathcal{E}_{M,Q^{(j)}}\geq\ell\cdot|f|$. Then, let $|f^{\prime}|$ be the maximum flow for the sequence of channels $Q^{(1)},\ldots,Q^{(|E|)}$. Since $\mathcal{E}_{M,Q^{(j)}}\geq\ell\cdot|f|$ for all $j$, the new maximum flow satisfies $|f^{\prime}|\geq\ell\cdot|f|$. On the other hand, by Lemma 15, we have $\mathcal{E}_{Q^{(j)}}\leq\ell\cdot\mathcal{E}_{P^{(j)}}$, which gives

|f^{\prime}|=\min_{j}\mathcal{E}_{Q^{(j)}}\leq\min_{j}\ell\cdot\mathcal{E}_{P^{(j)}}\leq\ell\cdot|f|. (60)

Combining these upper and lower bounds on $|f^{\prime}|$, we conclude that $|f^{\prime}|=\ell\cdot|f|$, and from (59) we deduce that the channels $Q^{(j)}$ satisfy the additional requirement imposed in the statement of Lemma 23 (with $|f|$ renamed to $|f^{\prime}|$).

Now suppose that Lemma 21 holds for the new sequence of channels $Q^{(1)},\ldots,Q^{(|E|)}$, and let $\vec{y}^{\prime}$ denote the corresponding length-$B$ sequence received by the destination node. This means that there exists a sequential block transmission protocol of block size $B$ such that for any $m_{1}\neq m_{2}$,

d_{\rm B}\big{(}(\vec{y}^{\prime}|m=m_{1}),(\vec{y}^{\prime}|m=m_{2})\big{)}\geq B(|f^{\prime}|-o_{B}(1))\geq B(\ell|f|-o_{B}(1)). (61)

Any sequential block transmission with block size $B$ using the new channels $Q^{(1)},\ldots,Q^{(|E|)}$ can easily be converted to a sequential block transmission with block size $B\cdot\ell$ over the original channels $P^{(1)},\ldots,P^{(|E|)}$ by simply expanding each instance of an input letter $m$ of $Q^{(j)}$ into the corresponding codeword $\vec{x}^{(j)}_{m}$. Therefore, there exists a sequential block transmission of block size $B\cdot\ell$ such that for any $m_{1}\neq m_{2}$,

d_{\rm B}\big{(}(\vec{y}|m=m_{1}),(\vec{y}|m=m_{2})\big{)}\geq B(\ell|f|-o_{B}(1)). (62)

Since $\ell$ is a constant, we can appropriately rescale such that $B\ell$ above is renamed to $B$ (the overall sequential block transmission length), and we conclude that Lemma 21 holds for the original channels $P^{(1)},\ldots,P^{(|E|)}$. ∎

For the rest of this section, we work under the extra assumption stated in Lemma 23. We also assume that $B$ is even to avoid rounding issues.

IV-C1 Description of the Forwarding Protocol

We consider the following protocol for a single sequential block transmission of length $B$:

  • Every node $j$ ($0\leq j\leq|E|$) is assigned a state consisting of two integers $(m^{(j)},\ell^{(j)})$ with $1\leq m^{(j)}\leq M$ and $0\leq\ell^{(j)}\leq B/2$, which we will refer to as $(m,\ell)$ when there is no ambiguity. A state of $(m^{(j)},\ell^{(j)})$ roughly indicates that the node believes $m^{(j)}$ to be the most likely value of $m$, where larger values of $\ell^{(j)}$ indicate higher confidence, whereas a smaller value of $\ell^{(j)}$ indicates that there is another competing value $m^{\prime}\neq m^{(j)}$ that is also relatively likely (we make this more precise below).

  • Node 0 is assigned $m^{(0)}=m$ and $\ell^{(0)}=B/2$, where $m$ is the actual message.

  • If node $j$ is assigned a state $(m,\ell)$, then this node sends out $B/2+\ell$ copies of $m$ followed by $B/2-\ell$ copies of $m+1$ (unless $m=M$, in which case we wrap around and replace $m+1$ by $1$ here). We refer to the corresponding string as $\vec{x}_{m,\ell}$, or $\vec{x}^{(j)}_{m,\ell}$ when we want to emphasize which node is being considered.

  • Node $j$ ($j\geq 1$), upon receiving $\vec{y}^{(j)}$ from node $j-1$, computes the likelihoods $\mathbb{P}(\vec{y}^{(j)}|m^{(0)}=m)$ for each possible message $1\leq m\leq M$. Let $m^{*},m^{**}$ be the resulting most likely and second most likely values of $m$ respectively (with arbitrary tie-breaking).

  • Node $j$ is now assigned a state of

    m^{(j)}=m^{*},\qquad\ell^{(j)}=\min\left(\frac{B}{2},\Bigg{\lfloor}\frac{1}{4|f|}\cdot\log\frac{\mathbb{P}(\vec{y}^{(j)}|m^{*})}{\mathbb{P}(\vec{y}^{(j)}|m^{**})}\Bigg{\rfloor}\right). (63)

    These equations are chosen such that $m^{(j)}$ reflects the most likely source message, and $\ell^{(j)}$ appropriately reflects the confidence level such that the most likely message $m^{(j)}$ is roughly $\exp(4\ell^{(j)}|f|)$ times more likely than the next most likely value of $m$; a small sketch of this update rule is given after this list.
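The following is a minimal sketch of the relay rule just described, taking as input the per-message log-likelihoods of the received block; the function names and the numerical values in the toy example are our own illustrative assumptions.

import math

def update_state(log_likelihoods, f_value, B):
    # log_likelihoods: dict mapping each message to log P(y^(j) | m^(0) = m).
    # Pick the most likely and second most likely messages and apply (63).
    ranked = sorted(log_likelihoods, key=log_likelihoods.get, reverse=True)
    m_star, m_star2 = ranked[0], ranked[1]
    gap = log_likelihoods[m_star] - log_likelihoods[m_star2]
    ell = min(B // 2, math.floor(gap / (4 * f_value)))
    return m_star, ell

def transmit_string(m, ell, M, B):
    # The node sends B/2 + ell copies of m followed by B/2 - ell copies of the
    # "next" message (wrapping M around to 1), i.e. the string x_{m, ell}.
    nxt = 1 if m == M else m + 1
    return [m] * (B // 2 + ell) + [nxt] * (B // 2 - ell)

# Toy example with M = 3 messages, block size B = 10, and |f| = 0.5.
state = update_state({1: -2.0, 2: -9.0, 3: -15.0}, f_value=0.5, B=10)
print(state)                               # (1, 3)
print(transmit_string(*state, M=3, B=10))  # eight copies of 1, then two of 2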

IV-C2 Analysis of the Strategy

We start by defining a pseudometric on the states by

d((m_{1},\ell_{1}),(m_{2},\ell_{2}))=\begin{cases}|f||\ell_{1}-\ell_{2}|&m_{1}=m_{2}\\ |f||\ell_{1}+\ell_{2}|&m_{1}\neq m_{2}.\end{cases} (64)

To see that $d$ defined above is a pseudometric, note that $d$ is equivalent to the $\ell_{1}$-norm after applying the mapping $(m,\ell)\rightarrow(0,\ldots,0,\ell|f|,0,\ldots,0)$ (with $\ell|f|$ occurring at position $m$). The reason for being a pseudometric rather than a metric is that $d((m_{1},0),(m_{2},0))=0$.

Intuitively, $d$ is a measure of how far apart the beliefs $(m_{1},\ell_{1})$ and $(m_{2},\ell_{2})$ are. If $m_{1}=m_{2}$, then these two beliefs are close when $\ell_{1}$ and $\ell_{2}$ are close. If $m_{1}\neq m_{2}$, the beliefs are close only when $m_{1}$ and $m_{2}$ are both comparatively likely, which corresponds to the case where both $\ell_{1}$ and $\ell_{2}$ are small.

We use $d$ to bound the Bhattacharyya distance associated with two codewords:

d_{\rm B}(\vec{x}^{(j)}_{m_{1},\ell_{1}},\vec{x}^{(j)}_{m_{2},\ell_{2}},(P^{(j)})^{B}) = \sum_{i=1}^{B}d_{\rm B}(\vec{x}^{(j)}_{m_{1},\ell_{1}}(i),\vec{x}^{(j)}_{m_{2},\ell_{2}}(i),P^{(j)}) (65)
\geq |f|\,d_{H}(\vec{x}^{(j)}_{m_{1},\ell_{1}},\vec{x}^{(j)}_{m_{2},\ell_{2}}) (66)
\geq d((m_{1},\ell_{1}),(m_{2},\ell_{2})), (67)

where:

  • (65) follows from tensorization (Lemma 23).

  • (66) comes from the fact that the number of indices in which $\vec{x}^{(j)}_{m_{1},\ell_{1}}$ and $\vec{x}^{(j)}_{m_{2},\ell_{2}}$ differ is simply their Hamming distance $d_{H}$, and each of the corresponding terms in (65) is at least $|f|$ due to the extra assumption in Lemma 23.

  • To see why (67) holds, we use the definition of $\vec{x}^{(j)}_{m,\ell}$ and handle the cases $m_{1}=m_{2}$ and $m_{1}\neq m_{2}$ separately. If $m_{1}=m_{2}$, then

    d_{H}(\vec{x}^{(j)}_{m_{1},\ell_{1}},\vec{x}^{(j)}_{m_{2},\ell_{2}})=|\ell_{1}-\ell_{2}|, (68)

    which matches (64). If $m_{1}\neq m_{2}$, then the first $B/2+\min(\ell_{1},\ell_{2})$ symbols of the two codewords consist of only $m_{1}$ and only $m_{2}$ respectively, so that

    d_{H}(\vec{x}^{(j)}_{m_{1},\ell_{1}},\vec{x}^{(j)}_{m_{2},\ell_{2}})\geq B/2+\min(\ell_{1},\ell_{2})\geq\max(\ell_{1},\ell_{2})+\min(\ell_{1},\ell_{2})=\ell_{1}+\ell_{2}, (69)

    which again matches (64).

Observe that the sequence of states taken by the nodes forms a Markov process. We establish a bound on the probability of achieving any intermediate state given the beginning state; intuitively, the following result states that given the original state $(m^{(0)},\ell^{(0)})=(m,B/2)$, the probability of seeing another state $(m^{(j)},\ell^{(j)})$ is exponentially small in the distance between the two states defined by (64).

Lemma 24.

For all $j$, all $m_{1}\neq m_{2}$, and all $(m^{\prime},\ell^{\prime})$, we have

-\log\mathbb{P}\big{(}m^{(j)}=m^{\prime},\ell^{(j)}=\ell^{\prime}\,|\,m=m_{1}\big{)}\geq 2d\big{(}(m_{1},B/2),(m^{\prime},\ell^{\prime})\big{)}-2j\log(M(B+1))-2j|f|-j\log(M-1) (70)

and

d_{\rm B}\big{(}\big{(}\vec{y}^{(j+1)}|m=m_{1}\big{)},\big{(}\vec{y}^{(j+1)}|m=m_{2}\big{)}\big{)}\geq B|f|-2(j+1)\log(M(B+1))-2j|f|-j\log(M-1). (71)
Proof.

We use an induction argument, with the base case being that (70) trivially holds when $j=0$: either the left-hand side is infinite due to a probability of zero, or the left-hand side is zero due to a probability of one (i.e., $(m^{\prime},\ell^{\prime})=(m_{1},B/2)$), in which case the first term on the right-hand side is zero.

We proceed by induction on $j$, and break the inductive argument into the following two claims:

  • If (70) holds for $j$, then (71) holds for $j$.

  • If (71) holds for $j$, then (70) holds for $j+1$.

Start by assuming that (70) holds for some $j$. Observe that $m\rightarrow(m^{(j)},\ell^{(j)})\rightarrow\vec{y}^{(j+1)}$ forms a Markov chain. Let $P_{1}$ be the channel $m\rightarrow(m^{(j)},\ell^{(j)})$ and $P_{2}$ be the channel $(m^{(j)},\ell^{(j)})\rightarrow\vec{y}^{(j+1)}$; applying Lemma 44 with $s=1/2$ then gives for any $m_{1}\neq m_{2}$ that

d_{\rm B}\left(\big{(}\vec{y}^{(j+1)}|m^{(0)}=m_{1}\big{)},\big{(}\vec{y}^{(j+1)}|m^{(0)}=m_{2}\big{)}\right) (72)
\geq \min_{m^{\prime},m^{\prime\prime},\ell^{\prime},\ell^{\prime\prime}}\Big{\{}d_{\rm B}(\vec{x}_{m^{\prime},\ell^{\prime}},\vec{x}_{m^{\prime\prime},\ell^{\prime\prime}},(P^{(j)})^{B})-\frac{1}{2}\log\mathbb{P}(m^{\prime},\ell^{\prime}|m^{(0)}=m_{1})-\frac{1}{2}\log\mathbb{P}(m^{\prime\prime},\ell^{\prime\prime}|m^{(0)}=m_{2})\Big{\}}-2\log(M(B+1)).

Since (70) holds for $j$ (and for all $(m^{\prime},\ell^{\prime})$ and $m_{1}$), we can combine it with (67) to simplify the expression inside the minimization:

d_{\rm B}(\vec{x}_{m^{\prime},\ell^{\prime}},\vec{x}_{m^{\prime\prime},\ell^{\prime\prime}},(P^{(j)})^{B})-\frac{1}{2}\log\mathbb{P}(m^{\prime},\ell^{\prime}|m=m_{1})-\frac{1}{2}\log\mathbb{P}(m^{\prime\prime},\ell^{\prime\prime}|m=m_{2})
\geq d\big{(}(m^{\prime},\ell^{\prime}),(m^{\prime\prime},\ell^{\prime\prime})\big{)}+d\big{(}(m_{1},B/2),(m^{\prime},\ell^{\prime})\big{)}+d\big{(}(m_{2},B/2),(m^{\prime\prime},\ell^{\prime\prime})\big{)}-2j\log(M(B+1))-2j|f|-j\log(M-1)
\geq d\big{(}(m_{1},B/2),(m_{2},B/2)\big{)}-2j\log(M(B+1))-2j|f|-j\log(M-1) (74)
= B|f|-2j\log(M(B+1))-2j|f|-j\log(M-1), (75)

where (74) follows by applying the triangle inequality for the pseudometric $d$ along the path $(m_{1},B/2)\to(m^{\prime},\ell^{\prime})\to(m^{\prime\prime},\ell^{\prime\prime})\to(m_{2},B/2)$, and the last step uses the definition of $d$ with $m_{1}\neq m_{2}$. By substituting (75) into (72), we conclude that (71) holds for $j$.

We now turn to the second dot point mentioned at the start of the proof. We will split this into two cases. We note that in (70) we are interested in the event that $(m^{(j)},\ell^{(j)})=(m^{\prime},\ell^{\prime})$, and for this to happen, $m^{\prime}$ must be the value assigned to $m^{*}$ used in (63), i.e., $m^{\prime}$ must be the most likely message given $\vec{y}^{(j)}$. We will use this fact in both cases.

Case 1 ($m^{\prime}=m_{1}$). If $\ell^{\prime}=B/2$, then $d((m_{1},B/2),(m^{\prime},\ell^{\prime}))=0$, so that the RHS of (70) is upper bounded by $0$, and therefore (70) trivially holds. Therefore, we assume that $\ell^{\prime}<B/2$.

For each $\vec{y}^{(j+1)}$ and each $m^{\prime\prime}\neq m_{1}$, define

L_{m^{\prime\prime}}(\vec{y}^{(j+1)})=\frac{\mathbb{P}(\vec{y}^{(j+1)}|m^{(0)}=m_{1})}{\mathbb{P}(\vec{y}^{(j+1)}|m^{(0)}=m^{\prime\prime})}. (76)

By (63) and the assumption $\ell^{(j+1)}<B/2$, there exists $m^{\prime\prime}$ (which can be chosen to be the second most likely value of $m^{(0)}$ given $\vec{y}^{(j+1)}$, noting that the most likely value is $m^{\prime}=m_{1}$) such that

\frac{1}{4|f|}\log L_{m^{\prime\prime}}(\vec{y}^{(j+1)})<\ell^{(j+1)}+1. (77)

Combining this with a union bound over all $m^{\prime\prime}$, we have

\mathbb{P}(m^{(j+1)}=m^{\prime},\ell^{(j+1)}=\ell^{\prime}|m^{(0)}=m_{1}) (78)
\leq (M-1)\max_{m^{\prime\prime}\neq m_{1}}\mathbb{P}\left(L_{m^{\prime\prime}}(\vec{y}^{(j+1)})<\exp(4|f|(\ell^{\prime}+1))\,\Big{|}\,m^{(0)}=m_{1}\right). (79)

We can now conclude that

-\log\mathbb{P}(m^{(j+1)}=m^{\prime},\ell^{(j+1)}=\ell^{\prime}|m^{(0)}=m_{1}) (80)
\geq -\log\bigg{(}\max_{m^{\prime\prime}\neq m_{1}}\mathbb{P}\left(L_{m^{\prime\prime}}(\vec{y}^{(j+1)})<\exp(4|f|(\ell^{\prime}+1))\,\Big{|}\,m^{(0)}=m_{1}\right)\bigg{)}-\log(M-1)
\geq \min_{m^{\prime\prime}\neq m_{1}}\bigg{(}d_{\rm B}\left((\vec{y}^{(j+1)}|m^{(0)}=m_{1}),(\vec{y}^{(j+1)}|m^{(0)}=m^{\prime\prime})\right)\bigg{)}-2|f|(\ell^{\prime}+1)-\log(M-1) (81)
\geq B|f|-2(j+1)\log(M(B+1))-2j|f|-j\log(M-1)-2|f|(\ell^{\prime}+1)-\log(M-1) (82)
= (B-2\ell^{\prime})|f|-2(j+1)\log(M(B+1))-2(j+1)|f|-(j+1)\log(M-1) (83)
= 2d\big{(}(m_{1},B/2),(m^{\prime},\ell^{\prime})\big{)}-2(j+1)\log(M(B+1))-2(j+1)|f|-(j+1)\log(M-1), (84)

where (81) follows from Lemma 45, (82) follows from the inductive assumption (71), and (84) follows from the definition of $d$ with $m^{\prime}=m_{1}$. Thus, (70) holds for $j+1$ as desired.

Case 2 ($m^{\prime}\neq m_{1}$). For each $\vec{y}^{(j+1)}$, define

L(\vec{y}^{(j+1)})=\frac{\mathbb{P}(\vec{y}^{(j+1)}|m^{(0)}=m_{1})}{\mathbb{P}(\vec{y}^{(j+1)}|m^{(0)}=m^{\prime})}. (85)

Let $m^{\prime\prime}$ (which may depend on $\vec{y}^{(j+1)}$) be the second most likely value of $m^{(0)}$ (noting that $m^{\prime}$ is the most likely value). We then have

\frac{1}{4|f|}\log L(\vec{y}^{(j+1)})\leq\frac{1}{4|f|}\log\frac{\mathbb{P}(\vec{y}^{(j+1)}|m^{(0)}=m^{\prime\prime})}{\mathbb{P}(\vec{y}^{(j+1)}|m^{(0)}=m^{\prime})}\leq-\ell^{(j+1)}, (86)

where the first inequality uses the fact that the likelihood for $m_{1}$ is at most that of $m^{\prime\prime}$, and the second inequality follows from (63) after upper bounding the minimum therein by its second term.

We now proceed similarly to Case 1:

-\log\mathbb{P}(m^{(j+1)}=m^{\prime},\ell^{(j+1)}=\ell^{\prime}|m^{(0)}=m_{1})
\geq -\log\mathbb{P}\left(L(\vec{y}^{(j+1)})\leq\exp(-4|f|\ell^{\prime})\,\Big{|}\,m^{(0)}=m_{1}\right)
\geq d_{\rm B}\left((\vec{y}^{(j+1)}|m^{(0)}=m^{\prime}),(\vec{y}^{(j+1)}|m^{(0)}=m_{1})\right)+2|f|\ell^{\prime}
\geq B|f|-2(j+1)\log(M(B+1))-2j|f|-j\log(M-1)+2|f|\ell^{\prime}
= (B+2\ell^{\prime})|f|-2(j+1)\log(M(B+1))-2j|f|-j\log(M-1)
\geq 2d\big{(}(m^{\prime},\ell^{\prime}),(m_{1},B/2)\big{)}-2(j+1)\log(M(B+1))-2(j+1)|f|-(j+1)\log(M-1), (88)

where each step follows similarly to its counterpart in (80)–(84). We have thus established that (70) holds for $j+1$, which completes the proof of Lemma 71. ∎

The key result in Lemma 21 now immediately follows from (71) applied to $j=|E|-1$, and by recalling that the lemma is stated with respect to growing $B$ with fixed $|E|$ and $M$.

IV-D Comparison to Previous Work

In our previous paper [4], we proved the 2-hop version of Lemma 21. In [4], the problem is reduced to the case $|\mathcal{X}_{P}|=M$ (similarly to Lemma 23) and then handled as follows:

  • The relay node calculates the KL-divergence between the empirical distribution of the $B$ received symbols and the output distribution associated with each message.

  • The relay then sends a sorted sequence with more occurrences of the symbol whose associated KL-divergence is smallest, and an equal number of occurrences of the rest.

Our approach in the present paper is instead based on a likelihood ratio (see (63)), and upon calculating this, a sorted sequence is sent containing at most two distinct values.

At first glance, one may have expected that using likelihood ratios would be difficult, as there can be $\binom{M}{2}$ likelihood ratios of interest. However, our approach here shows that using the likelihood ratios between only the two most likely values is sufficient. Overall, our approach here is more similar to the one used in [1] (for general binary-input channels rather than the BSC), which directly computes the likelihood ratio but only handles the 2-hop setting and requires $M=2$.

V Proof of Theorem 3 (Main Achievability Bound)

In this section, we prove Theorem 3 in the general case, using the series case from Section IV as a stepping stone. We crucially make use of the flow decomposition theorem (Theorem 6) together with its associated paths $p_{1},p_{2},\ldots,p_{k}$ and flows $f_{1},f_{2},\ldots,f_{k}$. The overall idea is to send conditionally independent messages along different paths and only accumulate them at the end.

V-A Edge-Disjoint Paths

The protocol is simpler when the paths $p_{1},p_{2},\ldots,p_{k}$ are edge-disjoint, so we will consider this case before turning to the general case.

The key idea is to transmit information across each path independently and aggregate them only at the end. Although we assume that paths are edge-disjoint, they may still share nodes. However, each node is designed to act independently on every path it is involved in, so that information entering a node in one path does not affect other outgoing paths.

Let $\vec{y}(1),\ldots,\vec{y}(k)$ be the sequences received by the destination node along the $k$ different paths after $n$ time steps, and let $\vec{y}_{\rm all}=(\vec{y}(1),\ldots,\vec{y}(k))$. By Corollary 22, for all $i$ and all $m_{1}\neq m_{2}$, there exists a (single-path) protocol such that

d_{\rm B}\big{(}(\vec{y}(i)|m=m_{1}),(\vec{y}(i)|m=m_{2})\big{)}\geq(n-|V|B)(|f_{i}|-o_{B}(1)), (89)

where the $|V|$ term comes from the fact that a simple path cannot have more than $|V|$ edges. Therefore, again letting $B\rightarrow\infty$, we have

d_{\rm B}\big{(}(\vec{y}_{\rm all}|m=m_{1}),(\vec{y}_{\rm all}|m=m_{2})\big{)} (90)
= \sum_{i=1}^{k}d_{\rm B}\big{(}(\vec{y}(i)|m=m_{1}),(\vec{y}(i)|m=m_{2})\big{)}
\geq \sum_{i=1}^{k}(n-|V|B)(|f_{i}|-o_{B}(1))
= (n-|V|B)(|f|-o_{B}(1)). (91)

Note that here kk is treated as a constant, because kk is a property of the graph GG and hence does not depend on BB.

Applying Lemma 9, we deduce that the total error exponent is bounded below by

Glim infn1n(n|V|B)(|f|oB(1))=|f|oB(1),\mathcal{E}_{G}\geq\liminf_{n\to\infty}\frac{1}{n}(n-|V|B)(|f|-o_{B}(1))=|f|-o_{B}(1), (92)

which shows that error exponents arbitrarily close to |f||f| are achievable.
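The first step in (90) uses the fact that the Bhattacharyya distance is additive across the k paths: the per-path outputs are conditionally independent given the message, so the joint output distribution factorizes. The following minimal sketch (with arbitrary illustrative distributions of our own choosing) verifies this additivity numerically for two paths.

```python
import numpy as np

def d_B(p, q):
    """Bhattacharyya distance -log sum_y sqrt(p(y) q(y)) between two pmfs."""
    return -np.log(np.sum(np.sqrt(p * q)))

def joint(p, q):
    """Joint pmf of two independent components (flattened outer product)."""
    return np.outer(p, q).ravel()

# Output distributions of y(1) and y(2) under messages m1 and m2 (illustrative values).
p1_m1, p1_m2 = np.array([0.85, 0.10, 0.05]), np.array([0.10, 0.80, 0.10])
p2_m1, p2_m2 = np.array([0.90, 0.10]), np.array([0.20, 0.80])

lhs = d_B(joint(p1_m1, p2_m1), joint(p1_m2, p2_m2))   # distance between joint outputs
rhs = d_B(p1_m1, p1_m2) + d_B(p2_m1, p2_m2)           # sum of per-path distances
print(lhs, rhs)   # the two values coincide, giving the first step of (90)
```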

V-B The General Case

For the general case, we will still rely on the flow decomposition as before. Our approach is to reduce to the case where the paths are edge-disjoint by considering the use of parallel edges, which are allowed to exist in all of our analysis and results. This idea has also been used previously when studying channel capacity results for general networks, e.g., see [11, Sec. 15.3].

We first make two observations:

  • Suppose that there is some edge ee such that PeP_{e} is a product channel, say Pe=Q1×Q2P_{e}=Q_{1}\times Q_{2}. We can delete ee and replace it by two parallel edges e1e_{1} and e2e_{2} carrying the channels Q1Q_{1} and Q2Q_{2} respectively, and this preserves the error exponent. This is because any symbol that can be sent across Q1×Q2Q_{1}\times Q_{2} can be separately sent across the two channels in an identical manner.

  • For any graph GG, let GG^{\prime} be the resultant graph formed by replacing each PeP_{e} with its \ell-fold product PeP_{e}^{\ell}. Then GG\mathcal{E}_{G^{\prime}}\leq\ell\mathcal{E}_{G}. To see this, observe that any protocol with nn time steps over GG^{\prime} can be executed with nn\cdot\ell time steps over GG.

These two observations allow us to do the following:

  • Pick a large block size BB, and replace every channel PeP_{e} with the (B+k)(B+k)-fold product channel PeB+kP_{e}^{B+k}.

  • For each edge ee, let

    Se={i:epi}S_{e}=\{i:e\in p_{i}\} (93)

    be the indices of the paths that pass through ee. For each iSei\in S_{e}, define

    Bi=fijSefjB.B_{i}=\Bigg{\lceil}\frac{f_{i}}{\sum_{j\in S_{e}}f_{j}}\cdot B\Bigg{\rceil}. (94)

    Since there are only kk paths, we have |Se|k|S_{e}|\leq k. Moreover, we have

    iSeBi=iSefijSefjBiSe(fijSefjB+1)=B+|Se|B+k.\sum_{i\in S_{e}}B_{i}=\sum_{i\in S_{e}}\Bigg{\lceil}\frac{f_{i}}{\sum_{j\in S_{e}}f_{j}}\cdot B\Bigg{\rceil}\leq\sum_{i\in S_{e}}\Bigg{(}\frac{f_{i}}{\sum_{j\in S_{e}}f_{j}}\cdot B+1\Bigg{)}=B+|S_{e}|\leq B+k. (95)

    For each edge ee with corresponding channel PeB+kP_{e}^{B+k}, replace this by |Se||S_{e}| parallel edges, where the corresponding channels are PeBiP_{e}^{B_{i}} for each iSei\in S_{e}. We will refer to each of these new edges by eie^{\prime}_{i}. This is possible since iSeBiB+k\sum_{i\in S_{e}}B_{i}\leq B+k. If iSeBi<B+k\sum_{i\in S_{e}}B_{i}<B+k, then we simply add a dummy (unused) channel to make up for the difference. Let GG^{\prime} denote this new graph.

Observe that after the transformation, the capacities of the corresponding parallel edges are

fijSefjBeBfi,\Bigg{\lceil}\frac{f_{i}}{\sum_{j\in S_{e}}f_{j}}\cdot B\Bigg{\rceil}\cdot\mathcal{E}_{e}\geq B\cdot f_{i}, (96)

noting that jSefj=fee\sum_{j\in S_{e}}f_{j}=f_{e}\leq\mathcal{E}_{e} by the flow constraints.

Let pip^{\prime}_{i} be the path in GG^{\prime} formed by joining up edges of the form eie^{\prime}_{i} for each epie\in p_{i}. Since pip_{i} is a path and eie^{\prime}_{i} joins the same pair of vertices as ee, pip^{\prime}_{i} is also a path. By (96), each of these edges eie^{\prime}_{i} has error exponent at least BfiB\cdot f_{i}, so that the capacity on path pip^{\prime}_{i} is at least BfiB\cdot f_{i}. Thus, there exist edge-disjoint paths p1,p2,,pkp^{\prime}_{1},p^{\prime}_{2},\ldots,p^{\prime}_{k} on GG^{\prime} such that the capacity on each path pip^{\prime}_{i} is at least BfiB\cdot f_{i}.

We have now reduced the problem to the edge-disjoint case studied in the previous subsection, and we conclude that we can achieve error exponents arbitrarily close to B|f|B|f|. Since time was scaled by a factor of B+kB+k, we conclude that G(B+k)G\mathcal{E}_{G^{\prime}}\leq(B+k)\mathcal{E}_{G} and therefore

G1(B+k)GB|f|(B+k).\mathcal{E}_{G}\geq\frac{1}{(B+k)}\mathcal{E}_{G^{\prime}}\geq\frac{B|f|}{(B+k)}. (97)

Taking BB\rightarrow\infty, we conclude that G|f|\mathcal{E}_{G}\geq|f|.
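As a sanity check on this construction, the following small sketch (with illustrative flow values and block sizes of our own choosing) evaluates the splitting rule (94), verifies the bounds (95) and (96), and illustrates that B|f|/(B+k) approaches |f| as B grows.

```python
import math

def split_block_lengths(flows_through_edge, B):
    """Per-path block lengths B_i from (94) for a single edge e."""
    total = sum(flows_through_edge)
    return [math.ceil(f_i / total * B) for f_i in flows_through_edge]

B, k = 1000, 3                       # block size and number of paths (illustrative)
flows = [0.7, 0.2, 0.1]              # flows f_i of the paths in S_e through a common edge e
B_i = split_block_lengths(flows, B)
print(B_i, sum(B_i))
assert sum(B_i) <= B + k             # the bound (95), since |S_e| <= k

# Each parallel edge e'_i supports exponent B_i * E_e >= B * f_i as in (96),
# provided the per-use exponent E_e is at least the total flow f_e on edge e.
E_e = sum(flows) * 1.05              # any E_e >= f_e works; this value is illustrative
assert all(b * E_e >= B * f for b, f in zip(B_i, flows))

# The exponent B*|f| achieved over B+k uses of each channel approaches |f| for large B.
f_value = 1.2                        # illustrative maxflow value |f|
print(B * f_value / (B + k))
```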

VI Proof of Theorem 4 (Main Converse Bound)

The proof of our converse bound consists of using the max-flow min-cut duality theorem (Theorem 5) to construct a cut with the same value as |f||f|. We will additionally use the following lemma, where we recall that M,Pf\mathcal{E}^{f}_{M,P} denotes the 1-hop error exponent of a channel with full feedback:

Lemma 25.

Consider any partition (A,B)(A,B) of the nodes in GG with the source node in AA and the destination node in BB. Let P(1),P(2),,P(k)P^{(1)},P^{(2)},\ldots,P^{(k)} be the channels associated with edges from AA to BB, and let P^=P(1)×P(2)××P(k)\hat{P}=P^{(1)}\times P^{(2)}\times\ldots\times P^{(k)} be the product channel. Then for all MM,

M,GM,P^f.\mathcal{E}_{M,G}\leq\mathcal{E}^{f}_{M,\hat{P}}. (98)
Proof.

We compress AA into a single node by allowing infinite information transfer between different nodes inside AA, and do the same for BB. Clearly, this modification cannot decrease the overall error exponent.

At first glance, it may appear that the net result of doing this is simply allowing AA to send information to BB via P(1),P(2),,P(k)P^{(1)},P^{(2)},\ldots,P^{(k)}, each a total of nn times, which is the same as using P^\hat{P} for nn times. However, we must also account for the possibility of back-edges, which amounts to having a certain degree of feedback. In view of this, since any protocol that can be executed under the original setup can also be executed under the new setup with nn uses of P^\hat{P} with full feedback, we conclude that M,GM,P^f\mathcal{E}_{M,G}\leq\mathcal{E}^{f}_{M,\hat{P}}. ∎
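The compression step in this proof can be visualized with a small graph-manipulation sketch. The following uses the networkx library (assumed available) on a hypothetical graph and partition of our own choosing: merging all nodes of A into the source and all nodes of B into the destination leaves only the crossing edges, with the forward edges forming the product channel and any back-edges acting as feedback.

```python
import networkx as nx

# Hypothetical 4-node network with source "s" and destination "t"; the partition
# (A, B) and the edges are ours for illustration only.
G = nx.MultiDiGraph()
G.add_edges_from([("s", "u"), ("u", "v"), ("s", "v"), ("v", "u"), ("v", "t"), ("u", "t")])
A, B = {"s", "u"}, {"v", "t"}

H = G.copy()
for a in A - {"s"}:
    H = nx.contracted_nodes(H, "s", a, self_loops=False)   # compress A into the source
for b in B - {"t"}:
    H = nx.contracted_nodes(H, "t", b, self_loops=False)   # compress B into the destination

# Forward edges s -> t correspond to the channels forming the product channel,
# while any edge t -> s corresponds to (partial) feedback from B back to A.
print(list(H.edges()))
```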

Since M,Pf\mathcal{E}_{M,P}^{f} is a decreasing function of MM, M,Pf2,Pf\mathcal{E}_{M,P}^{f}\leq\mathcal{E}_{2,P}^{f}. Furthermore, feedback does not improve the error exponent when M=2M=2 (Theorem 43), and thus Lemma 25 immediately gives the following corollary:

Corollary 26.

Under the definition of P^\hat{P} in Lemma 25, we have

M,G2,P^.\mathcal{E}_{M,G}\leq\mathcal{E}_{2,\hat{P}}. (99)

We can now use maxflow-mincut duality (Theorem 5) and consider a cut (A,B)(A,B) such that AA contains the source, BB contains the sink, and

i=1k2,P(i)=|f|,\sum_{i=1}^{k}\mathcal{E}_{2,P^{(i)}}=|f|, (100)

where P(1),P(2),,P(k)P^{(1)},P^{(2)},\ldots,P^{(k)} are the channels associated with edges that begin in AA and end in BB. Letting P^=P(1)×P(2)××P(k)\hat{P}=P^{(1)}\times P^{(2)}\times\ldots\times P^{(k)} be the product channel, Lemma 31 gives

2,P^i2,P(i)\mathcal{E}_{2,\hat{P}}\leq\sum_{i}\mathcal{E}_{2,P^{(i)}} (101)

which implies

M,G2,P^i2,P(i)=|f|\mathcal{E}_{M,G}\leq\mathcal{E}_{2,\hat{P}}\leq\sum_{i}\mathcal{E}_{2,P^{(i)}}=|f| (102)

as required.
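For concreteness, once every edge capacity is set to the corresponding two-message exponent, the cut in (100) can be found with any standard maxflow/mincut routine. The sketch below uses the networkx library (assumed available) on a small hypothetical network; the capacity values merely stand in for the exponents of the edge channels and are not taken from any example in this paper.

```python
import networkx as nx

# Hypothetical network: source "s", relay "r", destination "t"; the capacities
# stand in for the two-message exponents of the corresponding channels.
G = nx.DiGraph()
G.add_edge("s", "r", capacity=0.50)
G.add_edge("s", "t", capacity=0.25)
G.add_edge("r", "t", capacity=0.40)

flow_value, _ = nx.maximum_flow(G, "s", "t")
cut_value, (A, B) = nx.minimum_cut(G, "s", "t")
crossing = [(u, v) for u, v in G.edges if u in A and v in B]

print(flow_value, cut_value)   # equal by maxflow-mincut duality (Theorem 5)
print(A, B, crossing)          # the channels P^(i) in (100) are those on `crossing`
```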

VII Achievability Bounds for General Channels

In this section, we study achievability bounds in the case that the channels need not be pairwise reversible.

VII-A Constant Number of Messages

When the pairwise reversibility assumption is dropped, the bound in Theorem 3 can fail to hold even in the 2-hop setting corresponding to a graph with 3 nodes and 2 edges [4, Thm. 3]. However, it is possible to prove some weaker achievability bounds. Recalling ~\tilde{\mathcal{E}} from Definition 13, we state the following achievability bound for general channels:

Lemma 27.

Given an arbitrary graph GG, let |f~||\tilde{f}| be the maxflow formed when the capacity of every edge ee is set to ~Pe\tilde{\mathcal{E}}_{P_{e}}. Then

G|f~|\mathcal{E}_{G}\geq|\tilde{f}| (103)
Proof.

We use the same proof as in the pairwise reversible case. The only use of the pairwise reversibility condition is in Lemma 23 when constructing the codewords x1(j),x2(j),,xM(j)\vec{x}^{(j)}_{1},\vec{x}^{(j)}_{2},\ldots,\vec{x}^{(j)}_{M} of length =M!\ell=M! such that dB(xm1(j),xm2(j),(P(j)))M,P(j)d_{\rm B}(\vec{x}^{(j)}_{m_{1}},\vec{x}^{(j)}_{m_{2}},(P^{(j)})^{\ell})\geq\ell\cdot\mathcal{E}_{M,P^{(j)}}. However, such codewords of length =M!\ell=M! with dB(xm1(j),xm2(j),(P(j)))~M,P(j)d_{\rm B}(\vec{x}^{(j)}_{m_{1}},\vec{x}^{(j)}_{m_{2}},(P^{(j)})^{\ell})\geq\ell\cdot\tilde{\mathcal{E}}_{M,P^{(j)}} (i.e., with ~\tilde{\mathcal{E}} in place of \mathcal{E}) are still guaranteed to exist by Lemma 29 even when the pairwise reversibility condition is dropped. Outside of Lemma 23, the proof follows with M,P(j)\mathcal{E}_{M,P^{(j)}} replaced by ~M,P(j)\tilde{\mathcal{E}}_{M,P^{(j)}} where appropriate. ∎

VII-B Zero-Rate Error Exponents

In this subsection, we prove Theorem 7. We start by relating ~M,P\tilde{\mathcal{E}}_{M,P} to the zero-rate error exponent:

Lemma 28.

For any M2M\geq 2, we have

~M,PEP(0)M1M~M,P.\tilde{\mathcal{E}}_{M,P}\geq E_{P}(0)\geq\frac{M-1}{M}\tilde{\mathcal{E}}_{M,P}. (104)

By taking MM\rightarrow\infty on both sides, we also have

limM~M,P=EP(0).\lim_{M\rightarrow\infty}\tilde{\mathcal{E}}_{M,P}=E_{P}(0). (105)
Proof.

We use the expression for EP(0)E_{P}(0) in Theorem 26. Let qq maximize the expression in (26). Draw (x1,x2,,xM)(x_{1},x_{2},\ldots,x_{M}) independently according to distribution qq, and observe that the definition of ~\tilde{\mathcal{E}} in Definition 13 gives

~M,P𝔼(2M(M1)1m1<m2MdB(xm1,xm2,P))=EP(0).\tilde{\mathcal{E}}_{M,P}\geq\mathbb{E}\left(\frac{2}{M(M-1)}\sum_{1\leq m_{1}<m_{2}\leq M}d_{\rm B}(x_{m_{1}},x_{m_{2}},P)\right)=E_{P}(0). (106)

To show the second inequality in the lemma statement, let qq be the empirical distribution (type) of (x1,x2,,xM)(x_{1},x_{2},\ldots,x_{M}) achieving (28). Substituting qq into Theorem 26, we have

EP(0)\displaystyle E_{P}(0) \displaystyle\geq x,xqxqxdB(x,x,P)\displaystyle\sum_{x,x^{\prime}}q_{x}q_{x^{\prime}}d_{\rm B}(x,x^{\prime},P) (107)
=\displaystyle= 1M21m1,m2MdB(xm1,xm2,P)\displaystyle\frac{1}{M^{2}}\sum_{1\leq m_{1},m_{2}\leq M}d_{\rm B}(x_{m_{1}},x_{m_{2}},P) (108)
=\displaystyle= 2M21m1<m2MdB(xm1,xm2,P)\displaystyle\frac{2}{M^{2}}\sum_{1\leq m_{1}<m_{2}\leq M}d_{\rm B}(x_{m_{1}},x_{m_{2}},P) (109)
=(28)\displaystyle\stackrel{{\scriptstyle\eqref{eq:rev_rate}}}{{=}} M1M~M,P.\displaystyle\frac{M-1}{M}\tilde{\mathcal{E}}_{M,P}. (110)
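The sandwich bound (104) is easy to verify numerically for small channels. The sketch below (our own illustration) does so for a binary asymmetric channel, computing the quantity in Definition 13 by brute force over input tuples and the zero-rate exponent by a grid search over input distributions; the channel and helper names are hypothetical.

```python
import itertools
import numpy as np

def d_B(P, x1, x2):
    """Bhattacharyya distance between the output distributions of inputs x1 and x2."""
    return -np.log(np.sum(np.sqrt(P[x1] * P[x2])))

def E_tilde(P, M):
    """Definition 13: best average pairwise d_B over all M-tuples of inputs (brute force)."""
    return max(np.mean([d_B(P, a, b) for a, b in itertools.combinations(xs, 2)])
               for xs in itertools.product(range(P.shape[0]), repeat=M))

def E_zero(P, grid=2001):
    """E_P(0) = max_q sum_{x,x'} q_x q_{x'} d_B(x, x', P) for a two-input channel,
    via a grid search over q = (q0, 1 - q0); only the cross term is nonzero."""
    q0 = np.linspace(0, 1, grid)
    return np.max(2 * q0 * (1 - q0) * d_B(P, 0, 1))

# Illustrative binary asymmetric channel (rows are P(. | x)).
P = np.array([[0.9, 0.1],
              [0.3, 0.7]])
for M in (2, 3, 5):
    lo, mid, hi = (M - 1) / M * E_tilde(P, M), E_zero(P), E_tilde(P, M)
    print(M, lo - 1e-9 <= mid <= hi + 1e-9, (lo, mid, hi))   # the sandwich (104)
```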

We now prove Theorem 7 by devising a protocol whose error exponent is arbitrarily close to |f||f|, where the edge weights in ff are the respective 1-hop zero-rate error exponents (i.e., EPe(0)E_{P_{e}}(0) for edge ee).

Let MM be a (large) constant, and let |f~M||\tilde{f}_{M}| be the maxflow formed by setting the capacity of each edge ee to ~M,Pe\tilde{\mathcal{E}}_{M,P_{e}}. Since ~M,PeEPe(0)\tilde{\mathcal{E}}_{M,P_{e}}\geq E_{P_{e}}(0) for every edge ee, we have |f~M||f||\tilde{f}_{M}|\geq|f|.

Building on Sections IV and V, choose a (large) block size BB, and generate a sequential block transmission protocol of length BB with MM messages. We can treat a single sequential block transmission as a discrete memoryless channel QB,MQ_{B,M}. More precisely, the input to QB,MQ_{B,M} is the message that the source node receives, and the output of QB,MQ_{B,M} is what the destination receives from a single block. The analysis in Sections IV and V shows that given any ϵ>0\epsilon>0, there is some sufficiently large BB such that

dB(m1,m2,QB,M)B(|f~M|ϵ)d_{\rm B}(m_{1},m_{2},Q_{B,M})\geq B(|\tilde{f}_{M}|-\epsilon) (111)

for all m1m2m_{1}\neq m_{2}. Substituting into (13), it follows that

~M,QB,MB(|f~M|ϵ).\tilde{\mathcal{E}}_{M,Q_{B,M}}\geq B(|\tilde{f}_{M}|-\epsilon). (112)

We now proceed to bound the zero-rate error exponent of QB,MQ_{B,M}. By Lemma 28,

EQB,M(0)M1M~M,QB,MM1MB(|f~M|ϵ)M1MB(|f|ϵ).E_{Q_{B,M}}(0)\geq\frac{M-1}{M}\tilde{\mathcal{E}}_{M,Q_{B,M}}\geq\frac{M-1}{M}B(|\tilde{f}_{M}|-\epsilon)\geq\frac{M-1}{M}B(|f|-\epsilon). (113)

Similar to the proof of Corollary 22, we can run n/B|V|n/B-|V| conditionally independent copies of the block protocol in nn time steps, which is equivalent to using channel QB,MQ_{B,M} for a total of n/B|V|n/B-|V| times. By adopting any coding strategy for QB,MQ_{B,M} that attains its zero-rate error exponent,555Such a strategy may be very complex, but we recall that our focus is on information-theoretic limits and not practical protocols. we deduce that

EG(0)limn1nn|V|BBM1MB(|f|ϵ)=M1M(|f|ϵ).E_{G}(0)\geq\lim_{n\rightarrow\infty}\frac{1}{n}\cdot\frac{n-|V|B}{B}\cdot\frac{M-1}{M}\cdot B(|f|-\epsilon)=\frac{M-1}{M}\cdot(|f|-\epsilon). (114)

Since ϵ\epsilon is arbitrarily small and MM is arbitrarily large, we conclude that

EG(0)|f|E_{G}(0)\geq|f| (115)

as required.

VIII Further Results

In this section, we provide some additional results that address when it is (or is not) possible to relax the assumptions made in our main results.

VIII-A Converse Bound for M>2M>2

Recall that |fM,G||f_{M,G}| denotes the maxflow for the graph GG with MM messages. One might hope that the proof of Theorem 4 can be modified to prove a converse bound of M,G|fM,G|\mathcal{E}_{M,G}\leq|f_{M,G}| (instead of |f2,G||f_{2,G}|). However, there are at least two problems in doing so:

  • We cannot guarantee that the 1-hop feedback and non-feedback error exponents are equal when M>2M>2;

  • The required analog of Lemma 31 may not hold when M>2M>2.

In this subsection, we partially address these issues by imposing additional assumptions. Specifically, we prove the following:

Theorem 29.

Suppose that the following conditions are satisfied:

  • GG has a mincut with no back-edges (i.e., all edges are from the source side to the destination side);

  • Every channel in GG is pairwise reversible.

Then, we have M,G=|fM,G|\mathcal{E}_{M,G}=|f_{M,G}|.

We proceed with the proof. Although the bound in Lemma 25 uses the feedback error exponent, we do not need to consider feedback if the partition has no back-edges. We immediately deduce the following:

Lemma 30.

Consider any partition (A,B)(A,B) of the nodes in GG with the source node in AA and the destination node in BB. Let P(1),P(2),,P(k)P^{(1)},P^{(2)},\ldots,P^{(k)} be the channels associated with edges from AA to BB, and let P^=P(1)×P(2)××P(k)\hat{P}=P^{(1)}\times P^{(2)}\times\ldots\times P^{(k)} be the product channel. If there are no back-edges, then for all MM, we have

M,GM,P^.\mathcal{E}_{M,G}\leq\mathcal{E}_{M,\hat{P}}. (116)

Starting from a mincut with no back-edges, let P(1),P(2),,P(k)P^{(1)},P^{(2)},\ldots,P^{(k)} be the channels going across the cut. Due to the pairwise reversibility assumption, we may apply Lemma 38 to obtain

M,P^=iM,P(i)=|fM,G|,\mathcal{E}_{M,\hat{P}}=\sum_{i}\mathcal{E}_{M,P^{(i)}}=|f_{M,G}|, (117)

where the last step holds by maxflow-mincut duality. This completes the converse part of Theorem 29. The achievability part is an immediate consequence of Theorem 3.

Next, for zero-rate error exponents, we show that the pairwise reversibility condition can be dropped:

Lemma 31.

Suppose that GG has a mincut with no back-edges. Then, we have EG(0)=|f|E_{G}(0)=|f| where |f||f| is the maxflow formed by setting the capacity of every edge ee to EPe(0)E_{P_{e}}(0).

Proof.

Similar to (117), the proof of Lemma 38 shows that

~M,P^=i~M,P(i).\tilde{\mathcal{E}}_{M,\hat{P}}=\sum_{i}\tilde{\mathcal{E}}_{M,P^{(i)}}. (118)

Taking MM\rightarrow\infty and applying (105), we obtain

EP^(0)=iEP(i)(0).E_{\hat{P}}(0)=\sum_{i}E_{P^{(i)}}(0). (119)

Under the same conditions as in Lemma 30, we can similarly conclude that EG(0)EP^(0)E_{G}(0)\leq E_{\hat{P}}(0), and therefore

EG(0)iEP(i)(0),E_{G}(0)\leq\sum_{i}E_{P^{(i)}}(0), (120)

which completes the converse part. The achievability part is an immediate consequence of Theorem 7. ∎

VIII-B Approximation Guarantees

Lemma 27 and Theorem 4 reveal that for any network GG and any MM, we have

|f~M,G|M,G|f2,G|,|\tilde{f}_{M,G}|\leq\mathcal{E}_{M,G}\leq|f_{2,G}|, (121)

where we recall that f~M,G\tilde{f}_{M,G} uses (28) for the edge weights in the maxflow calculation. We will proceed to show that this is in fact a 4-approximation:

Lemma 32.

For any MM and GG, we have

|f2,G|4|f~M,G|.|f_{2,G}|\leq 4{|\tilde{f}_{M,G}|}. (122)
Proof.

It is sufficient to show that 2,P4~M,P\mathcal{E}_{2,P}\leq 4\tilde{\mathcal{E}}_{M,P}, as this means that f2,Gf_{2,G} and f~M,G\tilde{f}_{M,G} are defined with respect to edge weights that differ by at most a factor of 4. We have

2,P\displaystyle\mathcal{E}_{2,P} =\displaystyle= maxx,xdC(x,x,P)\displaystyle\max_{x,x^{\prime}}d_{\rm C}(x,x^{\prime},P) (123)
=\displaystyle= maxx,xmaxs[0,1]dC(x,x,P,s)\displaystyle\max_{x,x^{\prime}}\max_{s\in[0,1]}d_{\rm C}(x,x^{\prime},P,s) (124)
\displaystyle\leq 2maxx,xdC(x,x,P,1/2)\displaystyle 2\max_{x,x^{\prime}}d_{\rm C}(x,x^{\prime},P,1/2) (125)
=\displaystyle= 2~2,P,\displaystyle 2\tilde{\mathcal{E}}_{2,P}, (126)

where (123) follows from (32), (125) follows by using the concavity of dCd_{\rm C} in ss [13, Thm. 5] and considering an equal combination of the values at ss and 1s1-s (along with dC0d_{\rm C}\geq 0 for the latter), and (126) follows from (28).

Next, let (x,x)(x,x^{\prime}) be chosen to maximize dB(x,x,P)d_{\rm B}(x,x^{\prime},P). By considering the distribution assigning a mass of 1/21/2 to xx and 1/21/2 to xx^{\prime}, we have

~M,PEP(0)12dB(x,x,P)=12~2,P,\tilde{\mathcal{E}}_{M,P}\geq E_{P}(0)\geq\frac{1}{2}d_{\rm B}(x,x^{\prime},P)=\frac{1}{2}\tilde{\mathcal{E}}_{2,P}, (127)

where we first applied Lemma 28, followed by (26) with the maximum replaced by the above-mentioned distribution, then (28) with M=2M=2. Combining (126) and (127) gives Lemma 32. ∎
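The two steps (126) and (127) of this proof can also be checked numerically for specific channels. The sketch below (our own illustration) does so for a Z-channel, which is not pairwise reversible; the grid over s and the particular channel are illustrative choices.

```python
import itertools
import numpy as np

def d_C(P, x1, x2, s_grid=np.linspace(0.0, 1.0, 2001)):
    """Chernoff distance max_{s in [0,1]} -log sum_y P(y|x1)^{1-s} P(y|x2)^s,
    with the sum taken over the common support of the two rows."""
    p, q = P[x1], P[x2]
    mask = (p > 0) & (q > 0)
    return max(-np.log(np.sum(p[mask] ** (1 - s) * q[mask] ** s)) for s in s_grid)

def d_B(P, x1, x2):
    """Bhattacharyya distance, i.e., the exponent above evaluated at s = 1/2."""
    return -np.log(np.sum(np.sqrt(P[x1] * P[x2])))

def E_tilde(P, M):
    """Definition 13: best average pairwise d_B over all M-tuples of inputs (brute force)."""
    return max(np.mean([d_B(P, a, b) for a, b in itertools.combinations(xs, 2)])
               for xs in itertools.product(range(P.shape[0]), repeat=M))

# Illustrative Z-channel (not pairwise reversible).
P = np.array([[1.0, 0.0],
              [0.3, 0.7]])
E2 = max(d_C(P, 0, 1), d_C(P, 1, 0))   # the two-message exponent, as in (123)
Et2, Et3 = E_tilde(P, 2), E_tilde(P, 3)
print(E2 <= 2 * Et2 + 1e-9)            # the factor-2 step (126)
print(Et3 >= 0.5 * Et2 - 1e-9)         # the factor-2 step (127) with M = 3
print(E2 <= 4 * Et3 + 1e-9)            # the resulting factor-4 comparison of edge weights
```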

The next lemma gives conditions under which a tighter approximation is possible:

Lemma 33.

Suppose that either of the following conditions holds:

  • M=2M=2; or

  • Every channel in GG is pairwise reversible.

Then, we have a 2-approximation in (121), i.e., |f2,G|2|f~M,G||f_{2,G}|\leq 2{|\tilde{f}_{M,G}|}.

Proof.

The case M=2M=2 follows immediately from (126). Moreover, if every channel is pairwise reversible, then ~2,P=2,P\tilde{\mathcal{E}}_{2,P}=\mathcal{E}_{2,P}, so the 2-approximation result follows from (127). ∎

We also identify another scenario where our bounds are exactly tight. Consider the KK-ary symmetric channel with inputs {1,2,,K}\{1,2,\ldots,K\}, outputs {1,2,,K}\{1,2,\ldots,K\}, and

P(y|x)={1(K1)py=xpyxP(y|x)=\begin{cases}1-(K-1)p&y=x\\ p&y\neq x\end{cases} (128)

for some p(0,1K1)p\in\big{(}0,\frac{1}{K-1}\big{)}.

Lemma 34.

Suppose that every channel is a KK-ary symmetric channel for some KMK\geq M (different channels can have different values of KK and different noise levels). Then |f~M,G|=|f2,G||\tilde{f}_{M,G}|=|f_{2,G}|, and the protocol given in Section V provides the achievability part.

Proof.

Similarly to the earlier results in this subsection, we simply need to show that ~M,Pe=2,Pe\tilde{\mathcal{E}}_{M,P_{e}}=\mathcal{E}_{2,P_{e}} for each PeP_{e}. Recall the definition of ~\tilde{\mathcal{E}}:

~M,Pe=2M(M1)max(x1,,xM)1m1<m2MdB(xm1,xm2,Pe).\tilde{\mathcal{E}}_{M,P_{e}}=\frac{2}{M(M-1)}\max_{(x_{1},\ldots,x_{M})}\sum_{1\leq m_{1}<m_{2}\leq M}d_{\rm B}(x_{m_{1}},x_{m_{2}},P_{e}). (129)

For the KK-ary symmetric channel, dB(x,x,Pe)d_{\rm B}(x,x^{\prime},P_{e}) is either 0 (if x=xx=x^{\prime}) or a fixed positive value (if xxx\neq x^{\prime}). Since KMK\geq M, we can achieve the maximum in (129) by letting x1,,xMx_{1},\ldots,x_{M} be distinct, yielding

~M,Pe=log(2(1(K1)p)p+(K2)p).\tilde{\mathcal{E}}_{M,P_{e}}=-\log(2\sqrt{(1-(K-1)p)\cdot p}+(K-2)p). (130)

The computation for 2,Pe\mathcal{E}_{2,P_{e}} is also straightforward; since PeP_{e} is pairwise reversible, we can simply choose any two distinct inputs to obtain

2,Pe=maxx1,x2dB(x1,x2,Pe)=log(2(1(K1)p)p+(K2)p).\mathcal{E}_{2,P_{e}}=\max_{x_{1},x_{2}}d_{\rm B}(x_{1},x_{2},P_{e})=-\log(2\sqrt{(1-(K-1)p)\cdot p}+(K-2)p). (131)

Since ~M,Pe=2,Pe\tilde{\mathcal{E}}_{M,P_{e}}=\mathcal{E}_{2,P_{e}} for all edges, it follows that |f~M,G|=|f2,G||\tilde{f}_{M,G}|=|f_{2,G}|. ∎
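As a quick numerical check of (130), the following sketch (with illustrative values of K, p, and M of our own choosing) compares the closed form against a direct evaluation of the Bhattacharyya distance between two distinct inputs of the K-ary symmetric channel.

```python
import numpy as np

def k_ary_symmetric(K, p):
    """Transition matrix of the K-ary symmetric channel from (128)."""
    P = np.full((K, K), p)
    np.fill_diagonal(P, 1 - (K - 1) * p)
    return P

def d_B(P, x1, x2):
    return -np.log(np.sum(np.sqrt(P[x1] * P[x2])))

K, p = 5, 0.05                      # illustrative values with K >= M
P = k_ary_symmetric(K, p)
# With x_1, ..., x_M distinct, all pairwise distances are equal, so the average
# in (129) reduces to the distance between any two distinct inputs.
direct = d_B(P, 0, 1)
closed_form = -np.log(2 * np.sqrt((1 - (K - 1) * p) * p) + (K - 2) * p)   # (130)
print(direct, closed_form)          # the two values agree
```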

VIII-C Counterexample to a Natural Conjecture

In Theorem 29, we assumed that GG has a mincut with no back-edges. It might be tempting to believe that this is the case whenever GG contains no directed cycles. However, this is false, as we can see in Figure 2, with the mincut described in the figure caption.

Figure 2: An example where the only mincut (namely, A={1,3}A=\{1,3\} and B={2,4}B=\{2,4\}) contains a back-edge. Solid arrows have infinite capacity and dotted arrows have finite capacity.

Using the example in Figure 2, we show that even when every channel is pairwise reversible and the graph is acyclic, it is possible to achieve an error exponent greater than |fM,G||f_{M,G}| when M=3M=3:

Theorem 35.

Even if we assume that GG is acyclic and every channel is pairwise reversible, there exist scenarios in which M,G>|fM,G|\mathcal{E}_{M,G}>|f_{M,G}| when M=3M=3. Moreover, the same statement holds even when the 1-hop error exponents defining fM,Gf_{M,G} are replaced by their counterparts with full feedback.

Proof.

Consider the graph in Figure 2. Set M=3M=3, and consider the setup described as follows for some p(0,13)p\in\big{(}0,\frac{1}{3}\big{)}:

  • On the solid edges, infinite noiseless communication is allowed.

  • On the edge 121\rightarrow 2, we consider a ternary symmetric channel with inputs {0,1,2}\{0,1,2\}, outputs {0,1,2}\{0,1,2\}, and with (y|x)=12p\mathbb{P}(y|x)=1-2p if y=xy=x, and (y|x)=p\mathbb{P}(y|x)=p otherwise.

  • On the edge 343\rightarrow 4, we consider a binary symmetric channel with noise parameter pp.

We will prove that when pp is sufficiently small, it is possible to achieve an error exponent exceeding the maxflow value. For the rest of this section, asymptotic notation refers to the limit as p0p\rightarrow 0.

Let us first compute the error exponents of all the channels. Since all channels are pairwise reversible, we may compute the error exponents using M,P=~M,P\mathcal{E}_{M,P}=\tilde{\mathcal{E}}_{M,P} as per (28). For the ternary symmetric channel, (28) is maximized when x1,x2,x3x_{1},x_{2},x_{3} are all distinct, so the error exponent is given by

=log(2p(12p)+p)=12log1p+𝒪(1).\mathcal{E}=-\log(2\sqrt{p(1-2p)}+p)=\frac{1}{2}\log\frac{1}{p}+\mathcal{O}(1). (132)

For the binary symmetric channel, only two of x1,x2,x3x_{1},x_{2},x_{3} can be distinct, so the error exponent is given by

=23log(2p(1p))=13log1p+𝒪(1).\mathcal{E}=-\frac{2}{3}\log(2\sqrt{p(1-p)})=\frac{1}{3}\log\frac{1}{p}+\mathcal{O}(1). (133)

Hence, the maxflow is simply the sum of the two error exponents, i.e., 56log1p+𝒪(1)\frac{5}{6}\log\frac{1}{p}+\mathcal{O}(1).

We will prove that log1p+𝒪(1)\log\frac{1}{p}+\mathcal{O}(1) is an achievable error exponent, thus disproving the reasonable conjecture that the maxflow is an upper bound to the error exponent. Similar to Section IV, define a single “sequential transmission” as follows:

  • Node 1 inputs the true message mm into its channel connecting to node 2, and also sends mm to node 3;

  • Node 2 then tells both nodes 3 and 4 what it received;

  • Node 3 then looks at both the ground truth from node 1 and the forwarded value from node 2. If they are the same, message 0 is sent to node 4, and otherwise, message 1 is sent to node 4.

A total of n3n-3 such transmissions can occur in the nn time steps by chaining them one after the other (similar to Figure 1). Moreover, we can view a single sequential transmission as a discrete memoryless channel QQ, where the input is mm, and the output consists of what node 4 receives from both nodes 2 and 3.

We can classify the result of each sequential transmission into three possible outcomes:

  • (i)

    Node 4 receives message 1 from node 3. This means that either node 2 received the wrong message (and therefore node 3 sent 1), or the message from node 3 to node 4 got flipped. Each event happens with probability 𝒪(p)\mathcal{O}(p), so the total probability of this is 𝒪(p)\mathcal{O}(p).

  • (ii)

    Node 4 receives message 0 from node 3, and receives the wrong message from node 2. In this case, both the message from node 1 to node 2 and the message from node 3 to node 4 must be corrupted, so this happens with probability 𝒪(p2)\mathcal{O}(p^{2}).

  • (iii)

    Node 4 receives message 0 from node 3, and receives the correct message from node 2. This occurs with probability 1𝒪(p)1-\mathcal{O}(p).

We summarize the end-to-end transition probabilities in Figure 3:

Input \ Output   (0,0)       (1,0)       (2,0)       (*,1)
0                1-O(p)      O(p^2)      O(p^2)      O(p)
1                O(p^2)      1-O(p)      O(p^2)      O(p)
2                O(p^2)      O(p^2)      1-O(p)      O(p)
Figure 3: The input represents the message mm at node 1. An output of (a,b)(a,b) indicates that aa was received from node 2 and bb was received from node 3. (,1)(*,1) indicates that 1 is received from node 3, with arbitrary (ignored) information from node 2.

Since we are allowed n3n-3 uses of QQ in nn time steps, we have 3,G3,Q\mathcal{E}_{3,G}\geq\mathcal{E}_{3,Q}. We now proceed to bound 3,Q\mathcal{E}_{3,Q}; we have

dB(0,1,Q)=logyQ0(y)Q1(y)=log𝒪(p2)=log1p+𝒪(1)d_{\rm B}(0,1,Q)=-\log\sum_{y}\sqrt{Q_{0}(y)Q_{1}(y)}=-\log\sqrt{\mathcal{O}(p^{2})}=\log\frac{1}{p}+\mathcal{O}(1) (134)

and similarly when (0,1)(0,1) is replaced by any two distinct symbols. Using Lemma 12, this gives

3,Qlog1p+O(1),\mathcal{E}_{3,Q}\geq\log\frac{1}{p}+O(1), (135)

showing that an error exponent of log1p+O(1)\log\frac{1}{p}+O(1) is achievable.

This same counterexample in fact also establishes the second part of Theorem 35 in which the mincut is defined with respect to edge capacities Pef\mathcal{E}^{f}_{P_{e}} instead of Pe\mathcal{E}_{P_{e}}. To see this, first note that for the ternary symmetric channel, we have 3=2\mathcal{E}_{3}=\mathcal{E}_{2}; this follows from the proof of Lemma 34, with the ternary symmetric channel being pairwise reversible. On the other hand, 2=2f\mathcal{E}_{2}=\mathcal{E}^{f}_{2} (Theorem 43), 2f3f\mathcal{E}^{f}_{2}\geq\mathcal{E}^{f}_{3} (error exponent is decreasing in MM) and 3f3\mathcal{E}^{f}_{3}\geq\mathcal{E}_{3} (feedback cannot decrease error exponents). Therefore, the ternary symmetric channel has 2=2f3f3=2\mathcal{E}_{2}=\mathcal{E}^{f}_{2}\geq\mathcal{E}^{f}_{3}\geq\mathcal{E}_{3}=\mathcal{E}_{2}; this means that all the inequalities must hold with equality, yielding

3f=3=2=log(2p(12p)+pp)=12log1p+𝒪(1).\mathcal{E}^{f}_{3}=\mathcal{E}_{3}=\mathcal{E}_{2}=-\log(2\sqrt{p(1-2p)}+\sqrt{p\cdot p})=\frac{1}{2}\log\frac{1}{p}+\mathcal{O}(1). (136)

For the binary symmetric channel, the feedback error exponent for M=3M=3 is given as follows [16, p. 64]:

3f=log(p1/3(1p)2/3+p2/3(1p)1/3)=13log1p+𝒪(1),\mathcal{E}^{f}_{3}=-\log(p^{1/3}(1-p)^{2/3}+p^{2/3}(1-p)^{1/3})=\frac{1}{3}\log\frac{1}{p}+\mathcal{O}(1), (137)

which gives a maxflow of 56log1p+𝒪(1)\frac{5}{6}\log\frac{1}{p}+\mathcal{O}(1). Once again, since this is strictly below log1p+𝒪(1)\log\frac{1}{p}+\mathcal{O}(1) for small enough pp, we obtain the desired result.
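The asymptotic entries in Figure 3 can also be evaluated exactly, which makes the gap visible at finite p. The following sketch (our own numerical illustration) computes the exact transition matrix of Q, the pairwise Bhattacharyya distances appearing in (134), and the finite-p maxflow value obtained by summing (132) and (133); already for moderately small p, the former exceeds the latter.

```python
import numpy as np

def Q_matrix(p):
    """Exact transition probabilities of one sequential transmission Q.

    Rows: input message m in {0,1,2}.  Columns: outputs (0,0), (1,0), (2,0), (*,1),
    i.e., an exact evaluation of the O(.) entries in Figure 3.
    """
    Q = np.zeros((3, 4))
    for m in range(3):
        for a in range(3):
            pa = 1 - 2 * p if a == m else p              # node 2 receives a
            Q[m, a] += pa * ((1 - p) if a == m else p)    # node 4 receives bit 0 from node 3
            Q[m, 3] += pa * (p if a == m else 1 - p)      # node 4 receives bit 1 from node 3
    return Q

def d_B(Q, m1, m2):
    return -np.log(np.sum(np.sqrt(Q[m1] * Q[m2])))

for p in (1e-2, 1e-3, 1e-4):
    Q = Q_matrix(p)
    pairwise = min(d_B(Q, a, b) for a in range(3) for b in range(a + 1, 3))
    ternary = -np.log(2 * np.sqrt(p * (1 - 2 * p)) + p)        # (132)
    bsc = -(2 / 3) * np.log(2 * np.sqrt(p * (1 - p)))          # (133)
    print(p, pairwise, ternary + bsc)   # the pairwise distance exceeds the maxflow value
```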

References

  • [1] W. Huleihel, Y. Polyanskiy, and O. Shayevitz, “Relaying one bit across a tandem of binary-symmetric channels,” IEEE International Symposium on Information Theory (ISIT), 2019.
  • [2] V. Jog and P. L. Loh, “Teaching and learning in uncertainty,” IEEE Transactions on Information Theory, vol. 67, no. 1, pp. 598–615, 2021.
  • [3] Y. H. Ling and J. Scarlett, “Optimal rates of teaching and learning under uncertainty,” IEEE Transactions on Information Theory, vol. 61, no. 11, pp. 7067–7080, 2021.
  • [4] ——, “Multi-bit relaying over a tandem of channels,” IEEE Transactions on Information Theory (to appear), 2023.
  • [5] S. Rajagopalan and L. Schulman, “A coding theorem for distributed computation,” ACM Symposium on Theory of Computing, 1994.
  • [6] Y. H. Ling and J. Scarlett, “Simple coding techniques for many-hop relaying,” IEEE Transactions on Information Theory, vol. 68, no. 11, pp. 7043–7053, 2022.
  • [7] E. Domanovitz, T. Philosof, and A. Khina, “The information velocity of packet-erasure links,” in IEEE Conference on Computer Communications, 2022.
  • [8] W. Evans, C. Kenyon, Y. Peres, and L. J. Schulman, “Broadcasting on trees and the Ising model,” Annals of Applied Probability, vol. 10, no. 2, pp. 410–433, 2000.
  • [9] A. Makur, E. Mossel, and Y. Polyanskiy, “Broadcasting on random directed acyclic graphs,” IEEE Transactions on Information Theory, vol. 66, no. 2, pp. 780–812, 2020.
  • [10] ——, “Broadcasting on two-dimensional regular grids,” IEEE Transactions on Information Theory, vol. 68, no. 10, pp. 6297–6334, 2022.
  • [11] A. El Gamal and Y.-H. Kim, Network information theory.   Cambridge University Press, 2011.
  • [12] T. M. Cover and J. A. Thomas, Elements of information theory.   John Wiley & Sons, Inc., 2006.
  • [13] C. Shannon, R. Gallager, and E. Berlekamp, “Lower bounds to error probability for coding on discrete memoryless channels. I,” Information and Control, vol. 10, no. 1, pp. 65–103, 1967.
  • [14] ——, “Lower bounds to error probability for coding on discrete memoryless channels. II,” Information and Control, vol. 10, no. 5, pp. 522–552, 1967.
  • [15] L. Trevisan, “Stanford CS261 lecture notes,” 2011, https://theory.stanford.edu/~trevisan/cs261/lecture11.pdf.
  • [16] E. R. Berlekamp, “Block coding with noiseless feedback,” Ph.D. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering, 1964.
  • [17] B. Nakiboglu, “Exponential bounds on error probability with feedback,” Ph.D. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering, 2011.