Maxflow-Based Bounds for Low-Rate
Information Propagation over Noisy Networks
Abstract
We study error exponents for the problem of low-rate communication over a directed graph, where each edge in the graph represents a noisy communication channel, and there is a single source and destination. We derive maxflow-based achievability and converse bounds on the error exponent that match when there are two messages and all channels satisfy a symmetry condition called pairwise reversibility. More generally, we show that the upper and lower bounds match to within a factor of 4. We also show that with three messages there are cases where the maxflow-based error exponent is strictly suboptimal, thus showing that our tightness result cannot be extended beyond two messages without further assumptions.
I Introduction
Problems of low-rate (e.g., 1-bit) information propagation in noisy networks have recently gained increasing attention, with prominent examples including the following:
•
The transmission of a single bit over a tandem of channels (the 2-hop setting) was studied in [1, 2, 3], and the multi-bit variant was studied in [4]. The goal is to characterize the associated error exponent; as we further discuss below, this line of work is the most relevant to the present paper.
•
The transmission of a single bit over a long chain of noisy channels (connected by relays) dates back to the work of Schulman and Rajagopalan [5] and recently regained attention [1, 6, 7] after being posed by Polyanskiy (who termed it the information velocity problem) and connected to the 2-hop setting by Huleihel, Polyanskiy, and Shayevitz [1].
•
Another line of works studied broadcasting problems on various kinds of graphs, such as trees [8], random graphs [9], and grids [10]. Here the goal is to reconstruct the original bit from the information received at the leaves, and the focus has been on simple intermediate nodes performing memoryless operations. (In contrast, the problems described above allow the intermediate/relay nodes to perform complicated coding strategies.)
In this paper, we address a notable gap in this list by extending the 2-hop problem to transmission over a fixed but arbitrary directed graph, with a single source and a single destination. (General graphs also fall under the framework of [5], even without the constraint that the graph's size is fixed with respect to the block length, but the focus therein is only on scaling laws and not precise constants or error exponents.) An extensive range of channel capacity results are known for such settings [11, Ch. 15], but 1-bit and other low-rate settings have remained less explored.
Specifically, we generalize the 2-hop multi-bit setup to arbitrary networks and show a maxflow-based achievability bound (Theorem 3), which supersedes our main previous multi-bit result for the 2-hop setting [4]. We also provide a converse bound (Theorem 4) based on the cutset idea, which matches the achievability bound in the 1-bit case when every channel satisfies a symmetry condition called pairwise reversibility (see Definition 7 below). While we are unable to find the optimal error exponent more generally, our achievability and converse bounds provide a 4-approximation (Lemma 122).
We proceed by outlining the related work in more detail, and then formally describe the problem setup and state our main results.
I-A Related Work
Tandem of channels (2-hop). The problem of transmitting a single bit over a tandem of channels was introduced by Huleihel, Polyanskiy, and Shayevitz [1], as well as Jog and Loh using different motivation/terminology based on teaching and learning in multi-agent problems [2]. Both of these works gave various protocols with varying trade-offs between their simplicity and their associated error exponents. We then showed in [3] that the 1-hop and 2-hop error exponents coincide whenever the two channels have the same transition law. In [4], we generalized this problem to the multi-bit setting and showed that whenever both channels are pairwise reversible, the 2-hop error exponent is simply given by the bottleneck (i.e., the worse of the two 1-hop error exponents).
Information velocity (many-hop). As we outlined above, the information velocity problem seeks to establish the minimum possible ratio of the number of hops to the block length to reliably send a single bit over a long chain of relays connected by binary symmetric channels (or more generally, other channels). This problem also has interesting connections to the 2-hop setting [1].
A general result of Schulman and Rajagopalan [5] translates noiseless protocols (on arbitrary graphs) to noisy ones, and when specialized to the line graph, this proves that positive information velocity is attainable. We gave a simple and efficient protocol achieving positive information velocity in [6], though precisely characterizing the best possible information velocity remains very much open. The information velocity of erasure-type channels was studied in [7], and for this class of channels an exact characterization was attained.
Broadcasting problems. As mentioned above, problems of broadcasting [8, 9, 10] have a similar flavor to the information velocity problem, but with some substantial differences including (i) the consideration of more complicated graphs, (ii) the combination of many nodes’ information to form the final estimate, and (iii) the consideration of simple memoryless intermediate nodes without the use of coding.
Capacity results on graphs. Another notable line of works has sought to characterize the channel capacity of noisy networks, as summarized in [12, Sec. 15.10] and [11, Ch. 15]. The study of capacity is largely quite distinct from that of 1-bit (or other low-rate) communication, but we will see some similarities throughout the paper. In particular, our results are based on maxflow and mincut ideas, which are widely used in capacity results. Instead of using 1-hop channel capacities as edge weights for the maxflow problem, we will use 1-hop error exponents. Our protocols and analysis will be substantially different compared to capacity results and will require further assumptions such as pairwise reversibility, but we will use an existing idea of replacing individual edges by multiple parallel edges of smaller weight in order to form edge-disjoint paths (e.g., see [11, Sec. 15.3]).
II Problem Setup
We first describe the classical channel coding setup on a discrete memoryless channel (DMC) $W$, whose input and output alphabets are denoted by $\mathcal{X}$ and $\mathcal{Y}$ (or sometimes $\mathcal{X}_W$ and $\mathcal{Y}_W$). The source node is given a message $m \in \{1, \dots, M\}$, and has $n$ uses of the channel to send information across to the destination node, after which the destination node guesses the value of $m$. An error is said to have occurred if the destination's guess is different from $m$. The minimum probability of an error across all protocols with $n$ time steps is denoted by $P_e(W, M, n)$, and the error exponent of the channel with $M$ messages is defined as
$E(W, M) = \lim_{n \to \infty} -\tfrac{1}{n} \log P_e(W, M, n), \qquad (1)$
where the limit is as $n \to \infty$ while $M$ remains fixed. We will use $E(W)$, $E(M)$, or simply $E$ whenever the omitted parameters are clear from the context.
In this paper, we consider communication over a directed graph with edges representing noisy channels. Specifically, we are given a directed graph $G = (\mathcal{V}, \mathcal{E})$ (not necessarily acyclic), with two special nodes called the source and the destination. Without loss of generality, we assume that there exists at least one path from the source to the destination. For every edge $(u, v) \in \mathcal{E}$, there is a DMC $W_{uv}$. We note that for a given pair of nodes, say $(u, v)$, we allow for the possibility of multiple parallel edges connecting $u$ to $v$ (though this could equivalently be viewed as a single edge whose associated channel is a product channel).
At time 0, the source node receives a message $m \in \{1, \dots, M\}$. At each time step, every node gets one use of the channel associated with each of its outgoing edges. This process occurs for a total of $N$ time steps. After $N$ time steps, the destination node forms an estimate of the message; let $P_e(G, M, N)$ be the resulting error probability. (Our results are unaffected by whether this error probability is averaged over all $m$ or taken with respect to the worst-case $m$, as is the case for the vast majority of error exponent studies, e.g., [13, 14].) We are interested in the error exponent of the network, given by
$E_{\mathrm{net}}(G, M) = \lim_{N \to \infty} -\tfrac{1}{N} \log P_e(G, M, N). \qquad (2)$
Note that we are taking the limit as $N \to \infty$, while $G$ and $M$ remain fixed. When the context is clear, we will simply denote this as $E_{\mathrm{net}}$.
For an edge $(u, v)$, we let $E_{uv}(M)$ be short for $E(W_{uv}, M)$ (i.e., the 1-hop error exponent of the associated channel). For a fixed $(G, M)$ pair, we can create a network flow problem (see Section III-A) that assigns a capacity of $E_{uv}(M)$ on every edge $(u, v)$, and consider the maximum flow across this network. (Here "capacity" is meant in a generic sense, and not as the Shannon capacity.) We will refer to this as the maxflow for the $(G, M)$ pair, and denote it by $M_f(G, M)$. We will also shorten this to $M_f$ when the context is clear.
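As an illustration of this quantity, the maxflow under exponent-valued capacities can be computed with any standard maximum-flow routine; the sketch below uses Edmonds-Karp on a small hypothetical 4-node network whose edge weights (standing in for the exponents $E_{uv}$) are made up.

```python
from collections import deque

def maxflow(n, edges, s, t):
    """Edmonds-Karp maximum flow. `edges` maps (u, v) -> capacity;
    parallel edges can be merged beforehand by summing capacities."""
    cap = [[0.0] * n for _ in range(n)]
    for (u, v), c in edges.items():
        cap[u][v] += c
    flow = 0.0
    while True:
        # BFS for an augmenting path in the residual graph
        parent = [-1] * n
        parent[s] = s
        q = deque([s])
        while q:
            u = q.popleft()
            for v in range(n):
                if parent[v] == -1 and cap[u][v] > 1e-12:
                    parent[v] = u
                    q.append(v)
        if parent[t] == -1:
            return flow
        # find the bottleneck capacity along the path found
        bottleneck, v = float('inf'), t
        while v != s:
            u = parent[v]
            bottleneck = min(bottleneck, cap[u][v])
            v = u
        # update residual capacities
        v = t
        while v != s:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

# Hypothetical 4-node network: 0 = source, 3 = destination; the weights
# play the role of the 1-hop exponents E_uv (values made up).
E = {(0, 1): 0.5, (0, 2): 0.3, (1, 3): 0.4, (2, 3): 0.6, (1, 2): 0.2}
print(maxflow(4, E, 0, 3))  # bounded by the source-side cut of size 0.8
```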
Our main result is the following achievability result for channels that are pairwise reversible; this is a classical notion of symmetry [14] that notably includes the binary symmetric channel (BSC), the binary erasure channel (BEC), and their generalizations to non-binary inputs. Briefly, the property is that the quantity $\sum_y W(y|x)^{1-s} W(y|x')^{s}$ attains its minimum at $s = \frac{1}{2}$ for all $x, x'$, which implies that the Chernoff distance between two inputs simplifies to the Bhattacharyya distance. See Definition 7 below for further details, and [14] for a more comprehensive treatment with numerous examples.
Theorem 1.
If every channel in $G$ is pairwise reversible, then the error exponent $M_f(G, M)$ is achievable, i.e.,
$E_{\mathrm{net}}(G, M) \geq M_f(G, M). \qquad (3)$
Proof.
See Section V. ∎
We note that this is an information-theoretic achievability result, and we do not attempt to establish an associated computationally efficient protocol (though doing so would be of interest). By comparison, our related works on 2-hop settings contained various results based on both efficient and inefficient protocols [3, 4].
We also prove the following converse bound:
Theorem 2.
For any pair $(G, M)$, we have
(4) |
Proof.
See Section VI. ∎
Combining Theorems 3 and 4, we establish that equality holds whenever every channel is pairwise reversible and $M = 2$:
Corollary 3.
If every channel in $G$ is pairwise reversible, then
$E_{\mathrm{net}}(G, 2) = M_f(G, 2). \qquad (5)$
We are also interested in the zero-rate error exponent. Momentarily returning to classical 1-hop communication, for $R > 0$, we define the error exponent of the channel at rate $R$ by
$E(W, R) = \lim_{n \to \infty} -\tfrac{1}{n} \log P_e(W, \lceil e^{nR} \rceil, n), \qquad (6)$
and we extend this function to $R = 0$ via $E(W, 0^+) = \lim_{R \downarrow 0} E(W, R)$. Analogously, in the setting of the present paper, we let $E_{\mathrm{net}}(G, 0^+)$ (or simply $E_{\mathrm{net}}(0^+)$) denote the zero-rate error exponent of the network. We can again consider the network flow problem which assigns a capacity of $E_{uv}(0^+)$ on every edge $(u, v)$, which we will denote by $M_f^{0^+}(G)$ or sometimes simply $M_f^{0^+}$. The following result extends Theorem 3 to zero-rate error exponents, even without requiring the pairwise reversible assumption:
Theorem 4.
For any $G$, whose channels need not be pairwise reversible, we have
$E_{\mathrm{net}}(G, 0^+) \geq M_f^{0^+}(G). \qquad (7)$
Proof.
See Section VII-B. ∎
Some additional results are given in Sections VII and VIII, and are briefly outlined as follows:
•
Lemma 27 states an achievability result for channels that may not be pairwise reversible; the idea is to use a weaker “symmetrized” error exponent on each edge in the graph, rather than the optimal 1-hop exponent.
•
In Theorem 29, we provide conditions (albeit somewhat restrictive) under which a matching converse can be obtained for pairwise reversible channels with .
•
In Theorem 35, we show that in general there exist scenarios in which the maxflow-based error exponent is strictly suboptimal when $M = 3$, even under the constraint of an acyclic graph and pairwise reversible channels. This places a strong restriction on the extent to which our tightness result for $M = 2$ can be generalized.
III Preliminaries
In this section, we introduce some additional notation and definitions, and provide a number of useful auxiliary results that will be used for proving our main results.
III-A Flows and Cuts
We momentarily depart from the above setup and consider generic notions on graphs. Consider a directed graph $G = (\mathcal{V}, \mathcal{E})$, with two special nodes called the source $s$ and destination $t$. Suppose that on every edge $(u, v) \in \mathcal{E}$, there is a non-negative real-valued capacity, denoted $c(u, v)$.
Let $f$ be a function from $\mathcal{E}$ to $[0, \infty)$. We say that $f$ is a flow if the following conditions hold:
•
(Capacity constraints) For each $(u, v) \in \mathcal{E}$, $f(u, v) \leq c(u, v)$.
•
(Flow conservation) For each $v \in \mathcal{V}$, except for the source and destination, we have
$\sum_{u : (u, v) \in \mathcal{E}} f(u, v) = \sum_{w : (v, w) \in \mathcal{E}} f(v, w). \qquad (8)$
Define the size of a flow $f$ as
$\|f\| = \sum_{v : (s, v) \in \mathcal{E}} f(s, v) - \sum_{u : (u, s) \in \mathcal{E}} f(u, s). \qquad (9)$
A flow $f^*$ is maximal if $\|f^*\| \geq \|f\|$ for any other flow $f$.
Closely related to flows is the concept of cuts. A cut is a partition of $\mathcal{V}$ into two sets $(\mathcal{S}, \mathcal{T})$ such that $s \in \mathcal{S}$ and $t \in \mathcal{T}$. Define the size of a cut as
$\|(\mathcal{S}, \mathcal{T})\| = \sum_{(u, v) \in \mathcal{E} \,:\, u \in \mathcal{S},\, v \in \mathcal{T}} c(u, v), \qquad (10)$
where we emphasize that this only counts edges from $\mathcal{S}$ to $\mathcal{T}$ and not back-edges. A cut $(\mathcal{S}^*, \mathcal{T}^*)$ is minimal if $\|(\mathcal{S}^*, \mathcal{T}^*)\| \leq \|(\mathcal{S}, \mathcal{T})\|$ for any cut $(\mathcal{S}, \mathcal{T})$. We now state the well-known maxflow-mincut duality theorem:
Theorem 5.
(Maxflow-mincut duality theorem, e.g., [15]) The size of the maximum flow is equal to the size of the minimum cut.
The following result is also well known, but we provide a short proof for the interested reader.
Theorem 6.
(Flow decomposition theorem, e.g., [15]) Let $f$ be any flow. There exist finitely many simple paths (a path is simple if it never visits the same node more than once) from $s$ to $t$, denoted by $P_1, \dots, P_k$, with associated flows $f_1, \dots, f_k$, such that
•
Each $f_i$ flows only along path $P_i$. In other words, $f_i(e) = g_i$ for all $e$ on $P_i$ and $f_i(e) = 0$ otherwise.
•
The total sum of all $f_i$ is equal to $f$. In other words, we have for all $e \in \mathcal{E}$ that
$\sum_{i=1}^{k} f_i(e) = f(e). \qquad (11)$
Proof.
We describe a procedure for iteratively building up a list of $(P_i, f_i)$ pairs. Given $f$, let $P$ be any path from $s$ to $t$ that has non-zero flow across every edge. Let $g$ be the smallest among these (non-zero) values, and let $f_P$ be the flow formed by pushing $g$ units of flow along the path $P$. We then append $(P, f_P)$ to the list of pairs, subtract $g$ along every edge on $P$, and repeat the preceding steps until no flow remains. Since the number of active edges (i.e., those with non-zero flow) strictly decreases on every step of this process, it must terminate and produce a finite number of paths, as desired. ∎
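The iterative procedure described in this proof can be sketched directly in code; the flow values below are made up for illustration.

```python
def decompose(flow, s, t):
    """Decompose a flow (dict (u, v) -> value) with acyclic support into
    simple s-t paths, following the procedure above: repeatedly find a
    path with positive flow, push its bottleneck value, and subtract."""
    flow = {e: v for e, v in flow.items() if v > 1e-12}  # work on a copy
    paths = []
    while flow:
        # walk from s to t along positive-flow edges
        path, u = [], s
        while u != t:
            v = next(w for (x, w) in flow if x == u)
            path.append((u, v))
            u = v
        g = min(flow[e] for e in path)  # bottleneck value g_i
        paths.append(([s] + [v for (_, v) in path], g))
        for e in path:
            flow[e] -= g
            if flow[e] <= 1e-12:
                del flow[e]  # at least one edge becomes inactive per step
    return paths

# Hypothetical flow of size 0.8 on a 4-node graph (values made up).
f = {(0, 1): 0.5, (0, 2): 0.3, (1, 3): 0.4, (1, 2): 0.1, (2, 3): 0.4}
for p, g in decompose(f, 0, 3):
    print(p, g)
```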
III-B Chernoff Divergence
Let $W$ be a generic discrete memoryless channel. To lighten notation, we let $W_x$ denote the conditional distribution $W(\cdot \mid x)$. An $(n, M)$-codebook is defined to be a collection of $M$ codewords each having length $n$, and when utilizing such a codebook, we will use the notation $c_1, \dots, c_M$ for the associated codewords.
For two probability distributions $P, Q$ over some finite set $\mathcal{Y}$, the Bhattacharyya distance is defined as
$d_B(P, Q) = -\log \sum_{y \in \mathcal{Y}} \sqrt{P(y) Q(y)}. \qquad (12)$
For random variables $X_1, X_2$, we will also use $d_B(X_1, X_2)$ to refer to the Bhattacharyya distance between their distributions $P_{X_1}$ and $P_{X_2}$, i.e.
$d_B(X_1, X_2) = d_B(P_{X_1}, P_{X_2}), \qquad (13)$
and for $x, x' \in \mathcal{X}$, we define the Bhattacharyya distance associated with two channel inputs as
$d_B(x, x') = d_B(W_x, W_{x'}), \qquad (14)$
with a slight abuse of notation.
Generalizing the Bhattacharyya distance, the Chernoff divergence with parameter $s \in (0, 1)$ is given by
$d_{C,s}(P, Q) = -\log \sum_{y \in \mathcal{Y}} P(y)^{1-s} Q(y)^{s}, \qquad (15)$
and the Chernoff divergence (with optimized $s$) is given by
$d_C(P, Q) = \sup_{s \in (0, 1)} d_{C,s}(P, Q). \qquad (16)$
Analogous to (13)–(14), we also write
$d_{C,s}(X_1, X_2) = d_{C,s}(P_{X_1}, P_{X_2}) \quad \text{and} \quad d_C(X_1, X_2) = d_C(P_{X_1}, P_{X_2}), \qquad (17)$
$d_{C,s}(x, x') = d_{C,s}(W_x, W_{x'}) \quad \text{and} \quad d_C(x, x') = d_C(W_x, W_{x'}). \qquad (18)$
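These divergences are straightforward to compute numerically; the sketch below evaluates $d_{C,s}$, $d_B$, and (via a simple grid search standing in for the optimization over $s$) $d_C$ for two made-up distributions.

```python
import math

def chernoff_div(P, Q, s):
    """d_{C,s}(P, Q) = -log sum_y P(y)^(1-s) * Q(y)^s."""
    return -math.log(sum(p ** (1 - s) * q ** s for p, q in zip(P, Q)))

def bhattacharyya(P, Q):
    """d_B(P, Q) is the Chernoff divergence at s = 1/2."""
    return chernoff_div(P, Q, 0.5)

def chernoff(P, Q, grid=10001):
    """d_C(P, Q): supremum over s in (0, 1), approximated by a grid search."""
    return max(chernoff_div(P, Q, i / grid) for i in range(1, grid))

# Made-up example distributions over a 3-letter alphabet.
P = [0.7, 0.2, 0.1]
Q = [0.1, 0.3, 0.6]
print(bhattacharyya(P, Q), chernoff(P, Q))
```

Note that $d_C \geq d_B$ always holds, since the Bhattacharyya distance is the $s = 1/2$ point of the quantity being optimized.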
For a certain class of channels, the quantities $d_C(x, x')$ and $d_B(x, x')$ coincide, giving us the following definition:
Definition 7.
[14] A discrete memoryless channel $W$ is pairwise reversible if, for all $x, x' \in \mathcal{X}$,
$d_C(x, x') = d_B(x, x'), \qquad (19)$
which occurs when the quantity
$\sum_{y \in \mathcal{Y}} W(y|x)^{1-s} W(y|x')^{s} \qquad (20)$
attains its minimum at $s = \tfrac{1}{2}$.
The pairwise reversibility assumption allows for a number of useful results and constructions that are not available for general channels. In particular, Lemma 12 below gives an explicit expression for the 1-hop error exponent, and Lemma 29 below is based on an explicit construction that attains that exponent. We again refer the reader to [14] for a detailed discussion on pairwise reversibility with several examples.
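As a quick numerical illustration of Definition 7, one can check whether the quantity in (20) is minimized at $s = 1/2$ for every input pair; the channel matrices below (a BSC and a Z-channel) are made-up examples, and the grid search is only an approximation of the true minimization.

```python
def g(W, x, xp, s):
    """The quantity sum_y W(y|x)^(1-s) * W(y|x')^s from Definition 7."""
    return sum(W[x][y] ** (1 - s) * W[xp][y] ** s for y in range(len(W[x])))

def is_pairwise_reversible(W, grid=1000, tol=1e-9):
    """Check that for every input pair, the minimum over s is at s = 1/2."""
    for x in range(len(W)):
        for xp in range(len(W)):
            if x == xp:
                continue
            best = min(g(W, x, xp, i / grid) for i in range(1, grid))
            if g(W, x, xp, 0.5) > best + tol:
                return False
    return True

bsc = [[0.9, 0.1], [0.1, 0.9]]   # binary symmetric channel: reversible
zch = [[1.0, 0.0], [0.3, 0.7]]   # Z-channel: not pairwise reversible
print(is_pairwise_reversible(bsc), is_pairwise_reversible(zch))
```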
For any positive integer $t$, we let $W^t$ denote the $t$-fold product of $W$, with probability mass function
$W^t(y_1, \dots, y_t \mid x_1, \dots, x_t) = \prod_{i=1}^{t} W(y_i \mid x_i). \qquad (21)$
For two sequences $x, x'$ of length $t$, we also use the notation $d_B(x, x')$ and $d_C(x, x')$ similarly to (14) and (18), with the understanding that $x, x'$ are treated as inputs to $W^t$. We will use a well-known tensorization property of $d_{C,s}$ (and hence also $d_B$), stated as follows.
Lemma 8.
(Tensorization) For any distributions $P_1, P_2, Q_1, Q_2$ over finite alphabets and any $s \in (0, 1)$, we have $d_{C,s}(P_1 \times P_2, Q_1 \times Q_2) = d_{C,s}(P_1, Q_1) + d_{C,s}(P_2, Q_2)$.
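The tensorization property in question is the standard additivity of $d_{C,s}$ over independent components, which can be checked numerically (the distributions below are made up):

```python
import math

def dcs(P, Q, s):
    """Chernoff divergence d_{C,s} between two pmfs given as lists."""
    return -math.log(sum(p ** (1 - s) * q ** s for p, q in zip(P, Q)))

def product(P1, P2):
    """Joint pmf of two independent components, flattened in row order."""
    return [a * b for a in P1 for b in P2]

P1, Q1 = [0.6, 0.4], [0.2, 0.8]
P2, Q2 = [0.5, 0.3, 0.2], [0.1, 0.1, 0.8]
s = 0.3
lhs = dcs(product(P1, P2), product(Q1, Q2), s)
rhs = dcs(P1, Q1, s) + dcs(P2, Q2, s)
print(abs(lhs - rhs))  # additivity: the difference is numerically ~0
```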
III-C 1-hop Error Exponents
We will use several results from [14] on the 1-hop error exponents. We first state two closely-related results that are implicit in [14, Thm. 2] and its proof.
Lemma 9.
(Implicit in [14, Thm. 2]) Letting $Y$ be the received sequence from $n$ uses of the channel $W$, we have for any fixed pair of messages $m \neq m'$ and any sequence of codebooks (indexed by $n$) that
(24) |
Theorem 10.
(Implicit in [14, Thm. 2]) For any $\delta > 0$, it holds for all sufficiently large $n$ that there exists an $(n, M)$-codebook such that for all pairs of codewords $c_m, c_{m'}$ ($m \neq m'$) in the codebook, we have
(25) |
Next, we state explicit formulas/bounds for error exponents in the zero-rate and fixed- settings, respectively.
Theorem 11.
[14, Thm. 4] For any DMC $W$, the zero-rate error exponent is given by
(26) |
Lemma 12.
In accordance with Lemma 12, we will use the following definition:
Definition 13.
Define the Bhattacharyya-distance-based error exponent by
$E_B(W, M) = \lim_{n \to \infty} \max_{(n, M)\text{-codebooks}} \min_{m \neq m'} \frac{1}{n} d_B(c_m, c_{m'}), \qquad (28)$
and note that $E_B(W, M) = E(W, M)$ for pairwise reversible channels.
Lemma 14.
For any DMC $W$, there exists an $(n, M)$-codebook (for all sufficiently large $n$) such that for any pair $m \neq m'$, it holds that
(29) |
Since we will make use of codes over product channels, the following lemma is useful:
Lemma 15.
Let $t$ be a fixed integer and $x_1, \dots, x_M$ be length-$t$ codewords, and let $\widetilde{W}$ be the restriction of $W^t$ to these inputs. Then $E(W, M) \geq \frac{1}{t} E(\widetilde{W}, M)$. Furthermore, if $W$ is pairwise reversible, then $\widetilde{W}$ is also pairwise reversible.
Proof.
Let $\tilde{c}_1, \dots, \tilde{c}_M$ be codewords achieving an error exponent arbitrarily close to $E(\widetilde{W}, M)$. Since each character in $\widetilde{\mathcal{X}}$ is a string of $t$ characters over $\mathcal{X}$, we can expand the strings to form codewords over $\mathcal{X}$, achieving the same error probability when used as codewords for $W$. The scaling factor $\frac{1}{t}$ comes from the factor of $t$ difference in the block length.
For the second claim, we apply Lemma 23 to obtain
(30) |
Since $W$ is pairwise reversible, the right-hand side attains its minimum at $s = \tfrac{1}{2}$, which means that $\widetilde{W}$ is also pairwise reversible. ∎
The next lemma allows us to break down the error exponent of a product channel into its individual components when $M = 2$. (We note that this result does not extend to $M \geq 3$.)
Lemma 16.
Let $W_1, \dots, W_k$ be arbitrary channels, and let $W = W_1 \times \dots \times W_k$ be the product channel. Then
$E(W, 2) \leq \sum_{i=1}^{k} E(W_i, 2). \qquad (31)$
Proof.
For any DMC, the error exponent for $M = 2$ is given as follows [14, Eq. (1.19)]:
$E(W, 2) = \max_{x, x' \in \mathcal{X}} d_C(x, x'). \qquad (32)$
Accordingly, we choose $(x, x')$ attaining the maximum in
$E(W, 2) = \max_{x, x'} d_C(x, x'), \qquad (33)$
and let $x_i$ and $x'_i$ represent the $i$-th component of $x$ and $x'$ respectively. Then,
$E(W, 2) = d_C(x, x') = \sup_{s \in (0, 1)} \sum_{i=1}^{k} d_{C,s}(x_i, x'_i) \leq \sum_{i=1}^{k} \sup_{s \in (0, 1)} d_{C,s}(x_i, x'_i) = \sum_{i=1}^{k} d_C(x_i, x'_i) \leq \sum_{i=1}^{k} E(W_i, 2), \qquad \text{(34)–(37)}$
as required. ∎
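Assuming the characterization $E(W, 2) = \max_{x, x'} d_C(x, x')$ from [14, Eq. (1.19)], the subadditivity over product channels can be checked numerically; the BSC crossover probabilities below are made up, and for these (pairwise reversible) channels the bound in fact holds with near equality, consistent with the equality result that follows.

```python
import math

def dcs(P, Q, s):
    return -math.log(sum(p ** (1 - s) * q ** s for p, q in zip(P, Q)))

def chernoff(P, Q, grid=2000):
    """Chernoff distance via grid search over s in (0, 1)."""
    return max(dcs(P, Q, i / grid) for i in range(1, grid))

def pair_exponent(W):
    """E(W, 2) = max over input pairs of the Chernoff distance (cf. (32))."""
    return max(chernoff(W[x], W[xp])
               for x in range(len(W)) for xp in range(len(W)) if x != xp)

def product_channel(W1, W2):
    """Product channel: inputs and outputs are pairs, flattened row-wise."""
    return [[a * b for a in r1 for b in r2] for r1 in W1 for r2 in W2]

bsc1 = [[0.9, 0.1], [0.1, 0.9]]
bsc2 = [[0.8, 0.2], [0.2, 0.8]]
W = product_channel(bsc1, bsc2)
print(pair_exponent(W), pair_exponent(bsc1) + pair_exponent(bsc2))
```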
The following result shows that we have equality in Lemma 31 if we restrict all $W_i$ to be pairwise reversible:
Lemma 17.
Let $W_1, \dots, W_k$ be pairwise reversible channels and let $W = W_1 \times \dots \times W_k$ be the product channel. Then
$E(W, 2) = \sum_{i=1}^{k} E(W_i, 2). \qquad (38)$
Proof.
We first note from the second part of Lemma 15 that $W$ is pairwise reversible. As a result, by Lemma 12, there exists some input pair $(x, x')$ such that
$E(W, 2) = d_B(x, x'). \qquad (39)$
Moreover, by Lemma 23, we can break this down into the Chernoff exponents for the individual channels:
$d_B(x, x') = \sum_{i=1}^{k} d_B(x_i, x'_i) \leq \sum_{i=1}^{k} E(W_i, 2). \qquad (40)$
To prove the inequality in the other direction, we simply reverse the entire argument. For each channel $W_i$, choose $(x_i, x'_i)$ to attain equality in (27). Defining $x = (x_1, \dots, x_k)$ and $x' = (x'_1, \dots, x'_k)$ then gives
$E(W, 2) \geq d_B(x, x') = \sum_{i=1}^{k} d_B(x_i, x'_i) = \sum_{i=1}^{k} E(W_i, 2), \qquad (41)$
where both steps are based on (27) holding with equality (first for $W$, and then for the individual $W_i$). ∎
III-D Feedback Error Exponent
In the well-studied problem of 1-hop communication with feedback, the sender is given access to all of the previous channel outputs before sending the next channel input. For a discrete memoryless channel $W$, we define the minimum probability of error with feedback $P_e^{\mathrm{fb}}(W, M, n)$, and define the error exponent of the channel with $M$ messages via
$E^{\mathrm{fb}}(W, M) = \lim_{n \to \infty} -\tfrac{1}{n} \log P_e^{\mathrm{fb}}(W, M, n). \qquad (42)$
Clearly, $E^{\mathrm{fb}}(W, M) \geq E(W, M)$. In general, feedback error exponents may be strictly higher than non-feedback ones, even for the binary symmetric channel [16]. However, the two error exponents match for arbitrary DMCs when $M = 2$; that is, $E^{\mathrm{fb}}(W, 2) = E(W, 2)$.
III-E Properties of Chernoff Divergence
The following lemma concerns the Chernoff divergence of a composite channel:
Lemma 19.
[4, Lemma 7] Let $W_1$ and $W_2$ be channels such that the output alphabet of $W_1$ coincides with the input alphabet of $W_2$. For all $s \in (0, 1)$, we have
(44) |
The next result relates the error probability of a likelihood ratio test with the Chernoff divergence. We expect that this result is standard, but we provide a short proof for completeness.
Lemma 20.
Let $P$ and $Q$ be two distributions over the same alphabet $\mathcal{Y}$. For each $y \in \mathcal{Y}$, define the likelihood ratio $\ell(y) = \frac{Q(y)}{P(y)}$. Then
(45) |
Proof.
Consider the set
(46) |
For all , , and therefore
(47) |
which gives
(48) |
∎
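The connection between likelihood-ratio testing and the Chernoff divergence can be illustrated via the standard termwise inequality $\min(a, b) \leq a^{1-s} b^{s}$, which bounds $\sum_y \min(P(y), Q(y))$ (twice the Bayes error of the likelihood ratio test under equal priors) by $e^{-d_{C,s}(P, Q)}$; the following is a numeric sanity check on random distributions, not a reproduction of the exact statement of Lemma 20.

```python
import math
import random

def dcs(P, Q, s):
    return -math.log(sum(p ** (1 - s) * q ** s for p, q in zip(P, Q)))

def total_lrt_error(P, Q):
    """sum_y min(P(y), Q(y)): twice the Bayes error of the likelihood
    ratio test between P and Q under equal priors."""
    return sum(min(p, q) for p, q in zip(P, Q))

random.seed(0)
for _ in range(100):
    raw_p = [random.random() for _ in range(5)]
    raw_q = [random.random() for _ in range(5)]
    P = [x / sum(raw_p) for x in raw_p]
    Q = [x / sum(raw_q) for x in raw_q]
    for s in (0.25, 0.5, 0.75):
        # min(a, b) <= a^(1-s) * b^s termwise implies the bound below
        assert total_lrt_error(P, Q) <= math.exp(-dcs(P, Q, s)) + 1e-12
print("bound verified on 100 random distribution pairs")
```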
IV The Series Case for Theorem 3
In this section, we handle the special case in which the nodes are arranged in series (i.e., a line graph). That is, the nodes are labeled $0, 1, \dots, d$, and the edges are $(0, 1), (1, 2), \dots, (d-1, d)$. We refer to the corresponding channels as $W_1, \dots, W_d$, with corresponding 1-hop error exponents $E_1, \dots, E_d$. All channels are assumed to be pairwise reversible in this section.
This series setup directly generalizes previous work on the 2-hop setting in [1, 2, 3, 4], particularly our work on the multi-bit setting [4]. Specifically, Theorem 3 for the series setup generalizes the main result therein by extending it from 2 hops to any constant number of hops. The protocol that we adopt has some significant differences to [4], which we discuss in Section IV-D. On the other hand, a key similarity is that we still adopt a block-structured strategy.
IV-A Sequential Block Transmission
We momentarily turn our attention to protocols in which nodes transmit blocks to each other one after the other without any parallel transmission. This idea would be highly wasteful if used in a standalone manner, but the idea (in the next subsection) will be to chain many instances of this approach together in parallel so that a negligible fraction of the total block length is ultimately wasted.
For a series graph, define a sequential block transmission with block size $n$ as follows:
•
For each edge $(i, i+1)$, node $i$ transmits to node $i+1$ using $n$ uses of the channel $W_{i+1}$;
•
This process occurs sequentially; transmission over an earlier edge (i.e., closer to the source) must finish before transmission over the next edge commences.
In Section IV-C, we will prove the following key lemma for this setting:
Lemma 21.
Given a series graph, consider a sequential block transmission with block size $n$. Let $Y$ be the length-$n$ sequence received by the final node $d$, and let $M_f$ be the maximum flow (since we are in the series setup, this is equal to $\min_i E_i$). Then there exists a protocol such that for all $m \neq m'$,
$d_B\big( (Y \mid m), (Y \mid m') \big) \geq n (M_f - \eta), \qquad (49)$
where $\eta$ is a term that approaches zero as $n$ increases while $d$ and $M$ remain constant.
IV-B Implication for our Main Problem Setting
Returning to our main problem setting in which nodes are not required to transmit one-after-the-other, we obtain the following corollary of Lemma 21.
Corollary 22.
Given a series graph with $N$ time steps, there exists a protocol such that for any $m \neq m'$, the length-$N$ sequence $Y$ received at node $d$ satisfies
$d_B\big( (Y \mid m), (Y \mid m') \big) \geq N (M_f - \eta), \qquad (50)$
where $\eta$ is a term that approaches zero as $N$ increases while $d$ and $M$ remain constant (without any dependence on the message pair).
Proof.
Consider the process of chaining several sequential block transmissions in parallel, as depicted in Figure 1. If there are $k$ sequential block transmissions of block size $n$, then they can clearly be executed in $(k + d - 1) n$ time steps. In other words, with a block length of $N$, we can perform roughly $\frac{N}{n} - d + 1$ sequential block transmissions (up to asymptotically negligible rounding issues).
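The pipelining arithmetic in this step can be sketched as follows, where `makespan` is a hypothetical helper name: transmission $j$ crosses hop $i$ during time slot $j + i$, so $k$ transmissions over $d$ hops finish after $(k + d - 1)$ slots of $n$ channel uses each.

```python
def makespan(k, d, n):
    """Pipelined schedule: transmission j (0-indexed) crosses hop i during
    time slot j + i, with each slot lasting n channel uses."""
    slots = max(j + i for j in range(k) for i in range(d)) + 1
    return slots * n  # equals (k + d - 1) * n

# e.g. hypothetical numbers: 10 transmissions over 4 hops, blocks of length 50
print(makespan(10, 4, 50))  # (10 + 4 - 1) * 50 = 650
```

Only the first $d - 1$ slots are "wasted" filling the pipeline, so the overhead becomes a negligible fraction of the block length as $k$ grows.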
For each such sequential block transmission, we use the protocol that achieves Lemma 21. Since memory is not stored between the blocks, the blocks are conditionally independent given the source message. Let $Y^{(j)}$ represent the $j$-th block received by the final node, and let $Y = (Y^{(1)}, Y^{(2)}, \dots)$. Then, using tensorization (Lemma 23) and Lemma 21, we have for all $m \neq m'$ that
(51) | |||||
(52) | |||||
(53) |
as required. ∎
IV-C Proof of Lemma 21
We first show that it suffices to prove Lemma 21 under an additional assumption that will be convenient to work with:
Lemma 23.
Proof.
Consider the general setup of Lemma 21 with arbitrary . By Lemma 29, for each , there exist codewords of length such that
(58) |
Let be the restriction of to the codewords . We first show that the channels satisfy the extra requirement stated in Lemma 23.
By Lemma 15, is pairwise reversible. Moreover, for each , we have from (58) that
(59) |
Then, we deduce from Lemma 12 (with the inputs of being substituted for ) that . Since the maxflow is in the series setup, this gives . Then, let be the maximum flow for the sequence of channels . Since for all , the new maximum flow satisfies . On the other hand, by Lemma 15, we have , which gives
(60) |
Combining these upper and lower bounds on , we conclude that , and from (59) we deduce that the channels satisfy the additional requirement imposed in the statement of Lemma 23 (with renamed to ).
Now suppose that Lemma 21 holds for the new sequence of channels , and let denote the corresponding length- sequence received by the destination node. This means that there exists a sequential block transmission protocol of block size such that for any ,
(61) |
Any sequential block transmission with block size using the new channels can easily be converted to a sequential block transmission with block size over the original channels by simply expanding instances of the new letter by the codeword . Therefore, there exists a sequential block transmission of block size such that for any ,
(62) |
Since is a constant, we can appropriately rescale such that above is renamed to (the overall sequential block transmission length), and we conclude that Lemma 21 holds for the original channels . ∎
For the rest of this section, we work under the extra assumption stated in Lemma 23. We also assume that $n$ is even to avoid rounding issues.
IV-C1 Description of the Forwarding Protocol
We consider the following protocol for a single sequential block transmission of length $n$:
•
Every node $i$ ($0 \leq i \leq d$) is assigned a state consisting of two integers $(u_i, v_i)$ with $u_i \in \{1, \dots, M\}$ and $0 \leq v_i \leq n$, which we will refer to as $(u, v)$ when there is no ambiguity. A state of $(u, v)$ roughly indicates that the node believes $u$ to be the most likely value of $m$, where larger values of $v$ indicate higher confidence, whereas a smaller value of $v$ indicates that there is another competing value that is also relatively likely (we make this more precise below).
•
Node 0 is assigned $u_0 = m$ and $v_0 = n$, where $m$ is the actual message.
•
If a node is assigned a state $(u, v)$, then this node sends out $\frac{n+v}{2}$ copies of $u$ followed by $\frac{n-v}{2}$ copies of $u + 1$ (unless $u = M$, in which case we wrap around and replace $u + 1$ by $1$ here). We refer to the corresponding string as $c(u, v)$, or $c_i(u, v)$ when we want to emphasize which node is being considered.
•
Node $i$ ($i \geq 1$), upon receiving $Y_i$ from node $i - 1$, computes the likelihood of $Y_i$ for each possible message. Let $\hat{m}_1$ and $\hat{m}_2$ be the resulting most likely and second most likely values of $m$ respectively (with arbitrary tie-breaking).
•
Node is now assigned a state of
(63) These equations are chosen such that reflects the most likely source message, and appropriately reflects the confidence level such that the most likely message is roughly times more likely than the next value of .
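A minimal sketch of the sorted-string encoding is given below; since the exact constants in (63) are not reproduced here, the even split between the leading message and its wrap-around competitor is an assumption for illustration, and the paper's actual protocol may differ in its details.

```python
def encode_state(u, v, n, M):
    """Hypothetical rendering of the sorted transmission string for a node
    in state (u, v): (n + v) / 2 copies of the believed message u, and the
    remaining slots given to the wrap-around competitor (u + 1) mod M
    (messages labelled 0, ..., M - 1 here for convenience)."""
    assert 0 <= v <= n and v % 2 == n % 2  # so the split below is integral
    heavy = (n + v) // 2
    return [u] * heavy + [(u + 1) % M] * (n - heavy)

# A confident node (v large) sends an almost-constant block, while an
# unsure node (v = 0) splits its block evenly between the two candidates.
print(encode_state(2, 6, 8, 3))  # seven 2's followed by one 0
print(encode_state(2, 0, 8, 3))  # four 2's followed by four 0's
```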
IV-C2 Analysis of the Strategy
We start by defining a pseudometric on the states by
$\delta\big( (u, v), (u', v') \big) = \begin{cases} |v - v'| & u = u' \\ v + v' & u \neq u'. \end{cases} \qquad (64)$
To see that $\delta$ defined above is a pseudometric, note that $\delta$ is equivalent to the $\ell_1$-norm after applying the mapping $(u, v) \mapsto (0, \dots, 0, v, 0, \dots, 0)$ (with $v$ occurring at position $u$). The reason for $\delta$ being a pseudometric rather than a metric is that $\delta\big( (u, 0), (u', 0) \big) = 0$ even when $u \neq u'$.
Intuitively, $\delta$ is a measure of how far apart the beliefs $(u, v)$ and $(u', v')$ are. If $u = u'$, then these two beliefs are close when $v$ and $v'$ are close. If $u \neq u'$, the beliefs are close only when $u$ and $u'$ are both comparatively likely, which corresponds to the case where both $v$ and $v'$ are small.
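Assuming the reconstruction of $\delta$ as $|v - v'|$ when the beliefs agree and $v + v'$ otherwise (consistent with the $\ell_1$ description above), the pseudometric properties can be verified exhaustively on a small state space:

```python
import itertools

def d(a, b):
    """Pseudometric on states: |v - v'| when the believed messages agree,
    v + v' otherwise (the l1 distance between v*e_u and v'*e_u')."""
    (u, v), (up, vp) = a, b
    return abs(v - vp) if u == up else v + vp

states = [(u, v) for u in range(3) for v in range(5)]
for a, b, c in itertools.product(states, repeat=3):
    assert d(a, c) <= d(a, b) + d(b, c)          # triangle inequality
    assert d(a, b) == d(b, a) and d(a, a) == 0   # symmetry and d(a, a) = 0
# not a metric: distinct zero-confidence states are at distance 0
assert d((0, 0), (1, 0)) == 0
print("pseudometric properties hold on all", len(states) ** 3, "triples")
```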
We use $\delta$ to bound the Bhattacharyya distance associated with two codewords:
(65) | |||||
(66) | |||||
(67) |
where we use the structure of the sorted strings $c(u, v)$ and the definition of the pseudometric $\delta$.
Observe that the sequence of states taken by the nodes forms a Markov process. We establish a bound on the probability of reaching any intermediate state given the initial state; intuitively, the following result states that given the original state $(u_0, v_0)$, the probability of seeing another state $(u', v')$ is exponentially small in the distance between the two states as defined in (64).
Lemma 24.
For all $i \in \{0, \dots, d\}$, all $(u, v)$, and all $(u', v')$, we have
(70) |
and
(71) |
Proof.
We use an induction argument, with the base case being that (70) trivially holds when $i = 0$: either the left-hand side is infinite due to a probability of zero, or the left-hand side is zero due to a probability of one (i.e., $(u', v') = (u_0, v_0)$), in which case the first term on the right-hand side is zero.
We proceed by induction on the node index, and break the inductive argument into the following two claims:
•
If (70) holds at a given node, then (71) holds at the next node.
•
If (71) holds at a node, then (70) also holds at that node.
Start by assuming that (70) holds for some . Observe that forms a Markov chain. Let be the channel and be the channel ; applying Lemma 44 with then gives for any that
(72) | |||||
Since (70) holds for (and for all and ), we can combine it with (67) to simplify the expression inside the minimization clause:
(74) | |||||
(75) |
where (74) follows from non-negativity (first term) and the triangle inequality (second and third terms) for the pseudometric $\delta$, and the last step uses the definition of $\delta$. By substituting (75) into (72), we conclude that (71) holds for the next node.
We now turn to the second dot point mentioned at the start of the proof. We will split this into two cases. We note that in (70) we are interested in the event that the state equals $(u', v')$, and for this to happen, $u'$ must be the value assigned to $u$ in (63), i.e., $u'$ must be the most likely message given the received block. We will use this fact in both cases.
Case 1 (). If , then , so that the RHS of (70) is upper bounded by 0, and therefore (70) trivially holds. Therefore, we assume that .
For each and each , define
(76) |
By (63) and the assumption , there exists (which can be chosen to be the second most likely value of given , noting that the most likely value is ) such that
(77) |
Combining this with a union bound over all , we have
(78) | |||||
(79) |
We can now conclude that
(80) | |||||
(81) | |||||
(82) | |||||
(83) | |||||
(84) |
where (81) follows from Lemma 45, (82) follows from the inductive assumption (71), and (84) follows from the definition of with . Thus, (70) holds for as desired.
Case 2 (). For each , define
(85) |
Let (which may depend on ) be the second most likely value of (noting that is the most likely value). We then have
(86) |
where the first inequality uses the fact that the likelihood of the second most likely value is at most that of the most likely value, and the second inequality follows from (63) after lower bounding the minimum therein by the second term.
IV-D Comparison to Previous Work
In our previous paper [4], we proved the 2-hop version of Lemma 21. In [4], the problem is reduced to the case $|\mathcal{X}| = M$ (similarly to Lemma 23) and then handled as follows:
•
The relay node calculates the KL-divergence between the empirical distribution of the received symbols against the output distribution associated with each message.
•
The relay then sends a sorted sequence with more occurrences of the symbol whose associated KL-divergence is smallest, and an equal number of occurrences of the rest.
Our approach in the present paper is instead based on a likelihood ratio (see (63)), and upon calculating this, a sorted sequence is sent containing at most two distinct values.
At first glance, one may have expected that using likelihood ratios would be difficult, as there can be up to $\binom{M}{2}$ likelihood ratios of interest. However, our approach here shows that using the likelihood ratio between only the two most likely values is sufficient. Overall, our approach here is more similar to the one used in [1] (for general binary-input channels rather than the BSC), which directly computes the likelihood ratio but only handles the 2-hop setting and requires $M = 2$.
V Proof of Theorem 3 (Main Achievability Bound)
In this section, we prove Theorem 3 in the general case, using the series case from Section IV as a stepping stone. We crucially make use of the flow decomposition theorem (Theorem 6) together with its associated paths $P_1, \dots, P_k$ and flows $f_1, \dots, f_k$. The overall idea is to send conditionally independent messages along different paths and only accumulate them at the end.
V-A Edge-Disjoint Paths
The protocol is simpler when the paths are edge-disjoint, so we will consider this case before turning to the general case.
The key idea is to transmit information across each path independently and aggregate them only at the end. Although we assume that paths are edge-disjoint, they may still share nodes. However, each node is designed to act independently on every path it is involved in, so that information entering a node in one path does not affect other outgoing paths.
Let $Y_1, \dots, Y_k$ be the sequences received by the destination node along the $k$ different paths after $N$ time steps, and let $Y = (Y_1, \dots, Y_k)$. By Corollary 22, for all $i$ and all $m \neq m'$, there exists a (single-path) protocol such that
(89)
where the term comes from the fact that a simple path cannot have more than edges. Therefore, again letting , we have
(90)
(91)
Note that here is treated as a constant, because is a property of the graph so does not depend on .
Applying Lemma 9, we deduce that the total error exponent is bounded below by
(92)
which shows that error exponents arbitrarily close to are achievable.
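To make the flow-decomposition step (Theorem 6) concrete, here is a minimal numerical sketch: it computes a maxflow via Edmonds–Karp and peels the resulting flow into source–destination paths. The graph and its capacities are toy placeholders standing in for per-edge error exponents.

```python
from collections import defaultdict, deque

def max_flow(cap, s, t):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting paths.
    `cap` maps (u, v) -> capacity; returns (flow value, flow assignment)."""
    flow = defaultdict(float)
    adj = defaultdict(set)
    for u, v in cap:
        adj[u].add(v)
        adj[v].add(u)          # reverse direction needed for residual edges
    def residual(u, v):
        return cap.get((u, v), 0.0) - flow[(u, v)] + flow[(v, u)]
    total = 0.0
    while True:
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:        # BFS for a shortest augmenting path
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and residual(u, v) > 1e-12:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return total, flow
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual(u, v) for u, v in path)
        for u, v in path:                       # cancel reverse flow first
            back = min(flow[(v, u)], push)
            flow[(v, u)] -= back
            flow[(u, v)] += push - back
        total += push

def decompose(flow, s, t):
    """Flow decomposition: repeatedly peel off an s-t path of positive flow.
    Assumes the flow contains no cycles (true for this example)."""
    paths = []
    while True:
        path, u = [], s
        while u != t:
            nxt = next((v for (x, v), f in flow.items() if x == u and f > 1e-12), None)
            if nxt is None:
                return paths
            path.append((u, nxt))
            u = nxt
        push = min(flow[e] for e in path)
        for e in path:
            flow[e] -= push
        paths.append((push, path))

# Toy graph: capacities play the role of per-edge error exponents.
cap = {('s', 'a'): 1.0, ('s', 'b'): 2.0, ('a', 't'): 2.0,
      ('b', 't'): 1.0, ('b', 'a'): 1.0}
total, flow = max_flow(cap, 's', 't')
paths = decompose(dict(flow), 's', 't')
print(total, paths)
```

The decomposition returns paths whose flow values sum to the maxflow, matching the role of the paths and flows used throughout this section.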
V-B The General Case
For the general case, we will still rely on the edge decomposition as before. Our approach is to reduce to the case where paths are edge-disjoint by considering the use of parallel edges, which are allowed to exist in all of our analysis and results. This idea has also been used previously when studying channel capacity results for general networks; see, e.g., [11, Sec. 15.3].
We first make two observations:
-
•
Suppose that there is some edge such that is a product channel, say . We can delete and replace it by two parallel edges and , placing the channels and on them respectively; this preserves the error exponent, because any symbol that can be sent across can be separately sent across the two channels in an identical manner.
-
•
For any graph , let be the resultant graph formed by replacing each with its -fold product . Then . To see this, observe that any protocol with time steps over can be executed with time steps over .
These two observations allow us to do the following:
-
•
Pick a large block size , and replace every channel with the -fold product channel.
-
•
For each edge , let
(93)
be the indices of the paths that pass through . For each , define
(94)
Since there are only paths, we have . Moreover, we have
(95)
For each edge with corresponding channel , replace this by parallel edges, where the corresponding channels are for each . We will refer to each of these new edges by . This is possible since . If , then we simply add a dummy (unused) channel to make up for the difference. Let denote this new graph.
Observe that after the transformation, the capacities of the corresponding parallel edges are
(96)
noting that by the flow constraints.
Let be the path in formed by joining up edges of the form for each . Since is a path and joins the same pair of vertices as , is also a path. By (96), each of these edges has error exponent at least , so that the capacity on path is at least . Thus, there exist edge-disjoint paths on such that the capacity on each path is at least .
We have now reduced the problem to the edge-disjoint case studied in the previous subsection, and we conclude that we can achieve error exponents arbitrarily close to . Since time was scaled by a factor of , we conclude that and therefore
(97)
Taking , we conclude that .
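The parallel-edge reduction above can be sanity-checked numerically. The sketch below (with toy capacities and hard-coded paths, not the paper's notation) gives each path its own copy of every edge it uses, with capacity equal to the path's flow share, and verifies that the shares on each edge fit within the original capacity, as guaranteed by the flow constraints.

```python
from collections import defaultdict

def split_into_parallel_edges(cap, paths):
    """For each path i with flow f_i, create a parallel copy of every edge it
    uses with capacity f_i; leftover capacity becomes a dummy unused edge."""
    new_edges, used = [], defaultdict(float)
    for i, (f, path) in enumerate(paths):
        for e in path:
            new_edges.append((e, i, f))        # copy of edge e reserved for path i
            used[e] += f
    for e, c in cap.items():
        if c - used[e] > 1e-12:
            new_edges.append((e, None, c - used[e]))   # dummy (unused) edge
    return new_edges, used

# Toy data: a flow decomposition of value 3 on a 4-node graph.
cap = {('s', 'a'): 1.0, ('s', 'b'): 2.0, ('a', 't'): 2.0,
      ('b', 't'): 1.0, ('b', 'a'): 1.0}
paths = [(1.0, [('s', 'a'), ('a', 't')]),
         (1.0, [('s', 'b'), ('b', 't')]),
         (1.0, [('s', 'b'), ('b', 'a'), ('a', 't')])]
new_edges, used = split_into_parallel_edges(cap, paths)
# Flow feasibility: per-path shares on each edge fit within its capacity, so
# after splitting, path i is edge-disjoint with bottleneck exactly f_i.
print(all(used[e] <= cap[e] + 1e-9 for e in cap))
```

After the split, each path sees only its own parallel copies, so the paths become edge-disjoint and each path's bottleneck capacity equals its flow value, exactly as in the reduction above.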
VI Proof of Theorem 4 (Main Converse Bound)
The proof of our converse bound consists of using the max-flow min-cut duality theorem (Theorem 5) to construct a cut with the same value as . We will additionally use the following lemma, where we recall that denotes the 1-hop error exponent of a channel with full feedback:
Lemma 25.
Consider any partition of the nodes in with the source node in and the destination node in . Let be the channels associated with edges from to , and let be the product channel. Then for all ,
(98)
Proof.
We compress into a single node by allowing infinite information transfer between different nodes inside , and do the same for . Clearly, this modification cannot decrease the overall error exponent.
At first glance, it may appear that the net result of doing this is simply allowing to send information to via , each a total of times, which is the same as using for times. However, we must also account for the possibility of back-edges, which amounts to having a certain degree of feedback. In view of this, since any protocol that can be executed under the original setup can also be executed under the new setup with uses of with full feedback, we conclude that . ∎
Since is a decreasing function of , . Furthermore, feedback does not improve the error exponent when (Theorem 43), and thus Lemma 25 immediately gives the following corollary:
Corollary 26.
Under the definition of in Lemma 25, we have
(99)
VII Achievability Bounds for General Channels
In this section, we study achievability bounds in the case that the channels need not be pairwise reversible.
VII-A Constant Number of Messages
When the pairwise reversibility assumption is dropped, Theorem 3 can fail to hold even in the 2-hop setting corresponding to a graph with 3 nodes and 2 edges [4, Thm. 3]. However, it is possible to prove some weaker achievability bounds. Recalling from Definition 13, we state the following achievability bound for general channels:
Lemma 27.
Given an arbitrary graph , let be the maxflow formed when the capacity of every edge is set to . Then
(103)
Proof.
We use the same proof as the pairwise reversible case. The only use of the pairwise reversibility condition is in Lemma 23 when constructing the codewords of length such that . However, such codewords of length with (i.e., with in place of ) are still guaranteed to exist by Lemma 29 even when the pairwise reversibility condition is dropped. Outside of Lemma 23, the proof follows with replaced by where appropriate. ∎
VII-B Zero-Rate Error Exponents
In this subsection, we prove Theorem 7. We start by relating to the zero-rate error exponent:
Lemma 28.
For any , we have
(104)
By taking on both sides, we also have
(105)
Proof.
We use the expression for in Theorem 26. Let maximize the expression in (26). Draw independently according to distribution , and observe that the definition of in Definition 13 gives
(106)
To show the second inequality in the lemma statement, let be the empirical distribution (type) of achieving (28). Substituting into Theorem 26, we have
(107)
(108)
(109)
(110)
∎
We now prove Theorem 7 by devising a protocol whose error exponent is arbitrarily close to , where the edge weights in are the respective 1-hop zero-rate error exponents (i.e., for edge ).
Let be a (large) constant, and let be the maxflow formed by setting the capacity of each edge to . Since for all , we have .
Building on Sections IV and V, choose a (large) block size , and generate a sequential block transmission protocol of length with messages. We can treat a single sequential block transmission as a discrete memoryless channel . More precisely, the input to is the message that the source node receives, and the output of is what the destination receives from a single block. The analysis in Sections IV and V shows that given any , there is some sufficiently large such that
(111)
for all . Substituting into (13), it follows that
(112)
We now proceed to bound the zero-rate error exponent of . By Lemma 28,
(113)
Similar to the proof of Corollary 22, we can run conditionally independent copies of the block protocol in time steps, which is equivalent to using channel for a total of times. By adopting any coding strategy for that attains its zero-rate error exponent (such a strategy may be very complex, but we recall that our focus is on information-theoretic limits and not practical protocols), we deduce that
(114)
Since is arbitrarily small and is arbitrarily large, we conclude that
(115)
as required.
VIII Further Results
In this section, we provide some additional results that address when it is (or is not) possible to relax the assumptions made in our main results.
VIII-A Converse Bound for
Recall that denotes the maxflow for the graph with messages. One might hope that the proof of Theorem 4 can be modified to prove a converse bound of (instead of ). However, there are at least two problems in doing so:
-
•
We cannot guarantee that the 1-hop feedback and non-feedback error exponents are equal when ;
-
•
The required analog of Lemma 31 may not hold when .
In this subsection, we partially address these issues by imposing additional assumptions. Specifically, we prove the following:
Theorem 29.
Suppose that the following conditions are satisfied:
-
•
has a mincut with no back-edges (i.e., all edges are from the source side to the destination side);
-
•
Every channel in is pairwise reversible.
Then, we have .
We proceed with the proof. Although the bound in Lemma 25 uses the feedback error exponent, we do not need to consider feedback if the partition has no back-edges. We immediately deduce the following:
Lemma 30.
Consider any partition of the nodes in with the source node in and the destination node in . Let be the channels associated with edges from to , and let be the product channel. If there are no back-edges, then for all , we have
(116)
Starting from a mincut with no back-edges, let be the channels going across the cut. Due to our pairwise reversible assumption, we may apply Lemma 38 to obtain
(117)
where the last step holds by maxflow-mincut duality. This completes the converse part of Theorem 29. The achievability part is an immediate consequence of Theorem 3.
Next, for zero-rate error exponents, we show that the pairwise reversibility condition can be dropped:
Lemma 31.
Suppose that has a mincut with no back-edges. Then, we have where is the maxflow formed by setting the capacity of every edge to .
Proof.
VIII-B Approximation Guarantees
Lemma 27 and Theorem 4 reveal that for any network and any , we have
(121)
where we recall that uses (28) for the edge weights in the maxflow calculation. We will proceed to show that this is in fact a 4-approximation:
Lemma 32.
For any and , we have
(122)
Proof.
It is sufficient to show that , as this means that and are defined with respect to edge weights that differ by at most a factor of 4. We have
(123)
(124)
(125)
(126)
where (123) follows from (32), (125) follows by using the concavity of in [13, Thm. 5] and considering an equal combination of the values at and (along with for the latter), and (126) follows from (28).
The next lemma gives conditions under which a tighter approximation is possible:
Lemma 33.
Suppose that either of the following conditions hold:
-
•
; or
-
•
Every channel in is pairwise reversible.
Then, we have a 2-approximation in (121), i.e., .
Proof.
We also identify another scenario where our bounds are exactly tight. Consider the -ary symmetric channel with inputs , outputs , and
(128)
for some .
Lemma 34.
Suppose that every channel is a -ary symmetric channel for some (different channels can have different values of and different noise levels). Then , and the protocol given in Section V provides the achievability part.
Proof.
Similarly to the earlier results in this subsection, we simply need to show that for each . Recall the definition of :
(129)
For the -ary symmetric channel, is either 0 (if ) or a fixed positive value (if ). Since , we can achieve the maximum in (129) by letting be distinct, yielding
(130)
The computation for is also straightforward; since is pairwise reversible, we can simply choose any two distinct inputs to obtain
(131)
Since for all edges, it follows that . ∎
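The computation in Lemma 34 can be checked numerically. The sketch below assumes the standard parametrization of the -ary symmetric channel (stay with probability 1 − δ, move to each other symbol with probability δ/(q − 1)) and evaluates the Bhattacharyya-style pairwise quantity −log Σ_y √(P(y|x) P(y|x′)); by symmetry it takes the same value for every pair of distinct inputs.

```python
import math

def qsc_row(q, delta, x):
    """Row of the q-ary symmetric channel under the assumed parametrization:
    output x with prob 1 - delta, each other symbol with prob delta/(q-1)."""
    return [(1 - delta) if y == x else delta / (q - 1) for y in range(q)]

def pairwise_exponent(p, r):
    """Bhattacharyya-style pairwise quantity -log sum_y sqrt(p(y) r(y)),
    the relevant 1-hop exponent for pairwise reversible channels."""
    return -math.log(sum(math.sqrt(a * b) for a, b in zip(p, r)))

q, delta = 3, 0.1
vals = [pairwise_exponent(qsc_row(q, delta, x), qsc_row(q, delta, xp))
        for x in range(q) for xp in range(q) if x != xp]
# Every distinct input pair gives the same exponent, as used in Lemma 34.
print(max(vals) - min(vals) < 1e-12, round(vals[0], 4))
```

Because all input pairs are equivalent, taking any two distinct inputs already attains the maximum over input pairs, which is the key simplification used in the proof.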
VIII-C Counterexample to a Natural Conjecture
In Theorem 29, we assumed that has a mincut with no back-edges. It might be tempting to believe that this is the case whenever contains no directed cycles. However, this is false, as we can see in Figure 2, with the mincut described in the figure caption.
Using the example in Figure 2, we show that even when every channel is pairwise reversible and the graph is acyclic, it is possible to achieve an error exponent greater than when :
Theorem 35.
Even if we assume that is acyclic and every channel is pairwise reversible, there exist scenarios in which when . Moreover, the same statement holds even when the 1-hop error exponents defining are replaced by their counterparts with full feedback.
Proof.
Consider the graph in Figure 2. Set , and consider the setup described as follows for some :
-
•
On the solid edges, infinite noiseless communication is allowed.
-
•
On the edge , we consider a ternary symmetric channel with inputs , outputs , and with if , and otherwise.
-
•
On the edge , we consider a binary symmetric channel with noise parameter .
We will prove that as becomes sufficiently small, it is possible to beat the maxflow rate. For the rest of this section, asymptotic notation refers to the limit as .
Let us first compute the error exponents of all the channels. Since all channels are pairwise reversible, we may compute the error exponents using as per (28). For the ternary symmetric channel, (28) is maximized when are all distinct, so the error exponent is given by
(132)
For the binary symmetric channel, only two of can be distinct, so the error exponent is given by
(133)
Hence, the maxflow is simply the sum of the two error exponents, i.e., .
We will prove that is an achievable error exponent, thus disproving the natural conjecture that the maxflow is an upper bound on the error exponent. Similar to Section IV, define a single “sequential transmission” as follows:
-
•
Node 1 inputs the true message into its channel connecting to node 2, and also sends to node 3;
-
•
Node 2 then tells both nodes 3 and 4 what it received;
-
•
Node 3 then looks at both the ground truth from node 1 and the forwarded value from node 2. If they are the same, message 0 is sent to node 4, and otherwise, message 1 is sent to node 4.
A total of such transmissions can occur in the time steps by chaining them one after the other (similar to Figure 1). Moreover, we can view a single sequential transmission as a discrete memoryless channel , where the input is , and the output consists of what node 4 receives from both nodes 2 and 3.
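A single sequential transmission can be sketched as a small Monte Carlo simulation. The channel parametrizations below (ternary symmetric channel moving to each wrong symbol with probability δ, and a BSC(δ)), the value of δ, and all names are illustrative assumptions for this sketch.

```python
import random

def sequential_transmission(m, delta, rng):
    """One sequential transmission over the graph of Figure 2, under assumed
    parametrizations: the node-1 -> node-2 edge is a ternary symmetric channel
    moving to each wrong symbol w.p. delta, and the node-3 -> node-4 edge is a
    BSC(delta); solid edges are noiseless."""
    others = [x for x in range(3) if x != m]
    r = rng.random()
    if r < delta:
        received = others[0]
    elif r < 2 * delta:
        received = others[1]
    else:
        received = m
    # Node 3 compares the ground truth with node 2's forwarded value and
    # sends the mismatch indicator bit over the BSC.
    bit = (0 if received == m else 1) ^ (1 if rng.random() < delta else 0)
    return received, bit

rng = random.Random(0)
delta, trials = 0.05, 200_000
counts = {"flag": 0, "silent_error": 0, "ok": 0}
for _ in range(trials):
    received, bit = sequential_transmission(0, delta, rng)
    if bit == 1:
        counts["flag"] += 1          # node 4 sees the warning flag: O(delta)
    elif received != 0:
        counts["silent_error"] += 1  # undetected error: both channels err, O(delta^2)
    else:
        counts["ok"] += 1
print(counts)
```

The simulation reflects the scaling discussed next: the "flag" outcome occurs with probability on the order of δ, while an undetected error requires both noisy channels to err and so occurs with probability on the order of δ².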
We can classify the result of each sequential transmission into three possible outcomes:
-
(i)
Node 4 receives message 1 from node 3. This means that either node 2 received the wrong message (and therefore node 3 sent 1), or the message from node 3 to node 4 got flipped. Each event happens with probability , so the total probability of this is .
-
(ii)
Node 4 receives message 0 from node 3, and receives the wrong message from node 2. In this case, both the message from node 1 to node 2 and the message from node 3 to node 4 must be corrupted, so this happens with probability
-
(iii)
Node 4 receives message 0 from node 3, and receives the correct message from node 2. This occurs with probability .
∎
We summarize the end-to-end transition probabilities in Figure 3 (rows are inputs, columns are outputs):

| input \ output | (0,0) | (1,0) | (2,0) | (*,1) |
|---|---|---|---|---|
| 0 | | | | |
| 1 | | | | |
| 2 | | | | |
Since we are allowed uses of in time steps, we have . We now proceed to bound ; we have
(134)
and similarly when is replaced by any two distinct symbols. Using Lemma 12, this gives
(135)
showing that an error exponent of is achievable.
This same counterexample in fact also establishes the second part of Theorem 35 in which the mincut is defined with respect to edge capacities instead of . To see this, first note that for the ternary symmetric channel, we have ; this follows from the proof of Lemma 34, with the ternary symmetric channel being pairwise reversible. On the other hand, (Theorem 43), (error exponent is decreasing in ) and (feedback cannot decrease error exponents). Therefore, the ternary symmetric channel has ; this means that all the inequalities must hold with equality, yielding
(136)
For the binary symmetric channel, the feedback error exponent for is given as follows [16, p. 64]:
(137)
which gives a maxflow of . Once again, since this is strictly below for small enough , we obtain the desired result.
References
- [1] W. Huleihel, Y. Polyanskiy, and O. Shayevitz, “Relaying one bit across a tandem of binary-symmetric channels,” IEEE International Symposium on Information Theory (ISIT), 2019.
- [2] V. Jog and P. L. Loh, “Teaching and learning in uncertainty,” IEEE Transactions on Information Theory, vol. 67, no. 1, pp. 598–615, 2021.
- [3] Y. H. Ling and J. Scarlett, “Optimal rates of teaching and learning under uncertainty,” IEEE Transactions on Information Theory, vol. 67, no. 11, pp. 7067–7080, 2021.
- [4] ——, “Multi-bit relaying over a tandem of channels,” IEEE Transactions on Information Theory (to appear), 2023.
- [5] S. Rajagopalan and L. Schulman, “A coding theorem for distributed computation,” ACM Symposium on Theory of Computing, 1994.
- [6] Y. H. Ling and J. Scarlett, “Simple coding techniques for many-hop relaying,” IEEE Transactions on Information Theory, vol. 68, no. 11, pp. 7043–7053, 2022.
- [7] E. Domanovitz, T. Philosof, and A. Khina, “The information velocity of packet-erasure links,” in IEEE Conference on Computer Communications, 2022.
- [8] W. Evans, C. Kenyon, Y. Peres, and L. J. Schulman, “Broadcasting on trees and the Ising model,” Annals of Applied Probability, vol. 10, no. 2, pp. 410–433, 2000.
- [9] A. Makur, E. Mossel, and Y. Polyanskiy, “Broadcasting on random directed acyclic graphs,” IEEE Transactions on Information Theory, vol. 66, no. 2, pp. 780–812, 2020.
- [10] ——, “Broadcasting on two-dimensional regular grids,” IEEE Transactions on Information Theory, vol. 68, no. 10, pp. 6297–6334, 2022.
- [11] A. El Gamal and Y.-H. Kim, Network information theory. Cambridge University Press, 2011.
- [12] T. M. Cover and J. A. Thomas, Elements of information theory. John Wiley & Sons, Inc., 2006.
- [13] C. Shannon, R. Gallager, and E. Berlekamp, “Lower bounds to error probability for coding on discrete memoryless channels, I,” Information and Control, vol. 10, no. 1, pp. 65–103, 1967.
- [14] ——, “Lower bounds to error probability for coding on discrete memoryless channels, II,” Information and Control, vol. 10, no. 5, pp. 522–552, 1967.
- [15] L. Trevisan, “Stanford CS261 lecture notes,” 2011, https://theory.stanford.edu/~trevisan/cs261/lecture11.pdf.
- [16] E. R. Berlekamp, “Block coding with noiseless feedback,” Ph.D. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering, 1964.
- [17] B. Nakiboglu, “Exponential bounds on error probability with feedback,” Ph.D. thesis, Massachusetts Institute of Technology, Department of Electrical Engineering, 2011.