SOPI design and analysis for LDN
Abstract
Liquid Data Networking (LDN), introduced in [1], is an ICN architecture that is designed to enable the benefits of erasure-code enabled object delivery. A primary contribution of [1] is the introduction of SOPIs, which enables clients to concurrently download encoded data for the same object from multiple edge nodes, optimizes caching efficiency, and enables seamless mobility. This paper provides an enhanced design and analysis of SOPIs.
I Introduction
The paper [1], inspired by fountain codes [2], [3], [4], [5], introduces Liquid Data Networking (LDN), an ICN architecture that is designed to enable the benefits of erasure-code enabled object delivery. A primary contribution of [1] is the introduction of stream object permutation identifiers (SOPIs), which provides a simple and efficient download coordination mechanism: clients can concurrently download encoded data for the same object from multiple edge nodes, caching efficiency is optimized, and seamless mobility is enabled. This paper provides an enhanced design and analysis of SOPIs, and inherits the terminology and notation of [1].
II SOPI and Stream object
SOPIs and stream objects are fundamental to the design of LDN: they enable a diversity of encoded data to be available for download within the network for each object, while at the same time ensuring that different clients request the same encoded data for an object from the same neighboring encoding node. The SOPI design allows the client decision of which encoded data to request for an object to be simple, robust, and efficient.
An object is partitioned into source symbols, where is the object size divided by the symbol size, and the symbol size is typically chosen to fit into a packet payload. Stream objects can be described in terms of an erasure code with optimal recovery properties and with the property that encoded data symbols can be generated from any object, where is a large prime number, and in particular , where is the maximium number of source symbols in any object.
A stream object for an object consists of all of the possible encoded data that can be generated for the object in a specified order. The essential idea is that different stream objects specify completely different orderings of the available encoded data for an object, and thus a client can simply request prefixes of different stream objects for an object to receive non-overlapping encoded data.
A stream object permutation identifier (SOPI) specifies the ordering of encoded data symbols for a stream object. A SOPI identifies a permutation of the available symbol IDs. Thus, a stream object identifier is the combination of the identifier of the object from which it is generated and a SOPI . We consider SOPIs of the form , where , . Then, defines the permutation of symbol IDs
where each term is taken mod , i.e.,
for any position .
II-A RaptorQ SOPI implementation
For the implementation of the RaptorQ code [3] described at [5], there are possible symbols of encoded data for an object. Since is a prime number (a Mersenne prime), can be used for the implementation [5] of the RaptorQ code.
Since is one less than a power of two, it is straightforward to compute , e.g.,
-
•
Compute .
-
•
Let be the least significant bits of .
-
•
Let be the next bits of .
-
•
Let .
-
•
If then .
-
•
If then .
III Random SOPI sets
We analyze the properties of collections of SOPIs, where each SOPI in the collection is randomly and independently chosen.
Proposition III.1
For any pair of positions
and for any pair of symbol IDs
there is a unique such that
∎
Proposition III.1 implies that the pair of symbol IDs in any pair of positions within the permutation are random with respect to a random SOPI .
Suppose an object is composed of a single source block with source symbols. A client will download encoded data from prefixes of multiple stream objects of an object in order to recover the object. We would like to show that a client doesn’t receive many duplicate symbols when downloading from multiple stream objects with randomly chosen SOPIs. Suppose the client downloads from some number of different stream objects, with corresponding randomly chosen SOPIs . Suppose the client receives symbols from the stream object , symbols from the stream object , etc. where
The expected number of distinct symbol IDs among is at least
and thus if then
symbols on average are enough to receive symbols with at least distinct symbol IDs. Theorem III.2 below provides bounds on the probability that received symbols have at least distinct symbol IDs.
Theorem III.2
Consider an object composed of a single source block with source symbols. Suppose at least
symbols have been received in total from stream objects
where and . Then,
(1) |
with respect to the random variables
Proof:
Let be the random variable that is the number of unique symbol IDs among received symbols for an object with respect to the random variables
The left-hand side of Inequality (1) is equivalent to
(2) |
and thus we need to prove that
(3) |
Each of the received symbols has a symbol position within the stream object from which it is received, and a symbol ID determined by the stream object SOPI and the symbol position within the stream object. We associate a unique index within with each of the (stream object, symbol position) pairs of received symbols. For , we let if the symbol position within the stream object indexed by is mapped to the same symbol ID as the symbol position within the stream object indexed by , and if the symbol position within the stream object indexed by is mapped to a different symbol ID than the symbol position within the stream object indexed by . Thus, if and only if the symbol IDs mapped to by and are duplicates. Then,
(4) | ||||
(5) |
where Inequality (4) follows since is the number of duplicate symbol IDs and if the symbol IDs mapped to by and are the same then , and Inequality (5) follows from Markov’s inequality.
Expanding out terms,
(6) | ||||
(7) |
If and index different symbol positions within the same stream object then since the SOPI for the stream object defines a permutation of the symbol IDs and thus cannot map and to the same symbol ID. If and index symbol positions within two different stream objects then since the SOPIs for the two different stream objects are chosen independently. Thus,
(8) |
Similarly, if and index different symbol positions within the same stream object or and index different symbol positions within the same stream object then . If and index different symbol positions within one stream object and and index different symbols positions within a second stream object then by Proposition III.1. If and index different symbol positions within one stream object and and index the same symbol position within a second stream object then since and cannot map to the same symbol ID. If and index different symbol positions within one stream object and indexes a symbol position in a second stream object and indexes a position within a third stream object then , since if and map to the same symbol ID then and cannot both map to that symbol ID, and if and map to different symbol IDs (with probability ) then and map to the same symbol ID and and map to the same symbol ID with probability from Proposition III.1. If , , and index symbol positions in four different stream objects then , since each of the four SOPIs are chosen randomly and independently of one another. All other cases are variants of these cases. Thus,
(9) |
From Inequalities (8) and (9) it follows that
(10) | ||||
(11) |
where Inequality (11) follows since . Combining Inequality (11) with Inequality (5) proves Inequality (3). ∎∎
III-A Random SOPI set examples
For the RaptorQ code specified in [3], the maximum supported number of source symbols per source block is , and is a good choice as described in Section II-A. With , the condition is satisfied, and for . Thus, Theorem III.2 holds for RaptorQ for all supported source block sizes and for any .
On average at most a fraction of the symbol IDs of received encoded data symbols from prefixes of stream objects will be duplicates when the total number of received symbols is up to . However, stronger bounds on the probability of the number of duplicates being above a certain bound are of importance,
For example, setting , Theorem III.2 shows that an object composed of a single source block of source symbols can be recovered with probability at least
from received symbols in total from different stream objects.
As another example, setting , Theorem III.2 shows that an object composed of a single source block of source symbols can be recovered with probability at least
from received symbols in total from different stream objects.
The analysis of Theorem III.2 is not tight, and thus these bounds are conservative.
IV Designed SOPI sets
Although the analysis in Section III shows there is little chance of there being a significant fraction of duplicates among received symbols from multiple stream objects, there is still a chance that there is a significant fraction of duplicates. As an extreme example, if and are identical then the overlap between prefixes of length is , i.e., all symbols are duplicates. As a somewhat less extreme example, if and where is randomly chosen, then for prefixes of length there is a chance there is no overlap between the prefixes, but with the remaining probability there are on average duplicates.
In this section, we provide a deterministic design of a large set of SOPIs for which we can provide guaranteed upper bounds on the number of symbol ID duplicates there are in prefixes of permutations defined by SOPIs from .
The design has two parts:
-
•
Design a set .
-
•
For each , design a set .
Then, the designed set of SOPIs is the set
IV-A Interactions between B values
The design of set is based on interactions between different values. The following is at the heart of this analysis.
Definition IV.1
Let such that . For any pair of integers , we say matches with respect to if
∎
The importance of Definition IV.1 is that, for any , if the symbol ID at position in the permutation defined by is equal to the symbol ID at position in the permutation defined by and matches with respect to , then the symbol ID at position with respect to is equal to the symbol ID at position with respect to .
It is easy to verify that if matches with respect to and matches with respect to then
matches with respect to . For example, if matches with respect to then, for any integer ,
matches with respect to .
Let be an upper bound on the aggregate number of symbols downloaded by a client from prefixes of stream objects for an object. We impose the condition that
(12) |
Definition IV.2
The set is defined as
∎
Lemma IV.3
For any and , one of the following two possibilities holds:
-
1.
For all , , does not match with respect to . The distance between and is defined to be .
-
2.
There is a , such that matches with respect to and, for any and such that matches with respect to , is an integer multiple of . The distance between and is defined to be .
Proof:
If (1) holds the proof is complete. We need to show that if (1) doesn’t hold then (2) holds. If (1) doesn’t hold there is and such that matches with respect to and is a minimal pair, i.e., there is no and such that matches with respect to and , where .
Consider any with , such that matches with respect to . We want to show that is an integer multiple of . Note that matches with respect to and also matches with respect to .
For any , consider a pair of positions where the symbol ID at position of the permutation defined by is the same as the symbol ID at position of the permutation defined by . Then, the symbol ID at position of is the same as the symbol ID at position of and also the same as the symbol ID at position of , which implies that
(13) |
Since
and
it follows that Equation (13) holds if and only if
Thus, for some ,
Since is a minimal pair, . Then
matches with respect to , and and . Since and is a minimal pair, it follows that and thus (2) holds because is an integer. ∎∎
Lemma IV.4
Suppose for the distance between and is . Then, for any the number of distinct symbol IDs in the prefixes of the permutations defined by SOPIs
is at least
where is the total length of the pair of prefixes.
Proof:
If , or more generally one of the prefixes is of length zero, then the number of distinct symbol IDs is . Otherwise, neither prefix is of length zero. If there are no symbol IDs in common between the two prefixes then the number of distinct symbol IDs is .
Suppose there are symbol IDs in common between the two prefixes. Then there is a symbol ID at some position with respect to that is the same as a symbol ID at some position with respect to .
If (1) of Lemma IV.3 holds, i.e., there is no , such that matches with respect to , then the distance between and is and is the only pair of positions where the symbol IDs are the same among the two prefixes, and thus there are
distinct symbol IDs.
If (2) of Lemma IV.3 holds, i.e., there is a , such that matches with respect to and all other pairs that match with respect to , where , , are integer multiples of . Without loss of generality, and (the case where and is similar). Then the distance between and is . The following maximizes the number of symbol IDs that are duplicates between and :
-
•
The symbol ID in position of is the same as the symbol ID in position of .
-
•
Let . Then set and such that and . (The remaining length can be assigned to and arbitrarily to satisfy .)
It can be verified that the number of pairs of symbol IDs that are the same between the two prefixes is maximized, and that this number of pairs is
∎∎
Corollary IV.5
Suppose for the distance between and is at least for all where . Then, for any the number of distinct symbol IDs in the prefixes of the permutations defined by SOPIs
is at least
where is the total length of the prefixes.
Proof:
Let be the respective lengths of the prefixes, where . From Lemma IV.4, for each , , the number of symbol IDs that are the same between the permutation defined by the prefix of and is at most
Thus, the number of symbol IDs that are the same between all pairs of the prefixes is at most
(14) | |||
(15) |
∎∎
IV-B Constructing a SOPI set
The following algorithm constructs a set such that for , , the distance between and is at least :
-
•
Initialize , .
-
•
Repeat until .
-
–
Choose any
-
–
Delete from and add to .
-
–
For all , such that
-
*
Delete from .
-
*
-
–
Each time an element is added to , after accounting for some symmetries such as
the number of elements deleted from for each element added to is at most
Thus,
when the algorithm completes.
The following algorithm constructs a set for such that for , , the symbol IDs of prefixes of length up to of the permutations defined by SOPI and are all distinct. as follows:
-
•
Initialize
-
•
Choose randomly and add to .
-
•
For do
-
–
-
–
Add to .
-
–
Thus,
when the algorithm completes.
Overall, the set of SOPIs
is constructed. A SOPI can be assigned to an encoding node by choosing any that hasn’t been assigned so far. Overall,
SOPIs can be assigned.
IV-C Designed SOPI set examples
As an example, distance of at least between pairs of values might be sufficient. Lemma IV.4 guarantees that there are at least distinct symbol IDs among the prefixes of any pair of SOPIs with symbol IDs in total. The value of might be sufficient for practical use cases, where when . In this case, the number of possible SOPIs available is over billion.
As another example, distance of at least between pairs of values might be desirable. Corollary IV.5 guarantees that downloading from prefixes of up to stream objects with different SOPIs, where the total number of downloaded symbols is at least and at most , will have less than duplication in symbol IDs. In this case, the number of possible SOPIs available is over million.
V SOPI distribution design
It is not necessary that each encoding node is assigned a unique SOPI. To enjoy the SOPI design properties, it is sufficient that a client avoids downloading encoded data for an object from two encoding nodes with the same assigned SOPI, which the client can easily avoid no matter how the SOPIs are assigned to encoding nodes. For example, if a client is supplied with the same SOPI for two edge encoding nodes reachable over different interfaces, the client can simply send requests for encoded data to only one of the two interfaces, and thus avoid receiving duplicate encoded data. Of course, the benefits of the SOPI design are reduced in this case, since the client cannot effectively download encoded data from both edge encoding nodes for the same object.
Thus, one would like to distribute SOPIs to encoding nodes in such a way that it is unlikely that a client will receive the same SOPI from two different edge encoding nodes. One can conceptually create a graph of all encoding nodes in the internet, where there is an edge connecting two encoding nodes if it is possible that a client may download prefixes of stream objects from both encoding nodes for the same object. Then, one can think of each SOPI as being a different color. Assigning colors to the graph in such a way that no edge has the same color at both endpoints provides a valid assignment of SOPIs to encoding nodes.
It seems likely that such a graph can be colored using a small number of colors, e.g., less than , and perhaps much less than if some edges in the graph as allowed to have the same color. Thus, the number of SOPIs needed overall is likely to be rather small. This can ameliorate one of the possible concerns with the SOPI design, which is that some information about which clients are requesting which objects might be revealed by the requests for encoded data containing SOPIs percolating through the network: the amount of information revealed is minimized if the number of SOPIs distributed is small.
VI Large objects
If an object is not too large then it can be treated as a single source block, i.e., fountain encoding and fountain decoding can be applied to the entire object. However, it becomes inefficient to apply fountain encoding and fountain decoding to objects that are larger, since typically the object needs to fit into working memory. The required working memory is typically linear in the source block size, i.e., the working memory to recover a block with source symbols is typically proportional to the symbol size times .
Thus, for longer objects, it is useful to have an algorithm that automatically partitions the object into multiple source blocks. Such an algorithm is specified in [3] in Section 4.3. For LDN the following is a suitable simplification of that algorithm. Let the desired symbol size be , and let be the maximum source block size for which fountain encoding and fountain decoding is efficient, where . An object of size can be automatically partitioned into source blocks as follows:
-
•
(no. of source symbols in ).
-
•
(max no. symbols per source block).
-
•
(no. of source blocks to split into).
-
•
.
The output of Partition is , where , , , and . The function Partition partitions a block of source symbols into approximately equal-number of source symbols blocks. More specifically, it partitions into blocks, each with source symbols, and blocks, each with source symbols.
Given the symbol size , the maximum source block size , and the object size , an encoding node or client can determine the parameters as described above, where is the total number of source blocks. If then the object is considered as a single source block with source symbols.
If then object is partitioned into source blocks as follows:
-
•
The initial portion of object of size is partitioned into source blocks, each consisting of source symbols of size .
-
•
The remaining portion of object of size is partitioned into source blocks, each consisting of source symbols of size .
VI-A Large object extension of SOPI
We extend the definitions of SOPI and stream object to be applicable to large objects, i.e., objects that are partitioned into source blocks. The source block structure for a large object of size is determined as described in Section VI based on , symbol size and a maximum source block size .
A SOPI for the large object is of the form , where , and . and are used similarly to Subsection II to identify a symbol generated from a source block, and and are used to identify one of the source blocks. For convenience, the ranges of and are chosen to match the ranges of and , and thus large objects with up to source blocks can be supported. For any , let:
-
•
-
•
-
•
Then the symbol at position of the stream object identified by SOPI for object is the symbol identified by from the source block identified by .
Each SOPI defines a stream object , which is a permutation defined by of the possible symbols that can be generated for the source blocks of the object . The permutation has the property that each set of consecutive symbols of the stream object starting at a position that is a multiple of all have the same symbol ID but are from different source blocks, i.e., the consecutive symbols are from a cyclic shift of the source blocks.
The values of and should be chosen randomly chosen. This ensures that the source block sequence within each set of consecutive symbols of the stream object is a random cyclic shift, and that different pairs of cyclic shifts are random. This ensures that packet losses on average with respect to randomly chosen and affect each source block of a large object equally. This is true even if packet loss patterns depend on the position of the symbol within the stream object carried in a packet, as long as the packet loss does not depend on the values of and .
VI-B RaptorQ large object
For the RaptorQ code specified in [3], the maximum number of supported source symbols is , and thus the maximum supported source block size should satisfy , where is the symbol size. With bytes, i.e., around the typical IPv4 packet payload size, the maximum object size supported without partitioning the object into more than one source block is around Mbytes.
References
- [1] J. Byers, M. Luby. Liquid Data Networking. 7th ACM Conference on Information-Centric Networking (ICN ’20), Virtual Event, Canada. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3405656.3418710.
- [2] J. Byers, M. Luby, M. Mitzenmacher, A. Rege. A Digital Fountain Approach to Reliable Distribution of Bulk Data. Proc. of ACM SIGCOMM. Comput. Commun. Rev., vol. 23, no. 4, pp. 56-67, 1998.
- [3] M. Luby, A. Shokrollahi, M. Watson, T. Stockhammer, L. Minder. IETF RFC6330. RaptorQ Forward Error Correction Scheme for Object Delivery. Internet Engineering Task Force, August 2011.
- [4] A. Shokrollahi, M. Luby. Raptor Codes. Foundations and Trends in Communications and Information Theory, Now Publishers, vol. 6, no. 3-4, pp. 213-322, 2011.
- [5] L. Minder, M. Luby, P. Aggarwal. Codornices project website describing optimized implementation of RaptorQ https://www.codornices.info, 2020.