This paper was converted on www.awesomepapers.org from LaTeX by an anonymous user.
Want to know more? Visit the Converter page.

SOPI design and analysis for LDN

Michael Luby The research described in this paper is supported by NSF EAGER 1936572. ICSI and BitRipple, Inc.
Abstract

Liquid Data Networking (LDN), introduced in [1], is an ICN architecture that is designed to enable the benefits of erasure-code enabled object delivery. A primary contribution of [1] is the introduction of SOPIs, which enables clients to concurrently download encoded data for the same object from multiple edge nodes, optimizes caching efficiency, and enables seamless mobility. This paper provides an enhanced design and analysis of SOPIs.

I Introduction

The paper [1], inspired by fountain codes [2], [3], [4], [5], introduces Liquid Data Networking (LDN), an ICN architecture that is designed to enable the benefits of erasure-code enabled object delivery. A primary contribution of [1] is the introduction of stream object permutation identifiers (SOPIs), which provides a simple and efficient download coordination mechanism: clients can concurrently download encoded data for the same object from multiple edge nodes, caching efficiency is optimized, and seamless mobility is enabled. This paper provides an enhanced design and analysis of SOPIs, and inherits the terminology and notation of [1].

II SOPI and Stream object

SOPIs and stream objects are fundamental to the design of LDN: they enable a diversity of encoded data to be available for download within the network for each object, while at the same time ensuring that different clients request the same encoded data for an object from the same neighboring encoding node. The SOPI design allows the client decision of which encoded data to request for an object to be simple, robust, and efficient.

An object is partitioned into KK source symbols, where KK is the object size divided by the symbol size, and the symbol size is typically chosen to fit into a packet payload. Stream objects can be described in terms of an erasure code with optimal recovery properties and with the property that NN encoded data symbols can be generated from any object, where NN is a large prime number, and in particular N>>KmaxN>>{K_{\mathrm{max}}}, where Kmax{K_{\mathrm{max}}} is the maximium number of source symbols in any object.

A stream object for an object consists of all of the possible encoded data that can be generated for the object in a specified order. The essential idea is that different stream objects specify completely different orderings of the available encoded data for an object, and thus a client can simply request prefixes of different stream objects for an object to receive non-overlapping encoded data.

A stream object permutation identifier (SOPI) specifies the ordering of encoded data symbols for a stream object. A SOPI  PP identifies a permutation π(P)\pi(P) of the NN available symbol IDs. Thus, a stream object identifier (D,P)(D,P) is the combination of the identifier DD of the object from which it is generated and a SOPI PP. We consider SOPIs of the form P=(A,B)P=(A,B), where A{0,1,2,,N1}A\in\{0,1,2,\ldots,N-1\}, B{1,2,3,,N1}B\in\{1,2,3,\ldots,N-1\}. Then, P=(A,B)P=(A,B) defines the permutation of symbol IDs

π(P)={A,A+B,A+2B,,A+(N1)B},\pi(P)=\{A,A+B,A+2\cdot B,\ldots,A+(N-1)\cdot B\},

where each term is taken mod NN, i.e.,

π(P)[i]=A+iBmodN\pi(P)[i]=A+i\cdot B\mod N

for any position i{0,,N1}i\in\{0,\ldots,N-1\}.

II-A RaptorQ SOPI implementation

For the implementation of the RaptorQ code [3] described at [5], there are 2312^{31} possible symbols of encoded data for an object. Since 23112^{31}-1 is a prime number (a Mersenne prime), N=2311N=2^{31}-1 can be used for the implementation [5] of the RaptorQ code.

Since NN is one less than a power of two, it is straightforward to compute C=A+iBmodNC=A+i\cdot B\mod N, e.g.,

  • Compute D=iBD=i\cdot B.

  • Let D0D_{0} be the 3131 least significant bits of DD.

  • Let D1D_{1} be the next 3131 bits of DD.

  • Let C=A+D0+D1C=A+D_{0}+D_{1}.

  • If C>NC>N then C=CNC=C-N.

  • If C>NC>N then C=CNC=C-N.

III Random SOPI sets

We analyze the properties of collections of SOPIs, where each SOPI in the collection is randomly and independently chosen.

Proposition III.1

For any pair of positions

i0{0,1,,N1},i1{0,1,,N1}{i0},i_{0}\in\{0,1,\ldots,N-1\},i_{1}\in\{0,1,\ldots,N-1\}-\{i_{0}\},

and for any pair of symbol IDs

j0{0,1,,N1},j1{0,1,,N1}{j0},j_{0}\in\{0,1,\ldots,N-1\},j_{1}\in\{0,1,\ldots,N-1\}-\{j_{0}\},

there is a unique P=(A,B)P=(A,B) such that

π(P)[i0]=j0 and π(P)[i1]=j1.\pi(P)[i_{0}]=j_{0}\mbox{ and }\pi(P)[i_{1}]=j_{1}.

Proposition III.1 implies that the pair of symbol IDs in any pair of positions within the permutation are random with respect to a random SOPI P=(A,B)P=(A,B).

Suppose an object DD is composed of a single source block with KK source symbols. A client will download encoded data from prefixes of multiple stream objects of an object DD in order to recover the object. We would like to show that a client doesn’t receive many duplicate symbols when downloading from multiple stream objects with randomly chosen SOPIs. Suppose the client downloads from some number ss of different stream objects, with corresponding randomly chosen SOPIs P0,P1,,Ps1P_{0},P_{1},\ldots,P_{s-1}. Suppose the client receives M0M_{0} symbols from the stream object (P0,D)(P_{0},D), M1M_{1} symbols from the stream object (P1,D)(P_{1},D), etc. where

i=0s1Mi=M.\sum_{i=0}^{s-1}M_{i}=M.

The expected number of distinct symbol IDs among MM is at least

M(M1)22N,M-\frac{(M-1)^{2}}{2\cdot N},

and thus if K22NK^{2}\leq 2\cdot N then

M=K(1+K2N)K+1M=K\cdot\left(1+\frac{K}{2\cdot N}\right)\leq K+1

symbols on average are enough to receive KK symbols with at least KK distinct symbol IDs. Theorem III.2 below provides bounds on the probability that MM received symbols have at least KK distinct symbol IDs.

Theorem III.2

Consider an object DD composed of a single source block with KK source symbols. Suppose at least

M=K1δM=\frac{K}{1-\delta}

symbols have been received in total from s1s\geq 1 stream objects

(P0,D),(P1,D),,(Ps1,D),(P_{0},D),(P_{1},D),\ldots,(P_{s-1},D),

where δ>0\delta>0 and M22NM^{2}\leq 2\cdot N. Then,

Pr[D not recoverable from the M symbols]1δ2N\Pr[D\mbox{ not recoverable from the $M$ symbols}]\leq\frac{1}{\delta^{2}\cdot N} (1)

with respect to the random variables

P0=(A0,B0),P1=(A1,B1),,Ps1=(As1,Bs1).P_{0}=(A_{0},B_{0}),P_{1}=(A_{1},B_{1}),\ldots,P_{s-1}=(A_{s-1},B_{s-1}).
Proof:

Let YY be the random variable that is the number of unique symbol IDs among MM received symbols for an object DD with respect to the random variables

P0=(A0,B0),P1=(A1,B1),,Ps1=(As1,Bs1).P_{0}=(A_{0},B_{0}),P_{1}=(A_{1},B_{1}),\ldots,P_{s-1}=(A_{s-1},B_{s-1}).

The left-hand side of Inequality (1) is equivalent to

Pr[Y<K]=Pr[Y<(1δ)M]\Pr[Y<K]=\Pr[Y<(1-\delta)\cdot M] (2)

and thus we need to prove that

Pr[Y<(1δ)M]1δ2N.\Pr[Y<(1-\delta)\cdot M]\leq\frac{1}{\delta^{2}\cdot N}. (3)

Each of the MM received symbols has a symbol position within the stream object from which it is received, and a symbol ID determined by the stream object SOPI and the symbol position within the stream object. We associate a unique index within {0,1,,M1}\{0,1,\ldots,M-1\} with each of the MM (stream object, symbol position) pairs of received symbols. For i{0,1,,M1},j{0,1,,M1}{i}i\in\{0,1,\ldots,M-1\},j\in\{0,1,\ldots,M-1\}-\{i\}, we let Xi,j=1X_{i,j}=1 if the symbol position within the stream object indexed by ii is mapped to the same symbol ID as the symbol position within the stream object indexed by jj, and Xi,j=0X_{i,j}=0 if the symbol position within the stream object indexed by ii is mapped to a different symbol ID than the symbol position within the stream object indexed by jj. Thus, Xi,j=1X_{i,j}=1 if and only if the symbol IDs mapped to by ii and jj are duplicates. Then,

Pr[Y<(1δ)M]\displaystyle\Pr[Y<(1-\delta)\cdot M] =Pr[MY>δM]\displaystyle=\Pr[M-Y>\delta\cdot M]
Pr[i,j>iXi,j>δM]\displaystyle\leq\Pr\left[\sum_{i,j>i}X_{i,j}>\delta\cdot M\right] (4)
=Pr[(i,j>iXi,j)2>δ2M2]\displaystyle=\Pr\left[\left(\sum_{i,j>i}X_{i,j}\right)^{2}>\delta^{2}\cdot M^{2}\right]
𝔼[(i,j>iXi,j)2]δ2M2,\displaystyle\leq\frac{\operatorname*{\mathbb{E}}\left[\left(\sum_{i,j>i}X_{i,j}\right)^{2}\right]}{\delta^{2}\cdot M^{2}}, (5)

where Inequality (4) follows since MYM-Y is the number of duplicate symbol IDs and if the symbol IDs mapped to by ii and jj are the same then Xi,j=1X_{i,j}=1, and Inequality (5) follows from Markov’s inequality.

Expanding out terms,

𝔼[(i,j>iXi,j)2]\displaystyle\operatorname*{\mathbb{E}}\left[\left(\sum_{i,j>i}X_{i,j}\right)^{2}\right] =𝔼[i,j>iXi,j2]\displaystyle=\operatorname*{\mathbb{E}}\left[\sum_{i,j>i}X_{i,j}^{2}\right] (6)
+𝔼[i,j>ii,j>i(i,j)(i,j)Xi,jXi,j]\displaystyle+\operatorname*{\mathbb{E}}\left[\sum_{i,j>i}\mathop{\sum_{i^{\prime},j^{\prime}>i^{\prime}}}_{(i^{\prime},j^{\prime})\not=(i,j)}X_{i,j}\cdot X_{i^{\prime},j^{\prime}}\right] (7)

If ii and jj index different symbol positions within the same stream object then 𝔼[Xi,j2]=0\operatorname*{\mathbb{E}}[X^{2}_{i,j}]=0 since the SOPI for the stream object defines a permutation of the symbol IDs and thus cannot map ii and jj to the same symbol ID. If ii and jj index symbol positions within two different stream objects then 𝔼[Xi,j2]=1N\operatorname*{\mathbb{E}}[X^{2}_{i,j}]=\frac{1}{N} since the SOPIs for the two different stream objects are chosen independently. Thus,

𝔼[i,j>iXi,j2](M2)NM22N.\operatorname*{\mathbb{E}}\left[\sum_{i,j>i}X_{i,j}^{2}\right]\leq\frac{\binom{M}{2}}{N}\leq\frac{M^{2}}{2\cdot N}. (8)

Similarly, if ii and jj index different symbol positions within the same stream object or ii^{\prime} and jj^{\prime} index different symbol positions within the same stream object then 𝔼[Xi,jXi,j]=0\operatorname*{\mathbb{E}}\left[X_{i,j}\cdot X_{i^{\prime},j^{\prime}}\right]=0. If ii and ii^{\prime} index different symbol positions within one stream object and jj and jj^{\prime} index different symbols positions within a second stream object then 𝔼[Xi,jXi,j]=1(N1)N\operatorname*{\mathbb{E}}\left[X_{i,j}\cdot X_{i^{\prime},j^{\prime}}\right]=\frac{1}{(N-1)\cdot N} by Proposition III.1. If ii and ii^{\prime} index different symbol positions within one stream object and jj and jj^{\prime} index the same symbol position within a second stream object then 𝔼[Xi,jXi,j]=0\operatorname*{\mathbb{E}}\left[X_{i,j}\cdot X_{i^{\prime},j^{\prime}}\right]=0 since ii and ii^{\prime} cannot map to the same symbol ID. If ii and ii^{\prime} index different symbol positions within one stream object and jj indexes a symbol position in a second stream object and jj^{\prime} indexes a position within a third stream object then 𝔼[Xi,jXi,j]=1N2\operatorname*{\mathbb{E}}\left[X_{i,j}\cdot X_{i^{\prime},j^{\prime}}\right]=\frac{1}{N^{2}}, since if jj and jj^{\prime} map to the same symbol ID then ii and ii^{\prime} cannot both map to that symbol ID, and if jj and jj^{\prime} map to different symbol IDs (with probability N1N\frac{N-1}{N}) then ii and jj map to the same symbol ID and ii^{\prime} and jj^{\prime} map to the same symbol ID with probability 1(N1)N\frac{1}{(N-1)\cdot N} from Proposition III.1. If ii, jj, ii^{\prime} and jj^{\prime} index symbol positions in four different stream objects then 𝔼[Xi,jXi,j]=1N2\operatorname*{\mathbb{E}}\left[X_{i,j}\cdot X_{i^{\prime},j^{\prime}}\right]=\frac{1}{N^{2}}, since each of the four SOPIs are chosen randomly and independently of one another. All other cases are variants of these cases. Thus,

𝔼[i,j>ii,j>i(i,j)(i,j)Xi,jXi,j](M2)2(N1)NM44N2.\operatorname*{\mathbb{E}}\left[\sum_{i,j>i}\mathop{\sum_{i^{\prime},j^{\prime}>i^{\prime}}}_{(i^{\prime},j^{\prime})\not=(i,j)}X_{i,j}\cdot X_{i^{\prime},j^{\prime}}\right]\leq\frac{\binom{M}{2}^{2}}{(N-1)\cdot N}\leq\frac{M^{4}}{4\cdot N^{2}}. (9)

From Inequalities (8) and (9) it follows that

𝔼[(i,j>iXi,j)2]\displaystyle\operatorname*{\mathbb{E}}\left[\left(\sum_{i,j>i}X_{i,j}\right)^{2}\right] M22N(1+M22N)\displaystyle\leq\frac{M^{2}}{2\cdot N}\cdot\left(1+\frac{M^{2}}{2\cdot N}\right) (10)
M2N,\displaystyle\leq\frac{M^{2}}{N}, (11)

where Inequality (11) follows since M22NM^{2}\leq 2\cdot N. Combining Inequality (11) with Inequality (5) proves Inequality (3). ∎∎

III-A Random SOPI set examples

For the RaptorQ code specified in [3], the maximum supported number of source symbols per source block is Kmax=56,403{K_{\mathrm{max}}}=56,403, and N=2311N=2^{31}-1 is a good choice as described in Section II-A. With M=65,535M=65,535, the condition M22NM^{2}\leq 2\cdot N is satisfied, and Kmax0.86M=(1δ)M{K_{\mathrm{max}}}\leq 0.86\cdot M=(1-\delta)\cdot M for δ=0.14\delta=0.14. Thus, Theorem III.2 holds for RaptorQ for all supported source block sizes and for any δ0.14\delta\leq 0.14.

On average at most a 0.000020.00002 fraction of the symbol IDs of received encoded data symbols from prefixes of stream objects will be duplicates when the total number of received symbols is up to MM. However, stronger bounds on the probability of the number of duplicates being above a certain bound are of importance,

For example, setting δ=0.01\delta=0.01, Theorem III.2 shows that an object DD composed of a single source block of KK source symbols can be recovered with probability at least

11δ2N=0.9999951-\frac{1}{\delta^{2}\cdot N}=0.999995

from M=K/0.991.01KM=K/0.99\approx 1.01\cdot K received symbols in total from different stream objects.

As another example, setting δ=0.1\delta=0.1, Theorem III.2 shows that an object DD composed of a single source block of KK source symbols can be recovered with probability at least

11δ2N=0.999999951-\frac{1}{\delta^{2}\cdot N}=0.99999995

from M=K/0.91.11KM=K/0.9\approx 1.11\cdot K received symbols in total from different stream objects.

The analysis of Theorem III.2 is not tight, and thus these bounds are conservative.

IV Designed SOPI sets

Although the analysis in Section III shows there is little chance of there being a significant fraction of duplicates among received symbols from multiple stream objects, there is still a chance that there is a significant fraction of duplicates. As an extreme example, if P0P_{0} and P1P_{1} are identical then the overlap between prefixes of length K/2K/2 is K/2K/2, i.e., all K/2K/2 symbols are duplicates. As a somewhat less extreme example, if P0=(0,1)P_{0}=(0,1) and P1=(A,1)P_{1}=(A,1) where AA is randomly chosen, then for prefixes of length K/2K/2 there is a 1K/N1-K/N chance there is no overlap between the prefixes, but with the remaining probability K/NK/N there are on average K/4K/4 duplicates.

In this section, we provide a deterministic design of a large set 𝒫\mathcal{P} of SOPIs for which we can provide guaranteed upper bounds on the number of symbol ID duplicates there are in prefixes of permutations defined by SOPIs from 𝒫\mathcal{P}.

The design has two parts:

  • Design a set {1,,N1}\mathcal{B}\subset\{1,\ldots,N-1\}.

  • For each BB\in\mathcal{B}, design a set 𝒜B{0,,N1}\mathcal{A}_{B}\subset\{0,\ldots,N-1\}.

Then, the designed set of SOPIs is the set

𝒫={P=(A,B):B,A𝒜B}.\mathcal{P}=\{P=(A,B):B\in\mathcal{B},A\in\mathcal{A}_{B}\}.

IV-A Interactions between B values

The design of set \mathcal{B} is based on interactions between different BB values. The following is at the heart of this analysis.

Definition IV.1

Let B0,B1{1,,N}B_{0},B_{1}\in\{1,\ldots,N\} such that B0B1B_{0}\not=B_{1}. For any pair of integers (d0,d1)(d_{0},d_{1}), we say (d0,d1)(d_{0},d_{1}) matches with respect to (B0,B1)(B_{0},B_{1}) if

d0B0modN=d1B1modN.d_{0}\cdot B_{0}\mod N=d_{1}\cdot B_{1}\mod N.

The importance of Definition IV.1 is that, for any A0,A1{0,,N}A_{0},A_{1}\in\{0,\ldots,N\}, if the symbol ID at position p0p_{0} in the permutation defined by P0=(A0,B0)P_{0}=(A_{0},B_{0}) is equal to the symbol ID at position p1p_{1} in the permutation defined by P1=(A1,B1)P_{1}=(A_{1},B_{1}) and (d0,d1)(d_{0},d_{1}) matches with respect to (B0,B1)(B_{0},B_{1}), then the symbol ID at position p0+d0modNp_{0}+d_{0}\mod N with respect to P0P_{0} is equal to the symbol ID at position p1+d1modNp_{1}+d_{1}\mod N with respect to P1P_{1}.

It is easy to verify that if (d0,d1)(d_{0},d_{1}) matches with respect to (B0,B1)(B_{0},B_{1}) and (d0,d1)(d^{\prime}_{0},d^{\prime}_{1}) matches with respect to (B0,B1)(B_{0},B_{1}) then

(d0,d1)+(d0,d1)=(d0+d0,d1+d1)(d_{0},d_{1})+(d^{\prime}_{0},d^{\prime}_{1})=(d_{0}+d^{\prime}_{0},d_{1}+d^{\prime}_{1})

matches with respect to (B0,B1)(B_{0},B_{1}). For example, if (d0,d1)(d_{0},d_{1}) matches with respect to (B0,B1)(B_{0},B_{1}) then, for any integer ii,

i(d0,d1)=(id0,id1)i\cdot(d_{0},d_{1})=(i\cdot d_{0},i\cdot d_{1})

matches with respect to (B0,B1)(B_{0},B_{1}).

Let MM be an upper bound on the aggregate number of symbols downloaded by a client from prefixes of stream objects for an object. We impose the condition that

M2<N/2.M^{2}<N/2. (12)
Definition IV.2

The set DD is defined as

D={M+1,,1}{1,,M1}.D=\{-M+1,\ldots,-1\}\cup\{1,\ldots,M-1\}.

Lemma IV.3

For any B0{1,,N}B_{0}\in\{1,\ldots,N\} and B1{1,,N}B_{1}\in\{1,\ldots,N\}, one of the following two possibilities holds:

  1. 1.

    For all d0Dd_{0}\in D, d1Dd_{1}\in D, (d0,d1)(d_{0},d_{1}) does not match with respect to (B0,B1)(B_{0},B_{1}). The distance between B0B_{0} and B1B_{1} is defined to be 2M2\cdot M.

  2. 2.

    There is a d0Dd_{0}\in D, d1Dd_{1}\in D such that (d0,d1)(d_{0},d_{1}) matches with respect to (B0,B1)(B_{0},B_{1}) and, for any d0Dd^{\prime}_{0}\in D and d1Dd^{\prime}_{1}\in D such that (d0,d1)(d^{\prime}_{0},d^{\prime}_{1}) matches with respect to (B0,B1)(B_{0},B_{1}), (d0,d1)(d^{\prime}_{0},d^{\prime}_{1}) is an integer multiple of (d0,d1)(d_{0},d_{1}). The distance between B0B_{0} and B1B_{1} is defined to be |d0|+|d1||d_{0}|+|d_{1}|.

Proof:

If (1) holds the proof is complete. We need to show that if (1) doesn’t hold then (2) holds. If (1) doesn’t hold there is d0Dd_{0}\in D and d1Dd_{1}\in D such that (d0,d1)(d_{0},d_{1}) matches with respect to (B0,B1)(B_{0},B_{1}) and (d0,d1)(d_{0},d_{1}) is a minimal pair, i.e., there is no d0Dd^{\prime}_{0}\in D and d1Dd^{\prime}_{1}\in D such that (d0,d1)(d^{\prime}_{0},d^{\prime}_{1}) matches with respect to (B0,B1)(B_{0},B_{1}) and (d0,d1)=c(d0,d1)(d^{\prime}_{0},d^{\prime}_{1})=c\cdot(d_{0},d_{1}), where 0<|c|<10<|c|<1.

Consider any (d0,d1)(d^{\prime}_{0},d^{\prime}_{1}) with d0Dd^{\prime}_{0}\in D, d1Dd^{\prime}_{1}\in D such that (d0,d1)(d^{\prime}_{0},d^{\prime}_{1}) matches with respect to (B0,B1)(B_{0},B_{1}). We want to show that (d0,d1)(d^{\prime}_{0},d^{\prime}_{1}) is an integer multiple of (d0,d1)(d_{0},d_{1}). Note that (d0d0,d0d1)(d^{\prime}_{0}\cdot d_{0},d^{\prime}_{0}\cdot d_{1}) matches with respect to (B0,B1)(B_{0},B_{1}) and also (d0d0,d1d0)(d^{\prime}_{0}\cdot d_{0},d^{\prime}_{1}\cdot d_{0}) matches with respect to (B0,B1)(B_{0},B_{1}).

For any A0,A1{0,,N}A_{0},A_{1}\in\{0,\ldots,N\}, consider a pair of positions (p0,p1)(p_{0},p_{1}) where the symbol ID at position p0p_{0} of the permutation defined by P0=(A0,B0)P_{0}=(A_{0},B_{0}) is the same as the symbol ID at position p1p_{1} of the permutation defined by P1=(A1,B1)P_{1}=(A_{1},B_{1}). Then, the symbol ID at position p0+d0d0modNp_{0}+d^{\prime}_{0}\cdot d_{0}\mod N of P0P_{0} is the same as the symbol ID at position p1+d0d1modNp_{1}+d^{\prime}_{0}\cdot d_{1}\mod N of P1P_{1} and also the same as the symbol ID at position p1+d1d0modNp_{1}+d^{\prime}_{1}\cdot d_{0}\mod N of P1P_{1}, which implies that

p1+d0d1=p1+d1d0modN.p_{1}+d^{\prime}_{0}\cdot d_{1}=p_{1}+d^{\prime}_{1}\cdot d_{0}\mod N. (13)

Since

N/2<M2d0d1M2<N/2-N/2<-M^{2}\leq d^{\prime}_{0}\cdot d_{1}\leq M^{2}<N/2

and

N/2<M2d1d0M2<N/2,-N/2<-M^{2}\leq d^{\prime}_{1}\cdot d_{0}\leq M^{2}<N/2,

it follows that Equation (13) holds if and only if

d0d1=d1d0.d^{\prime}_{0}\cdot d_{1}=d^{\prime}_{1}\cdot d_{0}.

Thus, for some c0c\not=0,

(d0,d1)=c(d0,d1).(d^{\prime}_{0},d^{\prime}_{1})=c\cdot(d_{0},d_{1}).

Since (d0,d1)(d_{0},d_{1}) is a minimal pair, |c|1|c|\geq 1. Then

(d0′′,d1′′)\displaystyle(d^{\prime\prime}_{0},d^{\prime\prime}_{1}) =(d0,d1)c(d0,d1)\displaystyle=(d^{\prime}_{0},d^{\prime}_{1})-\lfloor c\rfloor\cdot(d_{0},d_{1})
=(cc)(d0,d1)\displaystyle=(c-\lfloor c\rfloor)\cdot(d_{0},d_{1})

matches with respect to (B0,B1)(B_{0},B_{1}), and d0′′Dd^{\prime\prime}_{0}\in D and d1′′Dd^{\prime\prime}_{1}\in D. Since 0cc<10\leq c-\lfloor c\rfloor<1 and (d0,d1)(d_{0},d_{1}) is a minimal pair, it follows that cc=0c-\lfloor c\rfloor=0 and thus (2) holds because cc is an integer. ∎∎

Lemma IV.4

Suppose for B0,B1{1,,N}B_{0},B_{1}\in\{1,\ldots,N\} the distance between B0B_{0} and B1B_{1} is dd. Then, for any A0,A1{0,,N}A_{0},A_{1}\in\{0,\ldots,N\} the number of distinct symbol IDs in the prefixes of the permutations defined by SOPIs

P0=(A0,B0),P1=(A1,B1),P_{0}=(A_{0},B_{0}),P_{1}=(A_{1},B_{1}),

is at least

mm2d1,m-\left\lfloor\frac{m-2}{d}\right\rfloor-1,

where mMm\leq M is the total length of the pair of prefixes.

Proof:

If m=1m=1, or more generally one of the prefixes is of length zero, then the number of distinct symbol IDs is mm. Otherwise, neither prefix is of length zero. If there are no symbol IDs in common between the two prefixes then the number of distinct symbol IDs is mm.

Suppose there are symbol IDs in common between the two prefixes. Then there is a symbol ID at some position p0p_{0} with respect to P0P_{0} that is the same as a symbol ID at some position p1p_{1} with respect to P1P_{1}.

If (1) of Lemma IV.3 holds, i.e., there is no d0Dd_{0}\in D, d1Dd_{1}\in D such that (d0,d1)(d_{0},d_{1}) matches with respect to (B0,B1)(B_{0},B_{1}), then the distance between B0B_{0} and B1B_{1} is d=2Md=2\cdot M and (p0,p1)(p_{0},p_{1}) is the only pair of positions where the symbol IDs are the same among the two prefixes, and thus there are

m1=mm22M1m-1=m-\left\lfloor\frac{m-2}{2\cdot M}\right\rfloor-1

distinct symbol IDs.

If (2) of Lemma IV.3 holds, i.e., there is a d0Dd_{0}\in D, d1Dd_{1}\in D such that (d0,d1)(d_{0},d_{1}) matches with respect to (B0,B1)(B_{0},B_{1}) and all other pairs (d0,d1)(d^{\prime}_{0},d^{\prime}_{1}) that match with respect to (B0,B1)(B_{0},B_{1}), where d0Dd^{\prime}_{0}\in D, d1Dd^{\prime}_{1}\in D, are integer multiples of (d0,d1)(d_{0},d_{1}). Without loss of generality, d0>0d_{0}>0 and d1>0d_{1}>0 (the case where d0>0d_{0}>0 and d1<0d_{1}<0 is similar). Then the distance between B0B_{0} and B1B_{1} is d=d0+d1d=d_{0}+d_{1}. The following maximizes the number of symbol IDs that are duplicates between P0P_{0} and P1P_{1}:

  • The symbol ID in position 0 of P0P_{0} is the same as the symbol ID in position 0 of P1P_{1}.

  • Let i=m2di=\lfloor\frac{m-2}{d}\rfloor. Then set m0m_{0} and m1m_{1} such that m0id0+1m_{0}\geq i\cdot d_{0}+1 and m1id1+1m_{1}\geq i\cdot d_{1}+1. (The remaining r=mid2r=m-i\cdot d-2 length can be assigned to m0m_{0} and m1m_{1} arbitrarily to satisfy m0+m1=mm_{0}+m_{1}=m.)

It can be verified that the number of pairs of symbol IDs that are the same between the two prefixes is maximized, and that this number of pairs is

i+1=m2d+1.i+1=\left\lfloor\frac{m-2}{d}\right\rfloor+1.

∎∎

Corollary IV.5

Suppose for B0,B1,,Bs1{1,,N}B_{0},B_{1},\ldots,B_{s-1}\in\{1,\ldots,N\} the distance between BiB_{i} and BjB_{j} is at least dd for all i,j{0,,s1}i,j\in\{0,\ldots,s-1\} where iji\not=j. Then, for any A0,A1,,As1{0,,N}A_{0},A_{1},\ldots,A_{s-1}\in\{0,\ldots,N\} the number of distinct symbol IDs in the prefixes of the permutations defined by SOPIs

P0=(A0,B0),P1=(A1,B1),,Ps1=(As1,Bs1),P_{0}=(A_{0},B_{0}),P_{1}=(A_{1},B_{1}),\ldots,P_{s-1}=(A_{s-1},B_{s-1}),

is at least

m(s1)(msd+s2),m-(s-1)\cdot\left(\frac{m-s}{d}+\frac{s}{2}\right),

where mMm\leq M is the total length of the ss prefixes.

Proof:

Let m0,m1,,ms1m_{0},m_{1},\ldots,m_{s-1} be the respective lengths of the ss prefixes, where m=i=0s1mim=\sum_{i=0}^{s-1}m_{i}. From Lemma IV.4, for each i{0,,s1}i\in\{0,\ldots,s-1\}, j{0,,i1}j\in\{0,\ldots,i-1\}, the number of symbol IDs that are the same between the permutation defined by the prefix of PiP_{i} and PjP_{j} is at most

(mi1)+(mj1)d+1.\frac{(m_{i}-1)+(m_{j}-1)}{d}+1.

Thus, the number of symbol IDs that are the same between all pairs of the prefixes is at most

(i=0s1(s1)(mi1)d)+(s1)s2\displaystyle\left(\sum_{i=0}^{s-1}\frac{(s-1)\cdot(m_{i}-1)}{d}\right)+\frac{(s-1)\cdot s}{2} (14)
(s1)(msd+s2).\displaystyle\leq(s-1)\cdot\left(\frac{m-s}{d}+\frac{s}{2}\right). (15)

∎∎

Lemma IV.4 and Corollary IV.5 provide worst case lower bounds on the total number of distinct symbol IDs in prefixes. The actual number of distinct symbol IDs in practice are much closer to the total size mm of the prefixes than to the worst case lower bounds.

IV-B Constructing a SOPI set

The following algorithm constructs a set \mathcal{B} such that for B0B_{0}\in\mathcal{B}, B1{B0}B_{1}\in\mathcal{B}-\{B_{0}\}, the distance between B0B_{0} and B1B_{1} is at least dd:

  • Initialize =\mathcal{B}=\emptyset, ={1,,N1}\mathcal{B}^{\prime}=\{1,\ldots,N-1\}.

  • Repeat until =\mathcal{B}^{\prime}=\emptyset.

    • Choose any BB\in\mathcal{B}^{\prime}

    • Delete BB from \mathcal{B}^{\prime} and add BB to \mathcal{B}.

    • For all iDi\in D, jDj\in D such that |i|+|j|<d|i|+|j|<d

      • *

        Delete B=iBj1modNB^{\prime}=i\cdot B\cdot j^{-1}\mod N from \mathcal{B}^{\prime}.

Each time an element is added to \mathcal{B}, after accounting for some symmetries such as

iBj1modN=iB(j)1modN,i\cdot B\cdot j^{-1}\mod N=-i\cdot B\cdot(-j)^{-1}\mod N,

the number of elements deleted from \mathcal{B}^{\prime} for each element added to \mathcal{B} is at most

(d2)(d3)+1d2.(d-2)\cdot(d-3)+1\leq d^{2}.

Thus,

||N1d2|\mathcal{B}|\geq\frac{N-1}{d^{2}}

when the algorithm completes.

The following algorithm constructs a set 𝒜B\mathcal{A}_{B} for BB\in\mathcal{B} such that for A0𝒜BA_{0}\in\mathcal{A}_{B}, A1𝒜B{A0}A_{1}\in\mathcal{A}_{B}-\{A_{0}\}, the symbol IDs of prefixes of length up to MM of the permutations defined by SOPI P0=(A0,B)P_{0}=(A_{0},B) and P1=(A1,B)P_{1}=(A_{1},B) are all distinct. as follows:

  • Initialize 𝒜B=\mathcal{A}_{B}=\emptyset

  • Choose AA randomly and add AA to 𝒜B\mathcal{A}_{B}.

  • For i=1,,N/M1i=1,\ldots,\lfloor N/M\rfloor-1 do

    • A=A+MBmodNA=A+M\cdot B\mod N

    • Add AA to 𝒜B\mathcal{A}_{B}.

Thus,

|𝒜B|N/M1|\mathcal{A}_{B}|\geq N/M-1

when the algorithm completes.

Overall, the set of SOPIs

𝒫={P=(A,B):B,A𝒜B}\mathcal{P}=\{P=(A,B):B\in\mathcal{B},A\in\mathcal{A}_{B}\}

is constructed. A SOPI can be assigned to an encoding node by choosing any P=(A,B)𝒫P=(A,B)\in\mathcal{P} that hasn’t been assigned so far. Overall,

|𝒫|N2d2M|\mathcal{P}|\geq\frac{N^{2}}{d^{2}\cdot M}

SOPIs can be assigned.

IV-C Designed SOPI set examples

As an example, distance of at least 101101 between pairs of BB values might be sufficient. Lemma IV.4 guarantees that there are at least KK distinct symbol IDs among the prefixes of any pair of SOPIs with M=1.01K+1M=1.01\cdot K+1 symbol IDs in total. The value of M=30,000M=30,000 might be sufficient for practical use cases, where M2N/2M^{2}\leq N/2 when N=2311N=2^{31}-1. In this case, the number of possible SOPIs available is over 1515 billion.

As another example, distance of at least 1,0001,000 between pairs of BB values might be desirable. Corollary IV.5 guarantees that downloading from prefixes of up to 1010 stream objects with different SOPIs, where the total number of downloaded symbols is at least 10,00010,000 and at most 30,00030,000, will have less than 1.5%1.5\% duplication in symbol IDs. In this case, the number of possible SOPIs available is over 150150 million.

V SOPI distribution design

It is not necessary that each encoding node is assigned a unique SOPI. To enjoy the SOPI design properties, it is sufficient that a client avoids downloading encoded data for an object from two encoding nodes with the same assigned SOPI, which the client can easily avoid no matter how the SOPIs are assigned to encoding nodes. For example, if a client is supplied with the same SOPI for two edge encoding nodes reachable over different interfaces, the client can simply send requests for encoded data to only one of the two interfaces, and thus avoid receiving duplicate encoded data. Of course, the benefits of the SOPI design are reduced in this case, since the client cannot effectively download encoded data from both edge encoding nodes for the same object.

Thus, one would like to distribute SOPIs to encoding nodes in such a way that it is unlikely that a client will receive the same SOPI from two different edge encoding nodes. One can conceptually create a graph of all encoding nodes in the internet, where there is an edge connecting two encoding nodes if it is possible that a client may download prefixes of stream objects from both encoding nodes for the same object. Then, one can think of each SOPI P𝒫P\in\mathcal{P} as being a different color. Assigning colors to the graph in such a way that no edge has the same color at both endpoints provides a valid assignment of SOPIs to encoding nodes.

It seems likely that such a graph can be colored using a small number of colors, e.g., less than 100100, and perhaps much less than 100100 if some edges in the graph as allowed to have the same color. Thus, the number of SOPIs needed overall is likely to be rather small. This can ameliorate one of the possible concerns with the SOPI design, which is that some information about which clients are requesting which objects might be revealed by the requests for encoded data containing SOPIs percolating through the network: the amount of information revealed is minimized if the number of SOPIs distributed is small.

VI Large objects

If an object is not too large then it can be treated as a single source block, i.e., fountain encoding and fountain decoding can be applied to the entire object. However, it becomes inefficient to apply fountain encoding and fountain decoding to objects that are larger, since typically the object needs to fit into working memory. The required working memory is typically linear in the source block size, i.e., the working memory to recover a block with KK source symbols is typically proportional to the symbol size times KK.

Thus, for longer objects, it is useful to have an algorithm that automatically partitions the object into multiple source blocks. Such an algorithm is specified in [3] in Section 4.3. For LDN  the following is a suitable simplification of that algorithm. Let the desired symbol size be TT, and let WSWS be the maximum source block size for which fountain encoding and fountain decoding is efficient, where WS/T56,403\lfloor WS/T\rfloor\leq 56,403. An object DD of size FF can be automatically partitioned into source blocks as follows:

  • Kt=F/TKt=\lceil F/T\rceil (no. of source symbols in DD).

  • Kmax=WS/T{K_{\mathrm{max}}}=\lfloor WS/T\rfloor (max no. symbols per source block).

  • Z=Kt/KmaxZ=\lceil Kt/{K_{\mathrm{max}}}\rceil (no. of source blocks to split DD into).

  • (KL,KS,ZL,ZS)=Partition[Kt,Z](KL,KS,ZL,ZS)=\mbox{Partition}[Kt,Z].

The output of Partition[Kt,Z][Kt,Z] is (KL,KS,ZL,ZS)(KL,KS,ZL,ZS), where KL=Kt/ZKL=\lceil Kt/Z\rceil, KS=Kt/ZKS=\lfloor Kt/Z\rfloor, ZL=KtKSZZL=Kt-KS\cdot Z, and ZS=ZZLZS=Z-ZL. The function Partition[Kt,Z][Kt,Z] partitions a block of KtKt source symbols into ZZ approximately equal-number of source symbols blocks. More specifically, it partitions KtKt into ZLZL blocks, each with KLKL source symbols, and ZSZS blocks, each with KSKS source symbols.

Given the symbol size TT, the maximum source block size WSWS, and the object size FF, an encoding node or client can determine the parameters (KL,KS,ZL,ZS)(KL,KS,ZL,ZS) as described above, where Z=ZL+ZSZ=ZL+ZS is the total number of source blocks. If Z=1Z=1 then the object is considered as a single source block with KL=K=F/TKL=K=\lceil F/T\rceil source symbols.

If Z>1Z>1 then object DD is partitioned into ZZ source blocks as follows:

  • The initial portion of object DD of size ZLKLTZL\cdot KL\cdot T is partitioned into ZLZL source blocks, each consisting of KLKL source symbols of size TT.

  • The remaining portion of object DD of size FZLKLTF-ZL\cdot KL\cdot T is partitioned into ZSZS source blocks, each consisting of KSKS source symbols of size TT.

VI-A Large object extension of SOPI

We extend the definitions of SOPI and stream object to be applicable to large objects, i.e., objects that are partitioned into Z>1Z>1 source blocks. The source block structure for a large object DD of size FF is determined as described in Section VI based on FF, symbol size TT and a maximum source block size WSWS.

A SOPI for the large object DD is of the form P=(A,B,C,D)P=(A,B,C,D), where A{0,1,,N1},A\in\{0,1,\ldots,N-1\}, B{1,2,,N1}B\in\{1,2,\ldots,N-1\}, C{0,1,,N1}C\in\{0,1,\ldots,N-1\} and D{1,2,,N1}D\in\{1,2,\ldots,N-1\}. AA and BB are used similarly to Subsection II to identify a symbol generated from a source block, and CC and DD are used to identify one of the ZZ source blocks. For convenience, the ranges of CC and DD are chosen to match the ranges of AA and BB, and thus large objects with up to N1N-1 source blocks can be supported. For any i{0,1,,Z(N1)}i\in\{0,1,\ldots,Z\cdot(N-1)\}, let:

  • r=i/Zr=\lfloor i/Z\rfloor

  • j=(A+rB)modNj=(A+r\cdot B)\mod N

  • j=(i+C+rD)modZj^{\prime}=(i+C+r\cdot D)\mod Z

Then the symbol at position ii of the stream object identified by SOPI PP for object DD is the symbol identified by jj from the source block identified by jj^{\prime}.

Each SOPI PP defines a stream object (P,D)(P,D), which is a permutation defined by PP of the ZNZ\cdot N possible symbols that can be generated for the ZZ source blocks of the object DD. The permutation PP has the property that each set of ZZ consecutive symbols of the stream object starting at a position that is a multiple of ZZ all have the same symbol ID but are from different source blocks, i.e., the ZZ consecutive symbols are from a cyclic shift of the ZZ source blocks.

The values of CC and DD should be chosen randomly chosen. This ensures that the source block sequence within each set of ZZ consecutive symbols of the stream object is a random cyclic shift, and that different pairs of cyclic shifts are random. This ensures that packet losses on average with respect to randomly chosen CC and DD affect each source block of a large object equally. This is true even if packet loss patterns depend on the position of the symbol within the stream object carried in a packet, as long as the packet loss does not depend on the values of CC and DD.

VI-B RaptorQ large object

For the RaptorQ code specified in [3], the maximum number of supported source symbols is 56,40356,403, and thus the maximum supported source block size WSWS should satisfy WS/T56,403\lfloor WS/T\rfloor\leq 56,403, where TT is the symbol size. With T=1400T=1400 bytes, i.e., around the typical IPv4 packet payload size, the maximum object size supported without partitioning the object into more than one source block is around 8080 Mbytes.

For RaptorQ [3], N=2311N=2^{31}-1 is a good choice, and thus up to 23112^{31}-1 source blocks can be supported with the design described in Section VI-A. With TT set to the maximum supported symbol size of 216=65,5362^{16}=65,536 bytes, the maximum size object supported is around 810188\cdot 10^{18} bytes, or around 88 Exabytes.

References

  • [1] J. Byers, M. Luby. Liquid Data Networking. 7th ACM Conference on Information-Centric Networking (ICN ’20), Virtual Event, Canada. ACM, New York, NY, USA, 7 pages. https://doi.org/10.1145/3405656.3418710.
  • [2] J. Byers, M. Luby, M. Mitzenmacher, A. Rege. A Digital Fountain Approach to Reliable Distribution of Bulk Data. Proc. of ACM SIGCOMM. Comput. Commun. Rev., vol. 23, no. 4, pp. 56-67, 1998.
  • [3] M. Luby, A. Shokrollahi, M. Watson, T. Stockhammer, L. Minder. IETF RFC6330. RaptorQ Forward Error Correction Scheme for Object Delivery. Internet Engineering Task Force, August 2011.
  • [4] A. Shokrollahi, M. Luby. Raptor Codes. Foundations and Trends in Communications and Information Theory, Now Publishers, vol. 6, no. 3-4, pp. 213-322, 2011.
  • [5] L. Minder, M. Luby, P. Aggarwal. Codornices project website describing optimized implementation of RaptorQ https://www.codornices.info, 2020.