SOPI design and analysis for LDN

I Introduction

The paper [1], inspired by fountain codes [2], [3], [4], [5], introduces Liquid Data Networking (LDN), an ICN architecture that is designed to enable the benefits of erasure-code enabled object delivery. A primary contribution of [1] is the introduction of stream object permutation identifiers (SOPIs), which provide a simple and efficient download coordination mechanism: clients can concurrently download encoded data for the same object from multiple edge nodes, caching efficiency is optimized, and seamless mobility is enabled. This paper provides an enhanced design and analysis of SOPIs, and inherits the terminology and notation of [1].

II SOPI and Stream object

SOPIs and stream objects are fundamental to the design of LDN: they enable a diversity of encoded data to be available for download within the network for each object, while at the same time ensuring that different clients request the same encoded data for an object from the same neighboring encoding node. The SOPI design allows the client decision of which encoded data to request for an object to be simple, robust, and efficient.

An object is partitioned into $K$ source symbols, where $K$ is the object size divided by the symbol size, and the symbol size is typically chosen to fit into a packet payload. Stream objects can be described in terms of an erasure code with optimal recovery properties and with the property that $N$ encoded data symbols can be generated from any object, where $N$ is a large prime number, and in particular $N\gg{K_{\mathrm{max}}}$ , where ${K_{\mathrm{max}}}$ is the maximum number of source symbols in any object.

A stream object for an object consists of all of the possible encoded data that can be generated for the object in a specified order. The essential idea is that different stream objects specify completely different orderings of the available encoded data for an object, and thus a client can simply request prefixes of different stream objects for an object to receive non-overlapping encoded data.

A stream object permutation identifier (SOPI) specifies the ordering of encoded data symbols for a stream object. A SOPI $P$ identifies a permutation $\pi(P)$ of the $N$ available symbol IDs. Thus, a stream object identifier $(D,P)$ is the combination of the identifier $D$ of the object from which it is generated and a SOPI $P$ . We consider SOPIs of the form $P=(A,B)$ , where $A\in\{0,1,2,\ldots,N-1\}$ , $B\in\{1,2,3,\ldots,N-1\}$ . Then, $P=(A,B)$ defines the permutation of symbol IDs

\pi(P)=\{A,A+B,A+2\cdot B,\ldots,A+(N-1)\cdot B\},

where each term is taken mod $N$ , i.e.,

\pi(P)[i]=A+i\cdot B\mod N

for any position $i\in\{0,\ldots,N-1\}$ .
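The mapping from position to symbol ID can be sketched in a few lines of Python. This is only an illustration: the function names and the toy modulus $N=11$ are assumptions chosen so that the full permutation can be listed, and are not parameters of the design.

```python
def sopi_symbol_id(a: int, b: int, i: int, n: int) -> int:
    """Symbol ID at position i of the permutation defined by SOPI (a, b)."""
    return (a + i * b) % n


def sopi_prefix(a: int, b: int, length: int, n: int) -> list[int]:
    """First `length` symbol IDs of the ordering defined by SOPI (a, b)."""
    return [sopi_symbol_id(a, b, i, n) for i in range(length)]


# Toy parameters (assumed for illustration): N = 11 is prime, P = (3, 4).
N_TOY = 11
full_permutation = sopi_prefix(3, 4, N_TOY, N_TOY)

# Because N is prime and B is nonzero mod N, every symbol ID appears exactly once.
assert sorted(full_permutation) == list(range(N_TOY))
```

A client that is given two SOPIs would call `sopi_prefix` once per SOPI to learn which symbol IDs each prefix request covers.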

II-A RaptorQ SOPI implementation

For the implementation of the RaptorQ code [3] described at [5], there are $2^{31}$ possible symbols of encoded data for an object. Since $2^{31}-1$ is a prime number (a Mersenne prime), $N=2^{31}-1$ can be used for the implementation [5] of the RaptorQ code.

Since $N$ is one less than a power of two, it is straightforward to compute $C=A+i\cdot B\mod N$ , e.g.,

•

Compute $D=i\cdot B$ .
•

Let $D_{0}$ be the $31$ least significant bits of $D$ .
•

Let $D_{1}$ be the next $31$ bits of $D$ .
•

Let $C=A+D_{0}+D_{1}$ .
•

If $C\geq N$ then $C=C-N$ .
•

If $C\geq N$ then $C=C-N$ (a second reduction can be needed because $A+D_{0}+D_{1}$ can be as large as $3\cdot N-1$ ).
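The steps above can be sketched as follows. The function name is hypothetical, and the comparisons are written as "greater than or equal to $N$" on the assumption that the result should lie in $\{0,\ldots,N-1\}$. The reduction works because $2^{31}\equiv 1 \pmod N$, so $D\equiv D_{0}+D_{1}$.

```python
N = (1 << 31) - 1          # Mersenne prime 2^31 - 1
MASK31 = (1 << 31) - 1     # selects 31 bits


def sopi_symbol_id_mersenne(a: int, b: int, i: int) -> int:
    """Compute (a + i*b) mod N using shifts and adds instead of division.

    Assumes 0 <= a < N, 1 <= b < N and 0 <= i < N, so that i*b < 2^62.
    """
    d = i * b
    d0 = d & MASK31            # 31 least significant bits of d
    d1 = (d >> 31) & MASK31    # next 31 bits of d
    c = a + d0 + d1            # at most 3*N - 1
    if c >= N:
        c -= N
    if c >= N:
        c -= N
    return c
```

In a language with fixed-width integers, `d` fits in an unsigned 64-bit word under the stated assumptions.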

III Random SOPI sets

We analyze the properties of collections of SOPIs, where each SOPI in the collection is randomly and independently chosen.

Proposition III.1

For any pair of positions

i_{0}\in\{0,1,\ldots,N-1\},i_{1}\in\{0,1,\ldots,N-1\}-\{i_{0}\},

and for any pair of symbol IDs

j_{0}\in\{0,1,\ldots,N-1\},j_{1}\in\{0,1,\ldots,N-1\}-\{j_{0}\},

there is a unique $P=(A,B)$ such that

\pi(P)[i_{0}]=j_{0}\mbox{ and }\pi(P)[i_{1}]=j_{1}.

∎
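The unique SOPI of Proposition III.1 can be computed explicitly: subtracting the two constraints gives $B=(j_{1}-j_{0})\cdot(i_{1}-i_{0})^{-1} \bmod N$, and then $A=j_{0}-i_{0}\cdot B \bmod N$. The following is a minimal sketch with a hypothetical function name; it relies on Python's built-in modular inverse `pow(x, -1, n)`, available from Python 3.8.

```python
def sopi_from_two_points(i0: int, j0: int, i1: int, j1: int, n: int) -> tuple[int, int]:
    """Return the SOPI (A, B) with pi(P)[i0] = j0 and pi(P)[i1] = j1.

    Requires i0 != i1 and j0 != j1 (mod n), with n prime.
    """
    if (i0 - i1) % n == 0 or (j0 - j1) % n == 0:
        raise ValueError("positions and symbol IDs must be distinct")
    inverse = pow((i1 - i0) % n, -1, n)   # exists because n is prime
    b = ((j1 - j0) * inverse) % n         # nonzero because j0 != j1
    a = (j0 - i0 * b) % n
    return a, b
```

For the toy modulus $N=11$, the constraints "position 2 maps to symbol ID 0" and "position 5 maps to symbol ID 1" yield the SOPI $(3,4)$.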

Proposition III.1 implies that the pair of symbol IDs in any pair of positions within the permutation are random with respect to a random SOPI $P=(A,B)$ .

Suppose an object $D$ is composed of a single source block with $K$ source symbols. A client will download encoded data from prefixes of multiple stream objects of an object $D$ in order to recover the object. We would like to show that a client doesn’t receive many duplicate symbols when downloading from multiple stream objects with randomly chosen SOPIs. Suppose the client downloads from some number $s$ of different stream objects, with corresponding randomly chosen SOPIs $P_{0},P_{1},\ldots,P_{s-1}$ . Suppose the client receives $M_{0}$ symbols from the stream object $(P_{0},D)$ , $M_{1}$ symbols from the stream object $(P_{1},D)$ , etc. where

\sum_{i=0}^{s-1}M_{i}=M.

The expected number of distinct symbol IDs among the $M$ received symbols is at least

M-\frac{(M-1)^{2}}{2\cdot N},

and thus if $K^{2}\leq 2\cdot N$ then

M=K\cdot\left(1+\frac{K}{2\cdot N}\right)\leq K+1

symbols are enough on average to receive at least $K$ distinct symbol IDs. Theorem III.2 below provides bounds on the probability that $M$ received symbols have at least $K$ distinct symbol IDs.

Theorem III.2

Consider an object $D$ composed of a single source block with $K$ source symbols. Suppose at least

M=\frac{K}{1-\delta}

symbols have been received in total from $s\geq 1$ stream objects

(P_{0},D),(P_{1},D),\ldots,(P_{s-1},D),

where $\delta>0$ and $M^{2}\leq 2\cdot N$ . Then,

\Pr[D\mbox{ not recoverable from the $M$ symbols}]\leq\frac{1}{\delta^{2}\cdot N}

(1)

with respect to the random variables

P_{0}=(A_{0},B_{0}),P_{1}=(A_{1},B_{1}),\ldots,P_{s-1}=(A_{s-1},B_{s-1}).

Proof:

Let $Y$ be the random variable that is the number of unique symbol IDs among $M$ received symbols for an object $D$ with respect to the random variables

P_{0}=(A_{0},B_{0}),P_{1}=(A_{1},B_{1}),\ldots,P_{s-1}=(A_{s-1},B_{s-1}).

The left-hand side of Inequality (1) is equivalent to

\Pr[Y<K]=\Pr[Y<(1-\delta)\cdot M]

(2)

and thus we need to prove that

\Pr[Y<(1-\delta)\cdot M]\leq\frac{1}{\delta^{2}\cdot N}.

(3)

Each of the $M$ received symbols has a symbol position within the stream object from which it is received, and a symbol ID determined by the stream object SOPI and the symbol position within the stream object. We associate a unique index within $\{0,1,\ldots,M-1\}$ with each of the $M$ (stream object, symbol position) pairs of received symbols. For $i\in\{0,1,\ldots,M-1\},j\in\{0,1,\ldots,M-1\}-\{i\}$ , we let $X_{i,j}=1$ if the symbol position within the stream object indexed by $i$ is mapped to the same symbol ID as the symbol position within the stream object indexed by $j$ , and $X_{i,j}=0$ if the symbol position within the stream object indexed by $i$ is mapped to a different symbol ID than the symbol position within the stream object indexed by $j$ . Thus, $X_{i,j}=1$ if and only if the symbol IDs mapped to by $i$ and $j$ are duplicates. Then,

\begin{align}
\Pr[Y<(1-\delta)\cdot M] &=\Pr[M-Y>\delta\cdot M] \notag\\
&\leq\Pr\left[\sum_{i,j>i}X_{i,j}>\delta\cdot M\right] \tag{4}\\
&=\Pr\left[\left(\sum_{i,j>i}X_{i,j}\right)^{2}>\delta^{2}\cdot M^{2}\right] \notag\\
&\leq\frac{\operatorname*{\mathbb{E}}\left[\left(\sum_{i,j>i}X_{i,j}\right)^{2}\right]}{\delta^{2}\cdot M^{2}}, \tag{5}
\end{align}

where Inequality (4) follows since $M-Y$ is the number of duplicate symbol IDs and if the symbol IDs mapped to by $i$ and $j$ are the same then $X_{i,j}=1$ , and Inequality (5) follows from Markov’s inequality.

Expanding out terms,

\begin{align}
\operatorname*{\mathbb{E}}\left[\left(\sum_{i,j>i}X_{i,j}\right)^{2}\right] &=\operatorname*{\mathbb{E}}\left[\sum_{i,j>i}X_{i,j}^{2}\right] \tag{6}\\
&\quad+\operatorname*{\mathbb{E}}\left[\sum_{i,j>i}\mathop{\sum_{i^{\prime},j^{\prime}>i^{\prime}}}_{(i^{\prime},j^{\prime})\not=(i,j)}X_{i,j}\cdot X_{i^{\prime},j^{\prime}}\right] \tag{7}
\end{align}

If $i$ and $j$ index different symbol positions within the same stream object then $\operatorname*{\mathbb{E}}[X^{2}_{i,j}]=0$ since the SOPI for the stream object defines a permutation of the symbol IDs and thus cannot map $i$ and $j$ to the same symbol ID. If $i$ and $j$ index symbol positions within two different stream objects then $\operatorname*{\mathbb{E}}[X^{2}_{i,j}]=\frac{1}{N}$ since the SOPIs for the two different stream objects are chosen independently. Thus,

\operatorname*{\mathbb{E}}\left[\sum_{i,j>i}X_{i,j}^{2}\right]\leq\frac{\binom{M}{2}}{N}\leq\frac{M^{2}}{2\cdot N}.

(8)

Similarly, if $i$ and $j$ index different symbol positions within the same stream object or $i^{\prime}$ and $j^{\prime}$ index different symbol positions within the same stream object then $\operatorname*{\mathbb{E}}\left[X_{i,j}\cdot X_{i^{\prime},j^{\prime}}\right]=0$ . If $i$ and $i^{\prime}$ index different symbol positions within one stream object and $j$ and $j^{\prime}$ index different symbol positions within a second stream object then $\operatorname*{\mathbb{E}}\left[X_{i,j}\cdot X_{i^{\prime},j^{\prime}}\right]=\frac{1}{(N-1)\cdot N}$ by Proposition III.1. If $i$ and $i^{\prime}$ index different symbol positions within one stream object and $j$ and $j^{\prime}$ index the same symbol position within a second stream object then $\operatorname*{\mathbb{E}}\left[X_{i,j}\cdot X_{i^{\prime},j^{\prime}}\right]=0$ since $i$ and $i^{\prime}$ cannot map to the same symbol ID. If $i$ and $i^{\prime}$ index different symbol positions within one stream object and $j$ indexes a symbol position in a second stream object and $j^{\prime}$ indexes a position within a third stream object then $\operatorname*{\mathbb{E}}\left[X_{i,j}\cdot X_{i^{\prime},j^{\prime}}\right]=\frac{1}{N^{2}}$ , since if $j$ and $j^{\prime}$ map to the same symbol ID then $i$ and $i^{\prime}$ cannot both map to that symbol ID, and if $j$ and $j^{\prime}$ map to different symbol IDs (with probability $\frac{N-1}{N}$ ) then $i$ and $j$ map to the same symbol ID and $i^{\prime}$ and $j^{\prime}$ map to the same symbol ID with probability $\frac{1}{(N-1)\cdot N}$ from Proposition III.1. If $i$ , $j$ , $i^{\prime}$ and $j^{\prime}$ index symbol positions in four different stream objects then $\operatorname*{\mathbb{E}}\left[X_{i,j}\cdot X_{i^{\prime},j^{\prime}}\right]=\frac{1}{N^{2}}$ , since each of the four SOPIs is chosen randomly and independently of the others. All other cases are variants of these cases. Thus,

\operatorname*{\mathbb{E}}\left[\sum_{i,j>i}\mathop{\sum_{i^{\prime},j^{\prime}>i^{\prime}}}_{(i^{\prime},j^{\prime})\not=(i,j)}X_{i,j}\cdot X_{i^{\prime},j^{\prime}}\right]\leq\frac{\binom{M}{2}^{2}}{(N-1)\cdot N}\leq\frac{M^{4}}{4\cdot N^{2}}.

(9)

From Inequalities (8) and (9) it follows that

\begin{align}
\operatorname*{\mathbb{E}}\left[\left(\sum_{i,j>i}X_{i,j}\right)^{2}\right] &\leq\frac{M^{2}}{2\cdot N}\cdot\left(1+\frac{M^{2}}{2\cdot N}\right) \tag{10}\\
&\leq\frac{M^{2}}{N}, \tag{11}
\end{align}

where Inequality (11) follows since $M^{2}\leq 2\cdot N$ . Combining Inequality (11) with Inequality (5) proves Inequality (3). ∎

III-A Random SOPI set examples

For the RaptorQ code specified in [3], the maximum supported number of source symbols per source block is ${K_{\mathrm{max}}}=56,403$ , and $N=2^{31}-1$ is a good choice as described in Section II-A. With $M=65,535$ , the condition $M^{2}\leq 2\cdot N$ is satisfied, and ${K_{\mathrm{max}}}\leq 0.861\cdot M=(1-\delta)\cdot M$ for $\delta=0.139$ . Thus, Theorem III.2 holds for RaptorQ for all supported source block sizes and for any $\delta\leq 0.139$ .

On average at most a $0.00002$ fraction of the symbol IDs of received encoded data symbols from prefixes of stream objects will be duplicates when the total number of received symbols is up to $M$ . However, stronger bounds on the probability that the number of duplicates exceeds a given threshold are also important.

For example, setting $\delta=0.01$ , Theorem III.2 shows that an object $D$ composed of a single source block of $K$ source symbols can be recovered with probability at least

1-\frac{1}{\delta^{2}\cdot N}=0.999995

from $M=K/0.99\approx 1.01\cdot K$ received symbols in total from different stream objects.

As another example, setting $\delta=0.1$ , Theorem III.2 shows that an object $D$ composed of a single source block of $K$ source symbols can be recovered with probability at least

1-\frac{1}{\delta^{2}\cdot N}=0.99999995

from $M=K/0.9\approx 1.11\cdot K$ received symbols in total from different stream objects.

The analysis of Theorem III.2 is not tight, and thus these bounds are conservative.
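The figures in these examples can be reproduced with a short calculation. The sketch below evaluates the bound of Theorem III.2 for $N=2^{31}-1$; the function names are hypothetical and introduced only for illustration.

```python
import math

N = 2**31 - 1


def recovery_probability_lower_bound(delta: float, n: int = N) -> float:
    """Lower bound 1 - 1/(delta^2 * N) on the recovery probability."""
    return 1.0 - 1.0 / (delta**2 * n)


def symbols_needed(k: int, delta: float) -> int:
    """Smallest integer M with M >= K / (1 - delta)."""
    return math.ceil(k / (1.0 - delta))


def theorem_condition_holds(m: int, n: int = N) -> bool:
    """Check the hypothesis M^2 <= 2*N of Theorem III.2."""
    return m * m <= 2 * n


for delta in (0.01, 0.1):
    print(delta, recovery_probability_lower_bound(delta))
```

For a given $K$ and $\delta$, `theorem_condition_holds(symbols_needed(K, delta))` indicates whether the theorem applies.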

IV Designed SOPI sets

Although the analysis in Section III shows there is little chance of there being a significant fraction of duplicates among received symbols from multiple stream objects, there is still a chance that there is a significant fraction of duplicates. As an extreme example, if $P_{0}$ and $P_{1}$ are identical then the overlap between prefixes of length $K/2$ is $K/2$ , i.e., all $K/2$ symbols are duplicates. As a somewhat less extreme example, if $P_{0}=(0,1)$ and $P_{1}=(A,1)$ where $A$ is randomly chosen, then for prefixes of length $K/2$ there is a $1-K/N$ chance there is no overlap between the prefixes, but with the remaining probability $K/N$ there are on average $K/4$ duplicates.

In this section, we provide a deterministic design of a large set $\mathcal{P}$ of SOPIs for which we can provide guaranteed upper bounds on the number of symbol ID duplicates there are in prefixes of permutations defined by SOPIs from $\mathcal{P}$ .

The design has two parts:

•

Design a set $\mathcal{B}\subset\{1,\ldots,N-1\}$ .
•

For each $B\in\mathcal{B}$ , design a set $\mathcal{A}_{B}\subset\{0,\ldots,N-1\}$ .

Then, the designed set of SOPIs is the set

\mathcal{P}=\{P=(A,B):B\in\mathcal{B},A\in\mathcal{A}_{B}\}.

IV-A Interactions between B values

The design of set $\mathcal{B}$ is based on interactions between different $B$ values. The following is at the heart of this analysis.

Definition IV.1

Let $B_{0},B_{1}\in\{1,\ldots,N\}$ such that $B_{0}\not=B_{1}$ . For any pair of integers $(d_{0},d_{1})$ , we say $(d_{0},d_{1})$ matches with respect to $(B_{0},B_{1})$ if

d_{0}\cdot B_{0}\mod N=d_{1}\cdot B_{1}\mod N.

∎

The importance of Definition IV.1 is that, for any $A_{0},A_{1}\in\{0,\ldots,N\}$ , if the symbol ID at position $p_{0}$ in the permutation defined by $P_{0}=(A_{0},B_{0})$ is equal to the symbol ID at position $p_{1}$ in the permutation defined by $P_{1}=(A_{1},B_{1})$ and $(d_{0},d_{1})$ matches with respect to $(B_{0},B_{1})$ , then the symbol ID at position $p_{0}+d_{0}\mod N$ with respect to $P_{0}$ is equal to the symbol ID at position $p_{1}+d_{1}\mod N$ with respect to $P_{1}$ .

It is easy to verify that if $(d_{0},d_{1})$ matches with respect to $(B_{0},B_{1})$ and $(d^{\prime}_{0},d^{\prime}_{1})$ matches with respect to $(B_{0},B_{1})$ then

(d_{0},d_{1})+(d^{\prime}_{0},d^{\prime}_{1})=(d_{0}+d^{\prime}_{0},d_{1}+d^{\prime}_{1})

matches with respect to $(B_{0},B_{1})$ . For example, if $(d_{0},d_{1})$ matches with respect to $(B_{0},B_{1})$ then, for any integer $i$ ,

i\cdot(d_{0},d_{1})=(i\cdot d_{0},i\cdot d_{1})

matches with respect to $(B_{0},B_{1})$ .
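The matching relation and its consequences can be checked on a toy example. In the sketch below, the modulus $N=11$ and the values of $A_{0},A_{1},B_{0},B_{1}$ are assumptions chosen for illustration.

```python
def matches(d0: int, d1: int, b0: int, b1: int, n: int) -> bool:
    """True if (d0, d1) matches with respect to (b0, b1), per Definition IV.1."""
    return (d0 * b0) % n == (d1 * b1) % n


def symbol_id(a: int, b: int, position: int, n: int) -> int:
    """Symbol ID at a position of the permutation defined by SOPI (a, b)."""
    return (a + position * b) % n


N_TOY = 11
A0, B0 = 1, 2
A1, B1 = 4, 3

# (3, 2) matches because 3*2 = 2*3; sums and integer multiples also match.
assert matches(3, 2, B0, B1, N_TOY)
assert matches(6, 4, B0, B1, N_TOY)
assert matches(-3, -2, B0, B1, N_TOY)

# Position 3 of P0 and position 1 of P1 carry the same symbol ID, so shifting
# both positions by a matching pair yields another common symbol ID.
assert symbol_id(A0, B0, 3, N_TOY) == symbol_id(A1, B1, 1, N_TOY)
assert symbol_id(A0, B0, 3 + 3, N_TOY) == symbol_id(A1, B1, 1 + 2, N_TOY)
```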

Let $M$ be an upper bound on the aggregate number of symbols downloaded by a client from prefixes of stream objects for an object. We impose the condition that

M^{2}<N/2.

(12)

Definition IV.2

The set $D$ is defined as

D=\{-M+1,\ldots,-1\}\cup\{1,\ldots,M-1\}.

∎

Lemma IV.3

For any $B_{0}\in\{1,\ldots,N\}$ and $B_{1}\in\{1,\ldots,N\}$ , one of the following two possibilities holds:

1.

For all $d_{0}\in D$ , $d_{1}\in D$ , $(d_{0},d_{1})$ does not match with respect to $(B_{0},B_{1})$ . The distance between $B_{0}$ and $B_{1}$ is defined to be $2\cdot M$ .
2.

There is a $d_{0}\in D$ , $d_{1}\in D$ such that $(d_{0},d_{1})$ matches with respect to $(B_{0},B_{1})$ and, for any $d^{\prime}_{0}\in D$ and $d^{\prime}_{1}\in D$ such that $(d^{\prime}_{0},d^{\prime}_{1})$ matches with respect to $(B_{0},B_{1})$ , $(d^{\prime}_{0},d^{\prime}_{1})$ is an integer multiple of $(d_{0},d_{1})$ . The distance between $B_{0}$ and $B_{1}$ is defined to be $|d_{0}|+|d_{1}|$ .

Proof:

If (1) holds the proof is complete. We need to show that if (1) doesn’t hold then (2) holds. If (1) doesn’t hold there is $d_{0}\in D$ and $d_{1}\in D$ such that $(d_{0},d_{1})$ matches with respect to $(B_{0},B_{1})$ and $(d_{0},d_{1})$ is a minimal pair, i.e., there is no $d^{\prime}_{0}\in D$ and $d^{\prime}_{1}\in D$ such that $(d^{\prime}_{0},d^{\prime}_{1})$ matches with respect to $(B_{0},B_{1})$ and $(d^{\prime}_{0},d^{\prime}_{1})=c\cdot(d_{0},d_{1})$ , where $0<|c|<1$ .

Consider any $(d^{\prime}_{0},d^{\prime}_{1})$ with $d^{\prime}_{0}\in D$ , $d^{\prime}_{1}\in D$ such that $(d^{\prime}_{0},d^{\prime}_{1})$ matches with respect to $(B_{0},B_{1})$ . We want to show that $(d^{\prime}_{0},d^{\prime}_{1})$ is an integer multiple of $(d_{0},d_{1})$ . Note that $(d^{\prime}_{0}\cdot d_{0},d^{\prime}_{0}\cdot d_{1})$ matches with respect to $(B_{0},B_{1})$ and also $(d^{\prime}_{0}\cdot d_{0},d^{\prime}_{1}\cdot d_{0})$ matches with respect to $(B_{0},B_{1})$ .

For any $A_{0},A_{1}\in\{0,\ldots,N\}$ , consider a pair of positions $(p_{0},p_{1})$ where the symbol ID at position $p_{0}$ of the permutation defined by $P_{0}=(A_{0},B_{0})$ is the same as the symbol ID at position $p_{1}$ of the permutation defined by $P_{1}=(A_{1},B_{1})$ . Then, the symbol ID at position $p_{0}+d^{\prime}_{0}\cdot d_{0}\mod N$ of $P_{0}$ is the same as the symbol ID at position $p_{1}+d^{\prime}_{0}\cdot d_{1}\mod N$ of $P_{1}$ and also the same as the symbol ID at position $p_{1}+d^{\prime}_{1}\cdot d_{0}\mod N$ of $P_{1}$ , which implies that

p_{1}+d^{\prime}_{0}\cdot d_{1}=p_{1}+d^{\prime}_{1}\cdot d_{0}\mod N.

(13)

Since

-N/2<-M^{2}\leq d^{\prime}_{0}\cdot d_{1}\leq M^{2}<N/2

and

-N/2<-M^{2}\leq d^{\prime}_{1}\cdot d_{0}\leq M^{2}<N/2,

it follows that Equation (13) holds if and only if

d^{\prime}_{0}\cdot d_{1}=d^{\prime}_{1}\cdot d_{0}.

Thus, for some $c\not=0$ ,

(d^{\prime}_{0},d^{\prime}_{1})=c\cdot(d_{0},d_{1}).

Since $(d_{0},d_{1})$ is a minimal pair, $|c|\geq 1$ . Then

\begin{align*}
(d^{\prime\prime}_{0},d^{\prime\prime}_{1}) &=(d^{\prime}_{0},d^{\prime}_{1})-\lfloor c\rfloor\cdot(d_{0},d_{1})\\
&=(c-\lfloor c\rfloor)\cdot(d_{0},d_{1})
\end{align*}

matches with respect to $(B_{0},B_{1})$ , and $d^{\prime\prime}_{0}\in D$ and $d^{\prime\prime}_{1}\in D$ . Since $0\leq c-\lfloor c\rfloor<1$ and $(d_{0},d_{1})$ is a minimal pair, it follows that $c-\lfloor c\rfloor=0$ and thus (2) holds because $c$ is an integer. ∎
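For small parameters, the distance of Lemma IV.3 can be computed by exhaustive search over $D\times D$. The sketch below is a brute-force illustration with a hypothetical function name and is not intended for $N=2^{31}-1$. It returns the smallest $|d_{0}|+|d_{1}|$ over all matching pairs, which is attained by the minimal pair because every other matching pair is an integer multiple of it, and returns $2\cdot M$ when no pair matches.

```python
def distance(b0: int, b1: int, m: int, n: int) -> int:
    """Distance between b0 and b1 per Lemma IV.3, by exhaustive search.

    Assumes n is prime and m*m < n/2 (Condition (12)).
    """
    offsets = [x for x in range(-m + 1, m) if x != 0]   # the set D
    best = 2 * m                                        # case (1): no match
    for d0 in offsets:
        for d1 in offsets:
            if (d0 * b0) % n == (d1 * b1) % n:
                best = min(best, abs(d0) + abs(d1))     # case (2)
    return best


# Toy parameters (assumed): N = 101, M = 5, so M^2 = 25 < N/2.
print(distance(1, 2, 5, 101))    # (2, 1) matches, since 2*1 = 1*2
print(distance(1, 10, 5, 101))   # no pair in D x D matches
```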

Lemma IV.4

Suppose for $B_{0},B_{1}\in\{1,\ldots,N\}$ the distance between $B_{0}$ and $B_{1}$ is $d$ . Then, for any $A_{0},A_{1}\in\{0,\ldots,N\}$ the number of distinct symbol IDs in the prefixes of the permutations defined by SOPIs

P_{0}=(A_{0},B_{0}),P_{1}=(A_{1},B_{1}),

is at least

m-\left\lfloor\frac{m-2}{d}\right\rfloor-1,

where $m\leq M$ is the total length of the pair of prefixes.

Proof:

If $m=1$ , or more generally one of the prefixes is of length zero, then the number of distinct symbol IDs is $m$ . Otherwise, neither prefix is of length zero. If there are no symbol IDs in common between the two prefixes then the number of distinct symbol IDs is $m$ .

Suppose there are symbol IDs in common between the two prefixes. Then there is a symbol ID at some position $p_{0}$ with respect to $P_{0}$ that is the same as a symbol ID at some position $p_{1}$ with respect to $P_{1}$ .

If (1) of Lemma IV.3 holds, i.e., there is no $d_{0}\in D$ , $d_{1}\in D$ such that $(d_{0},d_{1})$ matches with respect to $(B_{0},B_{1})$ , then the distance between $B_{0}$ and $B_{1}$ is $d=2\cdot M$ and $(p_{0},p_{1})$ is the only pair of positions where the symbol IDs are the same among the two prefixes, and thus there are

m-1=m-\left\lfloor\frac{m-2}{2\cdot M}\right\rfloor-1

distinct symbol IDs.

If (2) of Lemma IV.3 holds, then there is a $d_{0}\in D$ , $d_{1}\in D$ such that $(d_{0},d_{1})$ matches with respect to $(B_{0},B_{1})$ , and all other pairs $(d^{\prime}_{0},d^{\prime}_{1})$ that match with respect to $(B_{0},B_{1})$ , where $d^{\prime}_{0}\in D$ , $d^{\prime}_{1}\in D$ , are integer multiples of $(d_{0},d_{1})$ . Without loss of generality, $d_{0}>0$ and $d_{1}>0$ (the case where $d_{0}>0$ and $d_{1}<0$ is similar). Then the distance between $B_{0}$ and $B_{1}$ is $d=d_{0}+d_{1}$ . The following maximizes the number of symbol IDs that are duplicates between $P_{0}$ and $P_{1}$ :

•

The symbol ID in position $0$ of $P_{0}$ is the same as the symbol ID in position $0$ of $P_{1}$ .
•

Let $i=\lfloor\frac{m-2}{d}\rfloor$ . Then set $m_{0}$ and $m_{1}$ such that $m_{0}\geq i\cdot d_{0}+1$ and $m_{1}\geq i\cdot d_{1}+1$ . (The remaining $r=m-i\cdot d-2$ length can be assigned to $m_{0}$ and $m_{1}$ arbitrarily to satisfy $m_{0}+m_{1}=m$ .)

It can be verified that the number of pairs of symbol IDs that are the same between the two prefixes is maximized, and that this number of pairs is

i+1=\left\lfloor\frac{m-2}{d}\right\rfloor+1.

∎
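The bound of Lemma IV.4 can be checked numerically on toy parameters. The sketch below assumes $N=101$, $M=5$, $B_{0}=1$ and $B_{1}=2$, for which the distance is $d=3$ because $(2,1)$ matches and no pair with a smaller sum does. The function names are hypothetical.

```python
def distinct_lower_bound(m: int, d: int) -> int:
    """Lemma IV.4 lower bound on distinct symbol IDs for total prefix length m."""
    return m - (m - 2) // d - 1


def count_distinct(a0, b0, m0, a1, b1, m1, n) -> int:
    """Number of distinct symbol IDs in prefixes of lengths m0 and m1."""
    ids0 = {(a0 + i * b0) % n for i in range(m0)}
    ids1 = {(a1 + i * b1) % n for i in range(m1)}
    return len(ids0 | ids1)


def min_distinct_over_offsets(b0, b1, m0, m1, n) -> int:
    """Fewest distinct IDs over all A1, with A0 fixed at 0.

    Fixing A0 loses no generality: only A1 - A0 affects the overlap.
    """
    return min(count_distinct(0, b0, m0, a1, b1, m1, n) for a1 in range(n))


N_TOY, M_TOY, D_TOY = 101, 5, 3
worst = min_distinct_over_offsets(1, 2, 3, 2, N_TOY)
print(worst, distinct_lower_bound(M_TOY, D_TOY))
```

With prefix lengths $3$ and $2$, the worst case over all $A_{1}$ meets the bound exactly, so the bound is tight for this toy instance.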

Corollary IV.5

Suppose for $B_{0},B_{1},\ldots,B_{s-1}\in\{1,\ldots,N\}$ the distance between $B_{i}$ and $B_{j}$ is at least $d$ for all $i,j\in\{0,\ldots,s-1\}$ where $i\not=j$ . Then, for any $A_{0},A_{1},\ldots,A_{s-1}\in\{0,\ldots,N\}$ the number of distinct symbol IDs in the prefixes of the permutations defined by SOPIs

P_{0}=(A_{0},B_{0}),P_{1}=(A_{1},B_{1}),\ldots,P_{s-1}=(A_{s-1},B_{s-1}),

is at least

m-(s-1)\cdot\left(\frac{m-s}{d}+\frac{s}{2}\right),

where $m\leq M$ is the total length of the $s$ prefixes.

Proof:

Let $m_{0},m_{1},\ldots,m_{s-1}$ be the respective lengths of the $s$ prefixes, where $m=\sum_{i=0}^{s-1}m_{i}$ . From Lemma IV.4, for each $i\in\{0,\ldots,s-1\}$ , $j\in\{0,\ldots,i-1\}$ , the number of symbol IDs that are the same between the prefixes of the permutations defined by $P_{i}$ and $P_{j}$ is at most

\frac{(m_{i}-1)+(m_{j}-1)}{d}+1.

Thus, the number of symbol IDs that are the same between all pairs of the prefixes is at most

\begin{align}
&\left(\sum_{i=0}^{s-1}\frac{(s-1)\cdot(m_{i}-1)}{d}\right)+\frac{(s-1)\cdot s}{2} \tag{14}\\
&\leq(s-1)\cdot\left(\frac{m-s}{d}+\frac{s}{2}\right). \tag{15}
\end{align}

∎

Lemma IV.4 and Corollary IV.5 provide worst-case lower bounds on the total number of distinct symbol IDs in prefixes. In practice, the actual number of distinct symbol IDs is much closer to the total size $m$ of the prefixes than to the worst-case lower bounds.

IV-B Constructing a SOPI set

The following algorithm constructs a set $\mathcal{B}$ such that for $B_{0}\in\mathcal{B}$ , $B_{1}\in\mathcal{B}-\{B_{0}\}$ , the distance between $B_{0}$ and $B_{1}$ is at least $d$ :

• Initialize $\mathcal{B}=\emptyset$ , $\mathcal{B}^{\prime}=\{1,\ldots,N-1\}$ .
• Repeat until $\mathcal{B}^{\prime}=\emptyset$ :
  – Choose any $B\in\mathcal{B}^{\prime}$ .
  – Delete $B$ from $\mathcal{B}^{\prime}$ and add $B$ to $\mathcal{B}$ .
  – For all $i\in D$ , $j\in D$ such that $|i|+|j|<d$ :
    * Delete $B^{\prime}=i\cdot B\cdot j^{-1}\mod N$ from $\mathcal{B}^{\prime}$ .

Each time an element is added to $\mathcal{B}$ , after accounting for some symmetries such as

i\cdot B\cdot j^{-1}\mod N=-i\cdot B\cdot(-j)^{-1}\mod N,

the number of elements deleted from $\mathcal{B}^{\prime}$ for each element added to $\mathcal{B}$ is at most

(d-2)\cdot(d-3)+1\leq d^{2}.

Thus,

|\mathcal{B}|\geq\frac{N-1}{d^{2}}

when the algorithm completes.
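The construction of $\mathcal{B}$ can be sketched directly for toy parameters. The version below is an illustration only: it holds all of $\{1,\ldots,N-1\}$ in memory, which is not practical for $N=2^{31}-1$, and it resolves "choose any $B$" by taking the smallest remaining value, which is an assumption made for determinism. The function name is hypothetical.

```python
def construct_b_set(n: int, m: int, d: int) -> list[int]:
    """Greedily build a set of B values with pairwise distance at least d.

    n is the prime modulus, m the bound M on downloaded symbols (so that
    D = {-m+1, ..., -1} U {1, ..., m-1}), and d the required distance.
    """
    remaining = set(range(1, n))
    offsets = [x for x in range(-m + 1, m) if x != 0]   # the set D
    chosen = []
    while remaining:
        b = min(remaining)            # "choose any B"; smallest for determinism
        remaining.discard(b)
        chosen.append(b)
        for i in offsets:
            for j in offsets:
                if abs(i) + abs(j) < d:
                    # B' = i * B * j^{-1} mod n is within distance < d of B.
                    b_prime = (i * b * pow(j % n, -1, n)) % n
                    remaining.discard(b_prime)
    return chosen


# Toy parameters (assumed): N = 101, M = 5, d = 4.
print(construct_b_set(101, 5, 4))
```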

The following algorithm constructs a set $\mathcal{A}_{B}$ for $B\in\mathcal{B}$ such that for $A_{0}\in\mathcal{A}_{B}$ , $A_{1}\in\mathcal{A}_{B}-\{A_{0}\}$ , the symbol IDs of prefixes of length up to $M$ of the permutations defined by SOPIs $P_{0}=(A_{0},B)$ and $P_{1}=(A_{1},B)$ are all distinct:

• Initialize $\mathcal{A}_{B}=\emptyset$ .
• Choose $A$ randomly and add $A$ to $\mathcal{A}_{B}$ .
• For $i=1,\ldots,\lfloor N/M\rfloor-1$ do:
  – $A=A+M\cdot B\mod N$ .
  – Add $A$ to $\mathcal{A}_{B}$ .

Thus,

|\mathcal{A}_{B}|\geq N/M-1

when the algorithm completes.
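The construction of $\mathcal{A}_{B}$ amounts to cutting the single permutation defined by $B$ into consecutive, non-overlapping windows of length $M$. A minimal sketch follows; the function name is hypothetical, and the starting value is passed in as a parameter (an assumption made so the example is reproducible) rather than drawn at random inside the function.

```python
def construct_a_set(b: int, n: int, m: int, a_start: int) -> list[int]:
    """Build A_B: floor(n/m) start values spaced m*b apart (mod n)."""
    a = a_start % n
    result = [a]
    for _ in range(1, n // m):
        a = (a + m * b) % n
        result.append(a)
    return result


# Toy parameters (assumed): N = 101, M = 5, B = 7, initial A = 3.
a_set = construct_a_set(7, 101, 5, 3)

# Prefixes of length M starting at different A values never share a symbol ID.
all_ids = [(a + i * 7) % 101 for a in a_set for i in range(5)]
assert len(set(all_ids)) == len(all_ids)
```

In use, `a_start` would typically come from a random number generator, as the algorithm specifies.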

Overall, the set of SOPIs

\mathcal{P}=\{P=(A,B):B\in\mathcal{B},A\in\mathcal{A}_{B}\}

is constructed. A SOPI can be assigned to an encoding node by choosing any $P=(A,B)\in\mathcal{P}$ that hasn’t been assigned so far. Overall,

|\mathcal{P}|\geq\frac{N^{2}}{d^{2}\cdot M}

SOPIs can be assigned.

IV-C Designed SOPI set examples

As an example, a distance of at least $101$ between pairs of $B$ values might be sufficient. Lemma IV.4 guarantees that there are at least $K$ distinct symbol IDs among the prefixes of any pair of SOPIs with $M=1.01\cdot K+1$ symbol IDs in total. The value of $M=30,000$ might be sufficient for practical use cases, where $M^{2}\leq N/2$ when $N=2^{31}-1$ . In this case, the number of possible SOPIs available is over $15$ billion.

As another example, a distance of at least $1,000$ between pairs of $B$ values might be desirable. Corollary IV.5 guarantees that downloading from prefixes of up to $10$ stream objects with different SOPIs, where the total number of downloaded symbols is at least $10,000$ and at most $30,000$ , will have less than $1.5\%$ duplication in symbol IDs. In this case, the number of possible SOPIs available is over $150$ million.
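The numbers in these two examples follow from the bounds of Section IV by direct arithmetic. The sketch below reproduces them; the function names are hypothetical, and $K=10,000$ in the first check is an assumed value used only to make the Lemma IV.4 guarantee concrete.

```python
N = 2**31 - 1


def sopi_count_estimate(d: int, m: int, n: int = N) -> int:
    """N^2 / (d^2 * M), the size estimate for the designed SOPI set."""
    return (n * n) // (d * d * m)


def pair_distinct_lower_bound(m: int, d: int) -> int:
    """Lemma IV.4 lower bound on distinct symbol IDs for two prefixes."""
    return m - (m - 2) // d - 1


def max_duplicates(m: int, s: int, d: int) -> float:
    """Corollary IV.5 upper bound on duplicates among s prefixes."""
    return (s - 1) * ((m - s) / d + s / 2)


# First example: d = 101, M = 30,000.
k = 10_000
m = k + k // 100 + 1                        # 1.01 * K + 1
print(pair_distinct_lower_bound(m, 101))    # at least K
print(sopi_count_estimate(101, 30_000))     # over 15 billion

# Second example: d = 1,000, s = 10 stream objects.
for total in (10_000, 30_000):
    print(total, max_duplicates(total, 10, 1_000) / total)   # below 0.015
print(sopi_count_estimate(1_000, 30_000))   # over 150 million
```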

V SOPI distribution design

It is not necessary that each encoding node is assigned a unique SOPI. To enjoy the SOPI design properties, it is sufficient that a client avoids downloading encoded data for an object from two encoding nodes with the same assigned SOPI, which the client can easily avoid no matter how the SOPIs are assigned to encoding nodes. For example, if a client is supplied with the same SOPI for two edge encoding nodes reachable over different interfaces, the client can simply send requests for encoded data to only one of the two interfaces, and thus avoid receiving duplicate encoded data. Of course, the benefits of the SOPI design are reduced in this case, since the client cannot effectively download encoded data from both edge encoding nodes for the same object.

Thus, one would like to distribute SOPIs to encoding nodes in such a way that it is unlikely that a client will receive the same SOPI from two different edge encoding nodes. One can conceptually create a graph of all encoding nodes in the internet, where there is an edge connecting two encoding nodes if it is possible that a client may download prefixes of stream objects from both encoding nodes for the same object. Then, one can think of each SOPI $P\in\mathcal{P}$ as being a different color. Assigning colors to the graph in such a way that no edge has the same color at both endpoints provides a valid assignment of SOPIs to encoding nodes.

It seems likely that such a graph can be colored using a small number of colors, e.g., fewer than $100$ , and perhaps far fewer than $100$ if some edges in the graph are allowed to have the same color at both endpoints. Thus, the number of SOPIs needed overall is likely to be rather small. This can ameliorate one of the possible concerns with the SOPI design, which is that some information about which clients are requesting which objects might be revealed by the requests for encoded data containing SOPIs percolating through the network: the amount of information revealed is minimized if the number of SOPIs distributed is small.
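One simple way to produce such an assignment is greedy graph coloring, sketched below. This is an illustrative heuristic and not a method prescribed here: the node names, the adjacency structure, and the SOPI values are all hypothetical, and greedy coloring does not in general use the minimum number of colors.

```python
def assign_sopis(adjacency: dict[str, list[str]], sopis: list[tuple[int, int]]) -> dict:
    """Give each encoding node a SOPI that differs from its neighbors' SOPIs.

    adjacency maps a node to the nodes a client might download from
    together with it. Raises ValueError if the SOPI list is too short.
    """
    assignment = {}
    for node in sorted(adjacency):
        used = {assignment[nb] for nb in adjacency[node] if nb in assignment}
        free = [p for p in sopis if p not in used]
        if not free:
            raise ValueError(f"not enough SOPIs to color node {node}")
        assignment[node] = free[0]
    return assignment


# Hypothetical topology: three mutually adjacent edge nodes plus one more.
topology = {
    "edge-a": ["edge-b", "edge-c"],
    "edge-b": ["edge-a", "edge-c"],
    "edge-c": ["edge-a", "edge-b", "edge-d"],
    "edge-d": ["edge-c"],
}
candidate_sopis = [(0, 1), (5, 7), (9, 3)]   # hypothetical (A, B) values
print(assign_sopis(topology, candidate_sopis))
```

Here `edge-d` can reuse the SOPI of `edge-a`, since no client is expected to download from both.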

VI Large objects

If an object is not too large then it can be treated as a single source block, i.e., fountain encoding and fountain decoding can be applied to the entire object. However, it becomes inefficient to apply fountain encoding and fountain decoding to objects that are larger, since typically the object needs to fit into working memory. The required working memory is typically linear in the source block size, i.e., the working memory to recover a block with $K$ source symbols is typically proportional to the symbol size times $K$ .

Thus, for longer objects, it is useful to have an algorithm that automatically partitions the object into multiple source blocks. Such an algorithm is specified in [3] in Section 4.3. For LDN the following is a suitable simplification of that algorithm. Let the desired symbol size be $T$ , and let $WS$ be the maximum source block size for which fountain encoding and fountain decoding are efficient, where $\lfloor WS/T\rfloor\leq 56,403$ . An object $D$ of size $F$ can be automatically partitioned into source blocks as follows:

•

$Kt=\lceil F/T\rceil$ (no. of source symbols in $D$ ).
•

${K_{\mathrm{max}}}=\lfloor WS/T\rfloor$ (max no. symbols per source block).
•

$Z=\lceil Kt/{K_{\mathrm{max}}}\rceil$ (no. of source blocks to split $D$ into).
•

$(KL,KS,ZL,ZS)=\mbox{Partition}[Kt,Z]$ .

The output of Partition $[Kt,Z]$ is $(KL,KS,ZL,ZS)$ , where $KL=\lceil Kt/Z\rceil$ , $KS=\lfloor Kt/Z\rfloor$ , $ZL=Kt-KS\cdot Z$ , and $ZS=Z-ZL$ . The function Partition $[Kt,Z]$ partitions a block of $Kt$ source symbols into $Z$ blocks with approximately equal numbers of source symbols. More specifically, it partitions $Kt$ into $ZL$ blocks, each with $KL$ source symbols, and $ZS$ blocks, each with $KS$ source symbols.

Given the symbol size $T$ , the maximum source block size $WS$ , and the object size $F$ , an encoding node or client can determine the parameters $(KL,KS,ZL,ZS)$ as described above, where $Z=ZL+ZS$ is the total number of source blocks. If $Z=1$ then the object is considered as a single source block with $KL=K=\lceil F/T\rceil$ source symbols.

If $Z>1$ then object $D$ is partitioned into $Z$ source blocks as follows:

•

The initial portion of object $D$ of size $ZL\cdot KL\cdot T$ is partitioned into $ZL$ source blocks, each consisting of $KL$ source symbols of size $T$ .
•

The remaining portion of object $D$ of size $F-ZL\cdot KL\cdot T$ is partitioned into $ZS$ source blocks, each consisting of $KS$ source symbols of size $T$ .
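The partitioning procedure can be sketched with integer arithmetic as follows. The function names are hypothetical, and the example sizes are assumed values chosen for illustration.

```python
def ceil_div(x: int, y: int) -> int:
    """Ceiling of x / y for positive integers, without floating point."""
    return -(-x // y)


def partition(kt: int, z: int) -> tuple[int, int, int, int]:
    """Return (KL, KS, ZL, ZS) for Kt source symbols split into Z blocks."""
    kl = ceil_div(kt, z)
    ks = kt // z
    zl = kt - ks * z
    zs = z - zl
    return kl, ks, zl, zs


def source_block_symbol_counts(f: int, t: int, ws: int) -> list[int]:
    """Number of source symbols in each source block, in object order.

    f is the object size F, t the symbol size T, and ws the maximum
    source block size WS, all in bytes.
    """
    kt = ceil_div(f, t)          # source symbols in the object
    k_max = ws // t              # max symbols per source block
    z = ceil_div(kt, k_max)      # number of source blocks
    kl, ks, zl, zs = partition(kt, z)
    return [kl] * zl + [ks] * zs


# Assumed example: F = 10,000,000 bytes, T = 1400 bytes, WS = 1,400,000 bytes.
print(source_block_symbol_counts(10_000_000, 1400, 1_400_000))
```

In this example the object has $7143$ source symbols and is split into $7$ blocks of $893$ symbols followed by $1$ block of $892$ symbols.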

VI-A Large object extension of SOPI

We extend the definitions of SOPI and stream object to be applicable to large objects, i.e., objects that are partitioned into $Z>1$ source blocks. The source block structure for a large object $D$ of size $F$ is determined as described in Section VI based on $F$ , symbol size $T$ and a maximum source block size $WS$ .

A SOPI for the large object $D$ is of the form $P=(A,B,C,D)$ , where $A\in\{0,1,\ldots,N-1\},$ $B\in\{1,2,\ldots,N-1\}$ , $C\in\{0,1,\ldots,N-1\}$ and $D\in\{1,2,\ldots,N-1\}$ . $A$ and $B$ are used similarly to Section II to identify a symbol generated from a source block, and $C$ and $D$ are used to identify one of the $Z$ source blocks. For convenience, the ranges of $C$ and $D$ are chosen to match the ranges of $A$ and $B$ , and thus large objects with up to $N-1$ source blocks can be supported. For any $i\in\{0,1,\ldots,Z\cdot N-1\}$ , let:

•

$r=\lfloor i/Z\rfloor$
•

$j=(A+r\cdot B)\mod N$
•

$j^{\prime}=(i+C+r\cdot D)\mod Z$

Then the symbol at position $i$ of the stream object identified by SOPI $P$ for object $D$ is the symbol identified by $j$ from the source block identified by $j^{\prime}$ .
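The mapping from a stream object position to a (symbol ID, source block) pair can be sketched as follows. The function name and the toy parameters ($N=11$, $Z=3$) are assumptions for illustration.

```python
def large_object_symbol(i: int, a: int, b: int, c: int, d: int,
                        z: int, n: int) -> tuple[int, int]:
    """Return (symbol ID j, source block index j') at position i.

    (a, b, c, d) is the large-object SOPI, z the number of source blocks
    and n the prime number of symbol IDs per source block.
    """
    r = i // z
    j = (a + r * b) % n
    j_block = (i + c + r * d) % z
    return j, j_block


# Toy parameters (assumed): N = 11, Z = 3, P = (3, 4, 1, 2).
N_TOY, Z_TOY = 11, 3
stream = [large_object_symbol(i, 3, 4, 1, 2, Z_TOY, N_TOY)
          for i in range(Z_TOY * N_TOY)]

# Every (symbol ID, source block) pair appears exactly once.
assert len(set(stream)) == Z_TOY * N_TOY
# The first Z positions share one symbol ID and cover all Z source blocks.
assert stream[:Z_TOY] == [(3, 1), (3, 2), (3, 0)]
```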

Each SOPI $P$ defines a stream object $(P,D)$ , which is a permutation defined by $P$ of the $Z\cdot N$ possible symbols that can be generated for the $Z$ source blocks of the object $D$ . The permutation $P$ has the property that each set of $Z$ consecutive symbols of the stream object starting at a position that is a multiple of $Z$ all have the same symbol ID but are from different source blocks, i.e., the $Z$ consecutive symbols are from a cyclic shift of the $Z$ source blocks.

The values of $C$ and $D$ should be chosen randomly. This ensures that the source block sequence within each set of $Z$ consecutive symbols of the stream object is a random cyclic shift, and that different pairs of cyclic shifts are random. This ensures that packet losses on average with respect to randomly chosen $C$ and $D$ affect each source block of a large object equally. This is true even if packet loss patterns depend on the position of the symbol within the stream object carried in a packet, as long as the packet loss does not depend on the values of $C$ and $D$ .

VI-B RaptorQ large object

For the RaptorQ code specified in [3], the maximum number of supported source symbols is $56,403$ , and thus the maximum supported source block size $WS$ should satisfy $\lfloor WS/T\rfloor\leq 56,403$ , where $T$ is the symbol size. With $T=1400$ bytes, i.e., around the typical IPv4 packet payload size, the maximum object size supported without partitioning the object into more than one source block is around $80$ Mbytes.

For RaptorQ [3], $N=2^{31}-1$ is a good choice, and thus up to $2^{31}-1$ source blocks can be supported with the design described in Section VI-A. With $T$ set to the maximum supported symbol size of $2^{16}=65,536$ bytes, the maximum size object supported is around $8\cdot 10^{18}$ bytes, or around $8$ Exabytes.
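The size limits quoted above follow from multiplying the symbol size, the maximum number of source symbols per block, and the number of source blocks. A short check, with hypothetical function names and with the block count taken to be $2^{31}-1$ as stated above:

```python
K_MAX = 56_403          # max source symbols per RaptorQ source block
N = 2**31 - 1           # number of source blocks assumed supported


def max_single_block_bytes(t: int) -> int:
    """Largest object that fits in one source block with symbol size t."""
    return K_MAX * t


def max_object_bytes(t: int, max_blocks: int = N) -> int:
    """Largest object with symbol size t and up to max_blocks source blocks."""
    return K_MAX * t * max_blocks


print(max_single_block_bytes(1400))   # about 79 * 10^6 bytes
print(max_object_bytes(65_536))       # about 7.9 * 10^18 bytes
```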
