School of Electrical and Computer Engineering
Ben Gurion University of the Negev, Israel
Email: [email protected]

Arithmetic Binary Search Trees:
Static Optimality in the Matching Model

Chen Avin

Abstract

Motivated by recent developments in optical switching and reconfigurable network design, we study dynamic binary search trees (BSTs) in the matching model. In the classical dynamic BST model, the cost of both link traversal and basic reconfiguration (rotation) is $O(1)$ . However, in the matching model, the BST is defined by two optical switches (that represent two matchings in an abstract way), and each switch (or matching) reconfiguration cost is $\alpha$ while a link traversal cost is still $O(1)$ . In this work, we propose Arithmetic BST (A-BST), a simple dynamic BST algorithm that is based on dynamic Shannon-Fano-Elias coding, and show that A-BST is statically optimal for sequences of length $\Omega(n\alpha\log\alpha)$ where $n$ is the number of nodes (keys) in the tree.

Keywords:

binary search trees, static optimality, matching model, arithmetic coding, entropy.

1 Introduction

In this paper, we study one of the most classical problems in computer science, the design of efficient binary search trees (BSTs) [24]. More concretely, we are interested in dynamic [27] or self-adjusting [32] binary search trees that need to serve a sequence of requests. While binary search trees are traditionally studied in the context of data structures, with a focus on memory and running-time optimization, we are motivated by physical-world implementations of binary search trees and, in particular, by self-adjusting networks [7].

The Matching model and Self-adjusting networks.

Self-adjusting networks, or reconfigurable networks, are communication networks that enable dynamic physical layer topologies [17, 18, 28, 9], i.e., topologies that can change over time. Recently, reconfigurable optical switching technologies have introduced an intriguing alternative to the traditional design of datacenter networks: they make it possible to dynamically establish direct shortcuts between communicating partners, depending on the demand [18, 21, 28, 1]. For example, such reconfigurable links could be established to support elephant flows or traffic between two racks with significant communication demands. The potential for such demand-aware optimizations is high: empirical studies show that communication traffic features both spatial and temporal locality, i.e., traffic matrices are indeed sparse and a small number of elephant flows can constitute a significant fraction of the datacenter traffic [29, 10, 3]. The main metric of interest in these networks is the flow completion time or the average packet delay, where the reconfiguration delay (also known as the latency tax [20]) can be several orders of magnitude larger than the forwarding delay, i.e., milliseconds or microseconds vs. nanoseconds or less. This stands in striking contrast to the common data-structures approach, where pointer forwarding and pointer changing (e.g., rotations) are of the same time order and are each considered to have unit cost.

Previous work [30, 6, 5] has shown that binary search trees (and other types of trees) can be used as an important building block in the design of self-adjusting networks since they carry nice properties like local routing and low degree . Moreover, to capture many recent designs of demand-aware networks like [18, 28, 9, 17, 23], a simple leaf-spine datacenter network model called ToR-Matching-ToR (TMT) was recently proposed [4]. In the TMT model, $n$ Top of the Rack (ToR) switches (leaves) are connected via optical switches (spines), and each spine switch can have a dynamically changing matching between its $n$ input-output ports. See Figure 1 for an illustration. The exact networking and technical details of the model are out of the scope of the paper, but abstractly with only two spine switches (i.e., two matchings), the model supports a simple implementation of a dynamic binary search tree that enables a greedy routing from the root (a ToR) to any other node (ToR).

Figure 1: Overview of the ToR-Matching-ToR (TMT) network model [4]. The $n$ nodes of the BST are the leaf switches, and they are interconnected via $n$-port spine switches, each consisting of a (dynamic) matching. A BST can be implemented in this model using two spine switches. The cost (reconfiguration delay) of updating a matching is $\alpha$.

This leads to the matching model for dynamic binary search trees that we study in this paper where the tree can change at any time to any other tree but at a reconfiguration cost which is a parameter denoted as $\alpha$ .

1.1 Formal Model and Main Result

We consider a dynamic binary search tree (BST) that needs to serve a sequence of requests from a set $V$ of $n$ unique keys, where the value of the $i$’th key is $v_{i}$ and the keys are totally ordered. For simplicity and w.l.o.g. we assume that the keys are sorted by their value, i.e., for $i<j$, $v_{i}<v_{j}$ (otherwise we can just rename the keys). Let $\sigma=(\sigma_{1},\sigma_{2},\dots,\sigma_{m})$ be a sequence of $m$ requests for keys, where $\sigma_{j}\in V$.

A binary tree $T$ has a finite set of nodes with one node designated to be the root. Each node has a left and a right child, each of which can be either a different node or a null object. Every node in $T$, except for the root node, has a single parent node of which it is a child. We denote the root of a tree $T$ by $T.\operatorname{root()}$ and, for a node $v$, let $v.\operatorname{left()}$ and $v.\operatorname{right()}$ denote its left and right children. The depth of $v$ in $T$ is defined as the number of nodes on the path from the root to $v$, denoted as $\operatorname{depth}(T,v)$. The depth of the root is therefore one.

In a binary search tree $B$ , each node has a unique key. We assume the set of nodes to be $V$ and use nodes and keys interchangeably. The tree satisfies the symmetric order condition: every node’s key is greater than the keys in its left child subtree and smaller than the keys in its right child subtree.

For a given set of keys $V$ and a sequence $\sigma$, the cost of the optimal static BST is defined as the minimal cost, over all BSTs, of serving $\sigma$. Formally,

$\displaystyle\operatorname{STAT}(\sigma)=\min_{B\in\mathcal{B}}\sum_{t=1}^{m}\operatorname{depth}(B,\sigma_{t})$	(1)

where $\mathcal{B}$ is the set of all binary search trees.

A dynamic BST is a sequence of BSTs, $B_{t}$ with $V$ as the set of nodes. At each time step $t$ , the cost of searching (or serving) a key $\sigma_{t}$ is $\operatorname{depth}(B_{t},\sigma_{t})$ , but after the request is served the BST $B_{t}$ can adjust its structure (while maintaining the symmetric order property).

The key point of this paper is that the cost assumptions of our model differ. In previous studies, both classical [8, 2, 27, 32] and more recent ones [16, 25, 13], to name a few, the cost of both a single link traversal and a single pointer change (usually called a rotation) is $O(1)$ units. In most of these algorithms the cost of reconfiguring the tree is kept proportional to the cost of accessing the most recent key. In contrast, the matching model assumes that any change from $B_{t}$ to $B_{t+1}$ is possible (it could be a single rotation or a completely new tree), but at a cost of $\alpha$ units.

For a dynamic BST algorithm $\mathcal{A}$ the total cost of serving $\sigma$ starting from a BST $B=B_{1}$ is defined in the matching model as follows,

$\displaystyle\operatorname{cost}(\mathcal{A},\sigma,B)=\sum_{t=1}^{m}\left(\operatorname{depth}(B_{t},\sigma_{t})+\alpha\mathbb{I}_{B_{t}\neq B_{t+1}}\right)$	(2)

where $\mathbb{I}_{e}$ is the indicator function, i.e., $\mathbb{I}_{e}=1$ if the event $e$ is true and 0 otherwise. From a networking perspective, the cost of a dynamic BST algorithm can be seen as the total delay to serve (i.e., route) requests to nodes in $\sigma$ from a single source that is connected to the tree’s root.
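The cost in Eq. (2) can be illustrated with a minimal Python sketch. The function name `matching_cost` and the encoding of trees as hashable identifiers (equal identifiers meaning equal trees) are assumptions made for illustration only; they are not part of the model's definition.

```python
from typing import Hashable, Sequence


def matching_cost(depths: Sequence[int], trees: Sequence[Hashable], alpha: float) -> float:
    """Total cost in the spirit of Eq. (2).

    depths[t] plays the role of depth(B_t, sigma_t) for the t-th request.
    trees holds one identifier per tree B_1, ..., B_{m+1}
    (hypothetical encoding: two equal identifiers denote the same tree).
    """
    if len(trees) != len(depths) + 1:
        raise ValueError("need one tree identifier per request plus the final tree")
    search = sum(depths)
    # Any change of the tree costs alpha, whether it is one rotation or a full rebuild.
    reconfigurations = sum(1 for a, b in zip(trees, trees[1:]) if a != b)
    return search + alpha * reconfigurations


# Three requests; the tree is replaced once, after the second request.
example_cost = matching_cost(depths=[1, 3, 2], trees=["B1", "B1", "B2", "B2"], alpha=10)
```

In this toy run the search cost is 1 + 3 + 2 and a single reconfiguration adds one extra $\alpha$.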

In static optimality, which we next define, we want the dynamic BST algorithm to (asymptotically) perform well even in hindsight compared to the best static tree.

Definition 1 (Static Optimality)

Let $\operatorname{STAT}$ be the cost of an optimal static BST with perfect knowledge of the demand $\sigma$, and let On be an online algorithm for dynamic BST. We say that On is statically optimal if there exists some constant $\rho\geq 1$ such that, for any sufficiently long sequence of keys $\sigma$, we have that

\operatorname{cost}(\textsc{On},\sigma,B)\leq\rho\operatorname{STAT}(\sigma)

where $B$ is the initial BST from which On starts. In other words, On’s cost is at most a constant factor higher than $\operatorname{STAT}$ in the worst case.

Note that when $\alpha=O(1)$, known dynamic BST algorithms that are statically optimal, e.g., [8, 27, 32, 16, 15], can be used to achieve static optimality in our model. The more interesting scenario is when $\alpha=\omega(1)$, where a naive use of current algorithms with a rotation cost (or splay to the root, or move to root) of $\alpha$ will not be statically optimal. In this work, we assume for simplicity that $\alpha\geq 2$.

Let $w_{i}(t)$ be the number of appearances of key $i$ in $\sigma$ up to time $t$ . Note that $\sum_{i=1}^{n}w_{i}(t)=t$ . For short let $w_{i}=w_{i}(m)$ . Let $\overline{W}=\overline{W}(\sigma)$ be the frequency (or empirical) distribution of $\sigma$ , i.e., $\overline{W}=\{\frac{w_{1}}{m},\frac{w_{2}}{m},\dots,\frac{w_{n}}{m}\}$ , and $\overline{W}(t)$ the frequency distribution up to time $t$ . It is a classical result that the optimal static BST for $\sigma$ has amortized access cost of $\Theta(H(\overline{W}))$ where $H$ is the entropy function [26], i.e., $\operatorname{STAT}(\sigma)=\Theta(m(1+H(\overline{W}(\sigma))))$ .
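As a small illustration of the quantities just defined, the following Python sketch computes the entropy $H(\overline{W})$ of the empirical distribution of a request sequence and the scale $m(1+H(\overline{W}))$ that governs $\operatorname{STAT}(\sigma)$ up to constant factors. The function name and the sample sequence are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def empirical_entropy(sigma: Sequence[Hashable]) -> float:
    """Binary entropy H(W) of the frequency distribution of sigma."""
    m = len(sigma)
    counts = Counter(sigma)  # counts[i] plays the role of w_i
    # Keys that never appear contribute nothing (0 * log(1/0) = 0 by convention).
    return sum((w / m) * math.log2(m / w) for w in counts.values())


# A sequence of m = 10 requests with counts (1, 2, 4, 2, 1) over keys 1..5.
sigma = [1, 2, 2, 3, 3, 3, 3, 4, 4, 5]
h = empirical_entropy(sigma)
# STAT(sigma) is Theta of this quantity; the hidden constants are not computed here.
stat_scale = len(sigma) * (1 + h)
```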

The main result of this paper is a simple dynamic BST algorithm, A-BST, that is based on arithmetic coding [34] and in particular on a dynamic Shannon-Fano-Elias coding [14] and is statically optimal for sequences of length $\Omega(n\alpha\log\alpha)$ . Formally,

Theorem 1

A-BST (Algorithm 3) is a statically optimal, dynamic BST for sequences of length at least $2n\alpha\log\alpha$ .

The rest of the paper is organized as follows: In Section 2 we review related concepts like Entropy and the Shannon-Fano-Elias (SFE) coding with a detailed example. Section 3 presents A-BST including related algorithms and an example for its dynamic operation. In Section 4 we prove Theorem 1. Finally, we conclude with a discussion and open questions in Section 5.

2 Preliminaries

Entropy and Shannon-Fano-Elias (SFE) Coding.

Entropy is a known measure of unpredictability of information content [31]. For a discrete random variable $X$ with possible values $\{x_{1},\dots,x_{n}\}$ , the (binary) entropy $H(X)$ of $X$ is defined as $H(X)=\sum_{i=1}^{n}p_{i}\log_{2}\frac{1}{p_{i}}$ where $p_{i}$ is the probability that $X$ takes the value $x_{i}$ . Note that, $0\cdot\log_{2}\frac{1}{0}=0$ and we usually assume that $p_{i}>0$ $\forall i$ . Let $\overline{p}$ denote $X$ ’s probability distribution, then we may write $H(\overline{p})$ instead of $H(X)$ .

Shannon-Fano-Elias (SFE) coding [14] is a well-known prefix symbol code for lossless data compression and is the predecessor of the more famous and widely used (variable block length) arithmetic coding [34]. The Shannon-Fano-Elias code produces variable-length code-words based on the probability of each possible symbol. As in other entropy encoding methods (e.g., Huffman), higher-probability symbols are represented using fewer bits, to reduce the expected code length. Unlike Huffman coding, the method works for any order of the symbols, and the probabilities do not need to be sorted. This fits better with the search property that we need in the tree later on.

The encoding is based on the cumulative distribution function (CDF) of the probability distribution,

$\displaystyle F(i)=\sum\limits_{j\leq i}p_{j},$	(3)

and encodes symbols using the function $\overline{F}$ where

$\displaystyle\overline{F}(i)=\sum_{j<i}p_{j}+\frac{p_{i}}{2}=F(i-1)+\frac{p_{i}}{2}.$	(4)

Denote by $B(i)$ the binary representation of $\overline{F}(i)$ . The code-word $C(i)$ for the $i$ ’th symbol $x_{i}$ consists of the first $\ell(i)$ bits of the fractional part of $B(i)$ , denoted as

$\displaystyle C(i)=\lfloor B(i)\rfloor_{\ell(i)},$	(5)

where the code length $\ell(i)$ is defined as

$\displaystyle\ell(i)=\left\lceil\log\frac{1}{p_{i}}\right\rceil+1.$	(6)

The above construction guarantees $(i)$ that the code-words $C(i)$ are prefix-free and therefore appear as leaves in the binary tree that represent the code, and $(ii)$ that the average code length $L_{\mathrm{SFE}}(X)$ ,

$\displaystyle L_{\mathrm{SFE}}(X)=\sum\limits_{i=1}^{n}p_{i}\cdot\ell(i)=\sum\limits_{i=1}^{n}p_{i}\left(\left\lceil\log\frac{1}{p_{i}}\right\rceil+1\right)$	(7)

is close to the entropy $H(X)$ of the random variable $X$ , and in particular,

$\displaystyle H(X)+1\leq L_{\mathrm{SFE}}(X)<H(X)+2.$	(8)

For an illustration, consider Example A in Figure 2, which we will also use later in the paper. Let $X$ be a random variable with five possible symbols $\{1,2,3,4,5\}$ and the corresponding probabilities $\{0.1,0.2,0.4,0.2,0.1\}$. Figure 2 (a) provides the detailed parameters for the encoding of each symbol $i$, including $F(i)$, $\overline{F}(i)$, the binary representation $B(i)$, the code-word length $\ell(i)$ and the (prefix-free) binary code-word $C(i)$. Additionally, Figure 2 (b) presents $F(i)$ and $\overline{F}(i)$ graphically, and Figure 2 (c) shows a binary tree with the code-words as leaves, where the path from the root to each leaf represents its (prefix-free) code-word. Figure 2 (d) shows the conversion of the prefix tree to a binary search tree, which we discuss next.
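The construction of Eqs. (4)-(6) can be sketched in a few lines of Python. This is a minimal sketch rather than a reference implementation: the function names are hypothetical, and exact rational arithmetic (`fractions.Fraction`) is used here only to avoid floating-point rounding in the binary expansion.

```python
from fractions import Fraction
from typing import List, Sequence


def ceil_log2_inverse(p: Fraction) -> int:
    """Smallest integer k >= 0 with 2**k >= 1/p, i.e. ceil(log2(1/p)) for 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("probabilities must lie in (0, 1]")
    k = 0
    while p * 2**k < 1:
        k += 1
    return k


def sfe_codewords(probs: Sequence[Fraction]) -> List[str]:
    """Shannon-Fano-Elias code-words for probabilities given in key order."""
    codes = []
    cumulative = Fraction(0)  # F(i-1), the CDF just before symbol i
    for p in probs:
        x = cumulative + p / 2             # midpoint, Eq. (4)
        length = ceil_log2_inverse(p) + 1  # code length, Eq. (6)
        bits = []
        for _ in range(length):            # first `length` bits of the binary fraction
            x *= 2
            if x >= 1:
                bits.append("1")
                x -= 1
            else:
                bits.append("0")
        codes.append("".join(bits))
        cumulative += p
    return codes


# Example A: probabilities 0.1, 0.2, 0.4, 0.2, 0.1 for keys 1..5.
probs_a = [Fraction(k, 10) for k in (1, 2, 4, 2, 1)]
codes_a = sfe_codewords(probs_a)
avg_len_a = sum(p * len(c) for p, c in zip(probs_a, codes_a))  # L_SFE(X), Eq. (7)
```

For Example A, the code-word lengths obtained this way are 5, 4, 3, 4, 5, and the average length falls between $H(X)+1$ and $H(X)+2$, in line with Eq. (8).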

3 Arithmetic Binary Search Trees (A-BST)

The idea of arithmetic binary search trees is simple in principle. It is composed of three phases:

1. Use arithmetic coding, and in particular, Shannon-Fano-Elias coding to create an efficient, variable-length, prefix-free code for a given (empirical) distribution.
2. In turn, convert the binary tree of the prefix-free code to a biased binary search tree (with an entropy bound on its performance).
3. Dynamically update the empirical distribution, the prefix-free code and the corresponding binary search tree.

We first describe steps $1$ and $2$ that may be of independent interest and then the crux of the method which is the dynamic update process.

3.1 From Shannon-Fano-Elias (SFE) coding to BST

Algorithm 1 SFE-2-BST($\overline{P}$): Convert Arithmetic (SFE) coding to BST

1: Input: A probability distribution $\overline{P}$, where $p_{i}$ is the probability of key $i$, the key with rank $i$ in the sorted keys’ list.
2: Output: A BST $B$ where key $i$ is at distance $O(\log\frac{1}{p_{i}})$ from the root.
3: Create a prefix-free code-word for each key $i$ via SFE coding.
4: Create a binary tree $T$ from the SFE-code, with keys as leaves (i.e., binary trie).
5: $B$ = PrefixTree-2-BST($T$) $\triangleright$ See Algorithm 2

Algorithms 1 and 2 show how to create a near-optimal biased BST for a given distribution via SFE coding. Algorithm SFE-2-BST (Algorithm 1) is first provided with a probability distribution $\overline{P}$ for the biased BST. Note that, w.l.o.g., the distribution is sorted by the keys’ values. From $\overline{P}$ the algorithm then creates a prefix-free binary code using SFE coding (footnote 1: any other coding method that preserves the keys’ order can be used, e.g., Shannon–Fano coding [14], which splits the probabilities in half, alphabetic codes [35], or the Mehlhorn tree [26], but not, e.g., Huffman coding [22], which needs to sort the probabilities) and the corresponding binary tree $T$, where keys (symbols) are at the leaves and ordered by their value (and not by their probabilities). Next, the algorithm calls Algorithm PrefixTree-2-BST (Algorithm 2), which converts $T$ to a BST $B$.
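The two properties that the prefix tree $T$ relies on, namely that the code is prefix-free and that the leaves appear in key order, can be checked directly on a list of code-words. The Python sketch below is only illustrative; the helper names are hypothetical, and the code-words for Example A are those obtained from Eqs. (4)-(6).

```python
from typing import Sequence


def is_prefix_free(codes: Sequence[str]) -> bool:
    """True if no code-word is a prefix of a different code-word."""
    for i, a in enumerate(codes):
        for j, b in enumerate(codes):
            if i != j and b.startswith(a):
                return False
    return True


def leaves_in_key_order(codes: Sequence[str]) -> bool:
    """True if codes[i] (the code-word of the i-th smallest key) lies to the left of
    codes[i+1] in the binary trie. For a prefix-free code over '0'/'1' this is the
    same as the code-words being in strictly increasing lexicographic order."""
    return all(a < b for a, b in zip(codes, codes[1:]))


# SFE code-words of Example A, listed by key 1..5.
codes_a = ["00001", "0011", "100", "1100", "11110"]
sfe_ok = is_prefix_free(codes_a) and leaves_in_key_order(codes_a)

# A hypothetical prefix-free code that ignores the key order, for contrast.
unordered = ["10", "0", "11"]
unordered_ok = is_prefix_free(unordered) and leaves_in_key_order(unordered)
```

A code such as `unordered` is a valid prefix code but cannot serve as the input $T$ of Algorithm 2, since its leaves are not sorted by key.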

Algorithm 2 PrefixTree-2-BST($T$): Convert a binary prefix tree to BST

1: Input: A prefix tree $T$ with keys as leaves in sorted order.
2: Output: A BST $B$ where each key in $B$ is at least as close to the root as in $T$.
3: if $T$ is of size one or NULL then
4:   return $T$
5: else
6:   Let $pre$ be the rightmost leaf in the left subtree of $T$ (pre-order)
7:   Let $post$ be the leftmost leaf in the right subtree of $T$ (post-order)
8:   Let $\ell$ be the leaf with lower depth between $pre$ and $post$
9:   Let $B$ be a new binary search tree with $\ell$ as a root
10:   Delete $\ell$ from $T$
11:   $B$.root.left $=$ PrefixTree-2-BST($T$.root.left)
12:   $B$.root.right $=$ PrefixTree-2-BST($T$.root.right)
13:   return $B$
14: end if

Algorithm 2 is a recursive algorithm that receives a binary tree $T$ with keys as leaves in increasing order. It creates a BST $B$ by setting the root of the tree as the key which is closer to the root between the two keys that are pre-order and post-order to the root (and then deleting the key from $T$ ). The left and right subtrees of $B$ ’s root are created by calling recursively the algorithm with the relevant subtrees of $T$ .

Recall Figure 2 (c) which shows a binary tree $T$ for the SFE code of the probability distribution in Example A. The keys $1,2,3,4,5$ are leaves and in sorted order. This tree is an intermediate result of SFE-2-BST( $\overline{P}$ ) algorithm. Figure 2 (d) presents the corresponding BST after calling PrefixTree-2-BST $(T)$ . Initially $3$ is selected as the root as it is closer to the root between keys $2$ and $3$ . Then (after deleting $3$ from the tree), the algorithm continues recursively, by building the children of $B$ ’s root with the left and right subtrees of $T$ ’s root. On the left, 2 is selected as the next root and then 1 as its left child. On the right $4$ is selected as the next root and then 5 as its right child.
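The conversion just described can be sketched in Python. This is a sketch under stated assumptions rather than a literal transcription of Algorithm 2: the trie is represented implicitly by a key-sorted list of (key, code-word) pairs, a subtree holding a single remaining key is collapsed into one node, and ties between $pre$ and $post$ are broken in favor of $pre$. The names `Node`, `prefix_tree_to_bst` and `key_depths` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    key: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def prefix_tree_to_bst(items: List[Tuple[int, str]], d: int = 0) -> Optional[Node]:
    """items: (key, code-word) pairs in increasing key order, all lying below the
    trie node at depth d (i.e. sharing their first d bits). The code is assumed
    prefix-free with leaves in key order."""
    if not items:
        return None
    if len(items) == 1:
        return Node(items[0][0])
    left = [it for it in items if it[1][d] == "0"]   # left subtree of the trie node
    right = [it for it in items if it[1][d] == "1"]  # right subtree of the trie node
    pre = left[-1] if left else None     # rightmost leaf on the left
    post = right[0] if right else None   # leftmost leaf on the right
    # Pick the shallower of the two leaves (shorter code-word); ties go to pre.
    if post is None or (pre is not None and len(pre[1]) <= len(post[1])):
        chosen, left = pre, left[:-1]
    else:
        chosen, right = post, right[1:]
    return Node(chosen[0],
                prefix_tree_to_bst(left, d + 1),
                prefix_tree_to_bst(right, d + 1))


def key_depths(root: Optional[Node], depth: int = 1) -> Dict[int, int]:
    """Depth of every key, counting nodes on the path (the root has depth one)."""
    if root is None:
        return {}
    result = {root.key: depth}
    result.update(key_depths(root.left, depth + 1))
    result.update(key_depths(root.right, depth + 1))
    return result


# SFE code-words of Example A (keys 1..5), derived from Eqs. (4)-(6).
example_a = [(1, "00001"), (2, "0011"), (3, "100"), (4, "1100"), (5, "11110")]
bst_a = prefix_tree_to_bst(example_a)
depths_a = key_depths(bst_a)
```

On Example A this sketch selects key 3 as the root, key 2 (with left child 1) on the left and key 4 (with right child 5) on the right, which agrees with the walk-through above.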

We can easily bound the depth of any key in the tree generated by SFE-2-BST.

Claim 1

For a key-sorted probability distribution $\overline{P}$ , the algorithm SFE-2-BST( $\overline{P}$ ) creates a biased BST $B$ where for each key $i$ it holds $\operatorname{depth}(B,i)<\log\frac{1}{p_{i}}+3$ .

Proof

The SFE code creates code-words of length $\ell(i)=\lceil\log\frac{1}{p_{i}}\rceil+1<\log\frac{1}{p_{i}}+2$ (Eq. (6)), so in the tree $T$ of Algorithm SFE-2-BST the leaf of key $i$ is at depth $\ell(i)+1<\log\frac{1}{p_{i}}+3$ (recall that the depth counts nodes, so the root has depth one). Algorithm PrefixTree-2-BST($T$) does not change the symmetric order of the keys that are initially stored at the leaves, and can only decrease the depth of any key. ∎

We next discuss how to make the BST dynamic.

3.2 Dynamic BST

Algorithm 3 provides pseudo-code for a dynamic Arithmetic Binary Search Tree (A-BST) $B_{t}$. The algorithm starts from $B_{1}$, which is a balanced BST over the $n$ possible keys (assuming no knowledge of the keys’ distribution). The algorithm maintains two probability distribution vectors that change over time, $\overline{P}=\{p_{1},\dots,p_{n}\}$ and $\overline{Q}=\{q_{1},\dots,q_{n}\}$. $\overline{P}$ holds the probability distribution by which the current BST was built and is initially set to the uniform distribution. $\overline{Q}$ holds the empirical distribution of $\sigma$ up to time $t$ and is updated on each request, i.e., $q_{i}(t)=\frac{w_{i}(t)}{t}$ (footnote 2: to avoid zero probabilities, in practice $q_{i}(t)$ is set by the Laplace rule [12] to be $\frac{w_{i}(t)+1}{t+n}$; this does not change the main theorem since for $t\geq n$ we have $\frac{w_{i}(t)+1}{t+n}\geq\frac{w_{i}(t)}{2t}$. For simplicity of exposition we assume $q_{i}(t)=\frac{w_{i}(t)}{t}$). The crux of the algorithm is that, whenever a key $i$ is requested, if as a result $p_{i}<q_{i}/2$, then $\overline{P}$ is updated to be equal to $\overline{Q}$, and a new BST is created (at a cost of $\alpha$) from $\overline{P}$. Note that upon a request to key $i$ only $q_{i}$ increases, while all other probabilities decrease. So the condition $p_{j}<q_{j}/2$ can only become valid for $j=i$ when key $i$ is requested.

We demonstrate the algorithm using Example B. Assume we have $5$ keys $1,2,3,4,5$ and after the first ten requests in $\sigma$ we have $\overline{Q}=\overline{P}=\{\frac{1}{10},\frac{2}{10},\frac{4}{10},\frac{2}{10},\frac{1}{10}\}$. Note that this is the distribution of Example A, so $B_{10}$ is the BST that is shown in Figure 2 (d). Next we assume that the 11th and 12th requests are for key 1. After the 11th request $p_{1}(11)=\frac{1}{10}$ and $q_{1}(11)=\frac{2}{11}$, so $p_{1}(11)>q_{1}(11)/2$ and $B_{11}$ remains as $B_{10}$. After the 12th request we have $p_{1}(12)=\frac{1}{10}$ and $q_{1}(12)=\frac{3}{12}$, and now $p_{1}(12)<q_{1}(12)/2$. So at time $t=12$, a new SFE code and a new BST are created with the empirical distribution $\overline{Q}=\overline{P}=\{\frac{3}{12},\frac{2}{12},\frac{4}{12},\frac{2}{12},\frac{1}{12}\}$. The details of the code for this distribution are given in Figure 2 (e)-(g), and the new BST, $B_{12}$, is shown in Figure 2 (h). Note that key 1 got closer to the root.
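The update rule of Example B can be traced with a short Python sketch. It models only the bookkeeping of $\overline{P}$ and $\overline{Q}$ and the test $p_{i}<q_{i}/2$; building the actual tree is left out. The class name `UpdateRule`, the 0-based key indices, and the choice to start from given counts with $\overline{P}=\overline{Q}$ are assumptions made to match the example.

```python
from fractions import Fraction
from typing import List


class UpdateRule:
    """Tracks the counts w_i(t) and the distribution P from which the current tree was built."""

    def __init__(self, counts: List[int]):
        if sum(counts) <= 0:
            raise ValueError("need at least one past request")
        self.counts = list(counts)
        self.t = sum(counts)
        # Start with P = Q, as in Example B after the first ten requests.
        self.p = [Fraction(w, self.t) for w in self.counts]
        self.rebuilds = 0

    def request(self, i: int) -> bool:
        """Register a request to key index i (0-based); return True if the tree is rebuilt."""
        self.counts[i] += 1
        self.t += 1
        q_i = Fraction(self.counts[i], self.t)  # only q_i can grow on this request
        if self.p[i] < q_i / 2:
            # P := Q; in the full algorithm a new BST is built here at cost alpha.
            self.p = [Fraction(w, self.t) for w in self.counts]
            self.rebuilds += 1
            return True
        return False


# Example B: counts (1, 2, 4, 2, 1) after ten requests, then two requests to key 1.
rule = UpdateRule([1, 2, 4, 2, 1])
outcomes = [rule.request(0), rule.request(0)]
```

The first extra request leaves the tree unchanged ($\frac{1}{10}\geq\frac{1}{11}$), while the second triggers a rebuild ($\frac{1}{10}<\frac{1}{8}$).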

Next we analyse A-BST and show it is statically optimal.

Algorithm 3 A-BST: Simple Arithmetic BST

1: Input: A BST $B_{t}$; $\overline{P}$ - the current tree distribution; $\overline{Q}$ - empirical distribution (or model distribution), with $\forall i,~p_{i}\geq q_{i}/2$.
2: Output: Dynamic BST $B_{t+1}$ with $\forall i,~p_{i}\geq q_{i}/2$.
3: Upon request of key $i$ at time $t$:
4: Update $\overline{Q}$ $\triangleright$ $q_{i}(t)=\frac{w_{i}(t)}{t}=\frac{w_{i}(t-1)+1}{t}$
5: if $p_{i}<q_{i}/2$ then $\triangleright$ update $\overline{P}$ if needed
6:   Set $\overline{P}=\overline{Q}$
7:   $B_{t+1}=$ SFE-2-BST($\overline{P}$) $\triangleright$ at cost $O(\alpha)$
8: else
9:   $B_{t+1}=B_{t}$
10: end if
11: Serve key $i$ in $B_{t+1}$ at cost $O(\log\frac{1}{p_{i}})$

4 A-BST is statically Optimal

We now prove the main result of this work. Recall that we assume $\alpha\geq 2$ and that the more interesting case is $\alpha=\omega(1)$.

Theorem 4.1

A-BST (Algorithm 3) is a statically optimal, dynamic BST for sequences of length at least $2n\alpha\log\alpha$ .

Proof

Let $B_{t}$ be the BST of the algorithm at time $t$. Let $\overline{P}_{t}=\{p_{1}(t),\dots,p_{n}(t)\}$ be the probability distribution by which $B_{t}$ was built. Let $\overline{Q}_{t}=\{q_{1}(t),\dots,q_{n}(t)\}$ be the frequency distribution of $\sigma$ up to time $t$. Note that $q_{i}(m)=\frac{w_{i}}{m}$. Observe that Algorithm 3 guarantees that for each $i$ and each $t$ we have $p_{i}(t)\geq\frac{q_{i}(t)}{2}$. We first analyse the searching cost (or access cost [11]), $\operatorname{cost}_{S}(\sigma)$, of Algorithm 3.

$\displaystyle\operatorname{cost}_{S}(\sigma)$	$\displaystyle=\sum_{t=1}^{m}\operatorname{depth}(B_{t},\sigma_{t})$	(9)
	$\displaystyle\leq\sum_{t=1}^{m}\left(\log(\frac{1}{p_{\sigma_{t}}(t)})+3\right)$	(10)
	$\displaystyle\leq\sum_{t=1}^{m}\left(\log(\frac{2}{q_{\sigma_{t}}(t)})+3\right)$	(11)
	$\displaystyle\leq 4m+\sum_{t=1}^{m}\log(\frac{1}{q_{\sigma_{t}}(t)})$	(12)

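where Eq. (10) is due to Claim 1, Eq. (11) follows from the invariant $p_{i}(t)\geq q_{i}(t)/2$, and Eq. (12) uses $\log\frac{2}{q}=1+\log\frac{1}{q}$.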

Next we consider the cost $\sum_{t=1}^{m}\log(\frac{1}{q_{\sigma_{t}}(t)})$ and analyze one key $i$ at a time. We follow an idea mentioned in a classical paper on dynamic Huffman coding by Vitter [33], where it is attributed to a personal communication with B. Chazelle. Consider the last $\lfloor\frac{w_{i}}{2}\rfloor$ requests for key $i$. For each such request, since $w_{i}(t)\geq w_{i}/2$, we have $q_{i}(t)=\frac{w_{i}(t)}{t}\geq\frac{w_{i}}{2t}\geq\frac{w_{i}}{2m}$. So the total cost of these at most $\frac{w_{i}}{2}$ requests is at most $\frac{w_{i}}{2}\log\frac{2m}{w_{i}}$.

Similarly, for each $j$’th request of key $i$ where $\lceil\frac{w_{i}}{2^{k}}\rceil<j\leq\lceil\frac{w_{i}}{2^{k-1}}\rceil$, with $k=1,2,\dots,\lceil\log(w_{i})\rceil$, we have $q_{i}(t)\geq\frac{w_{i}}{2^{k}m}$, and the cost of all of these requests is at most $\frac{w_{i}}{2^{k}}\log\frac{2^{k}m}{w_{i}}$. So for each key $i$

$\displaystyle\sum_{\sigma_{t}=i}\log(\frac{1}{q_{i}(t)})$	$\displaystyle\leq\sum_{k=1}^{\lceil\log(w_{i})\rceil}\frac{w_{i}}{2^{k}}\log\frac{2^{k}m}{w_{i}}$	(13)
	$\displaystyle\leq\sum_{k=1}^{\lceil\log(w_{i})\rceil}\frac{w_{i}}{2^{k}}(\log\frac{m}{w_{i}}+\log 2^{k})$	(14)
	$\displaystyle\leq w_{i}\log\frac{m}{w_{i}}\sum_{k=1}^{\lceil\log(w_{i})\rceil}\frac{1}{2^{k}}+w_{i}\sum_{k=1}^{\lceil\log(w_{i})\rceil}\frac{k}{2^{k}}$	(15)
	$\displaystyle\leq w_{i}\log\frac{m}{w_{i}}+2w_{i}$	(16)

From this it follows that:

	$\displaystyle\sum_{t=1}^{m}\log(\frac{1}{q_{\sigma_{t}}(t)})$	$\displaystyle\leq\sum_{i=1}^{n}\left(w_{i}\log\frac{m}{w_{i}}+2w_{i}\right)$		(17)
		$\displaystyle\leq mH(\overline{W})+2m$		(18)
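The bound of Eq. (18) can be checked numerically on a sample sequence. The Python sketch below is a sanity check under stated assumptions, not part of the proof: the function names, the random seed, and the skewed key weights are all hypothetical choices.

```python
import math
import random
from collections import Counter
from typing import Hashable, Sequence


def online_log_cost(sigma: Sequence[Hashable]) -> float:
    """Left-hand side of Eq. (18): sum over t of log(1 / q_{sigma_t}(t)),
    where q_i(t) = w_i(t) / t includes the request at time t itself."""
    counts: Counter = Counter()
    total = 0.0
    for t, key in enumerate(sigma, start=1):
        counts[key] += 1
        total += math.log2(t / counts[key])
    return total


def entropy_bound(sigma: Sequence[Hashable]) -> float:
    """Right-hand side of Eq. (18): m * H(W) + 2m."""
    m = len(sigma)
    h = sum((w / m) * math.log2(m / w) for w in Counter(sigma).values())
    return m * h + 2 * m


# A skewed random sequence over 8 keys (weights are an arbitrary illustration).
rng = random.Random(0)
sigma = rng.choices(range(8), weights=[8, 4, 2, 1, 1, 1, 1, 1], k=2000)
lhs, rhs = online_log_cost(sigma), entropy_bound(sigma)
```

Comparing `lhs` with `rhs` on such runs gives a quick feel for how much slack the additive $2m$ term leaves.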

A result identical to Eq. (18) was recently shown in [19]. Next we consider the adjustment cost. We do it again one key at a time. For a key $i$, consider all the adjustments it caused (lines 5-6 in Algorithm 3). We consider two cases: (i) all adjustments when $w_{i}(t)\geq 2\alpha$, and (ii) all adjustments when $w_{i}(t)<2\alpha$. To analyze these two cases we will use the following claim.

Claim 2

Let $i$ be the key that causes an adjustment (lines 5-6 in Algorithm 3) at time $t$. Let $t^{\prime}$ be the time of the previous adjustment initiated by any key. Then $w_{i}(t^{\prime})<\frac{w_{i}(t)}{2}$.

Proof

By Algorithm 3 after the adjustment at time $t^{\prime}$ , $\overline{P}$ was set to $\overline{Q}$ and in particular $p_{i}(t^{\prime})$ was set to $q_{i}(t^{\prime})$ . The result follows since at time $t$ we have

$\displaystyle p_{i}(t)=p_{i}(t^{\prime})=\frac{w_{i}(t^{\prime})}{t^{\prime}}<\frac{q_{i}(t)}{2}=\frac{w_{i}(t)}{2t}\leq\frac{w_{i}(t)}{2t^{\prime}}$	(19)

∎

We can now analyse the two adjustment cases.

Case i: $w_{i}(t)\geq 2\alpha$ .

First notice that if $w_{i}(t^{\prime})\geq 2\alpha$ then for each $t>t^{\prime}$, $w_{i}(t)\geq 2\alpha$. Assume a time $t$ for which $w_{i}(t)\geq 2\alpha$ and at which lines 5-6 in Algorithm 3 were executed. Now consider the time $t^{\prime}<t$ at which the previous execution of lines 5-6 occurred.

From Claim 2 and using the assumption that $w_{i}(t)\geq 2\alpha$ , the number of requests to key $i$ since the last adjustment is:

\displaystyle w_{i}(t)-w_{i}(t^{\prime})\geq w_{i}(t)-\frac{w_{i}(t)}{2}=\frac{w_{i}(t)}{2}\geq\alpha

(20)

Therefore we can amortize the adjustment at time $t$ (of cost $\alpha$ ) with the (at least) $\alpha$ cost of the adversary accessing the (at least) $\alpha$ requests to key $i$ between $t^{\prime}$ and $t$ .

Case ii: $w_{i}(t)<2\alpha$.

From Claim 2, $w_{i}(t)$ at least doubles between each adjustment that is caused by key $i$. So until $w_{i}(t)\geq 2\alpha$, key $i$ can cause at most $\log(\alpha)+1$ adjustments, at a total adjustment cost of at most $\alpha(\log\alpha+1)\leq 2\alpha\log\alpha$ (using $\alpha\geq 2$).

It follows that the adjustment cost, $\operatorname{cost}_{A}(\sigma)$ , of Algorithm 3 can be bounded as follows:

$\displaystyle\operatorname{cost}_{A}(\sigma)$	$\displaystyle=\sum_{B_{t}\neq B_{t+1}}\alpha$	(21)
	$\displaystyle=\sum_{i=1}^{n}\sum_{p_{i}<q_{i}/2}\alpha=\sum_{i=1}^{n}\left(\sum_{\begin{subarray}{c}p_{i}(t)<q_{i}(t)/2\\ w_{i}(t)<2\alpha\end{subarray}}\alpha+\sum_{\begin{subarray}{c}p_{i}(t)<q_{i}(t)/2\\ w_{i}(t)\geq 2\alpha\end{subarray}}\alpha\right)$	(22)
	$\displaystyle\leq\sum_{i=1}^{n}\left(2\alpha\log\alpha+w_{i}\right)$	(23)
	$\displaystyle=2n\alpha\log\alpha+m\leq 2m$	(24)

Equations (22) and (23) follow from cases (i) and (ii) above. Eq. (24) follows from the assumption that $m\geq 2n\alpha\log\alpha$.

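To finalize the proof, we combine the above results to bound the cost of A-BST. Since $\operatorname{STAT}(\sigma)=\Theta(m(1+H(\overline{W})))$, a bound of $O(m(1+H(\overline{W})))$ on this cost implies static optimality.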

$\displaystyle\operatorname{cost}(\text{A-BST},\sigma,B_{1})$	$\displaystyle=\sum_{t=1}^{m}\left(\operatorname{depth}(B_{t},\sigma_{t})+\alpha\mathbb{I}_{B_{t}\neq B_{t+1}}\right)$	(25)
	$\displaystyle=\operatorname{cost}_{S}(\sigma)+\operatorname{cost}_{A}(\sigma)$	(26)
	$\displaystyle\leq 4m+mH(\overline{W})+2m+2m=m(8+H(\overline{W}))$	(27)

∎

5 Discussion and Open Questions

We believe that the matching model opens some interesting research directions. Our model can emulate the standard BST model when $\alpha=O(1)$. While dynamic optimality is a major and long-standing open question for this case [32], it may be possible that for some range of $\alpha$ the question becomes easier. Another property, possibly simpler, for which to prove or disprove an approximation is a working-set-like theorem [32]. A nice feature of the matching model is that it supports more complex networks than a BST. It is a future research direction to prove static optimality and other online properties for such networks.

Regarding A-BST, the algorithm can be extended in several directions to be more adaptive. First, it can work with a (memory) window instead of the whole history to become more dynamic. Careful attention is required when deciding the window size in order not to lose the static optimality property. More generally, as in arithmetic coding [14], which is an extension of SFE coding, the prediction model can be independently replaced by a more sophisticated model than the current Laplace model.

References

[1] Alistarh, D., Ballani, H., Costa, P., Funnell, A., Benjamin, J., Watts, P., Thomsen, B.: A high-radix, low-latency optical switch for data centers. In: Proceedings of the 2015 ACM conference on SIGCOMM. pp. 367–368 (2015)
[2] Allen, B., Munro, I.: Self-organizing binary search trees. Journal of the ACM (JACM) 25(4), 526–535 (1978)
[3] Avin, C., Ghobadi, M., Griner, C., Schmid, S.: On the complexity of traffic traces and implications. Proceedings of the ACM on Measurement and Analysis of Computing Systems 4(1), 1–29 (2020)
[4] Avin, C., Griner, C., Salem, I., Schmid, S.: An online matching model for self-adjusting tor-to-tor networks. arXiv preprint arXiv:2006.11148 (2020)
[5] Avin, C., Mondal, K., Schmid, S.: Demand-aware network designs of bounded degree. Distributed Computing pp. 1–15 (2017)
[6] Avin, C., Schmid, S.: Renets: Toward statically optimal self-adjusting networks. arXiv preprint arXiv:1904.03263. To appear in APOCS 2020. (2019)
[7] Avin, C., Schmid, S.: Toward demand-aware networking: A theory for self-adjusting networks. ACM SIGCOMM Computer Communication Review 48(5), 31–40 (2019)
[8] Baer, J.L.: Weight-balanced trees. In: Proceedings of the May 19-22, 1975, national computer conference and exposition. pp. 467–472 (1975)
[9] Ballani, H., Costa, P., Behrendt, R., Cletheroe, D., Haller, I., Jozwik, K., Karinou, F., Lange, S., Shi, K., Thomsen, B., et al.: Sirius: A flat datacenter network with nanosecond optical switching. In: Proceedings of the 2016 ACM SIGCOMM Conference. pp. 782–797 (2020)
[10] Benson, T., Anand, A., Akella, A., Zhang, M.: Understanding data center traffic characteristics. In: Proc. 1st ACM Workshop on Research on Enterprise Networking (WREN). pp. 65–72. ACM (2009)
[11] Blum, A., Chawla, S., Kalai, A.T.: Static optimality and dynamic search-optimality in lists and trees. Algorithmica 36 (2016)
[12] Carnap, R.: On the application of inductive logic. Philosophy and phenomenological research 8(1), 133–148 (1947)
[13] Chalermsook, P., Jiamjitrak, W.P.: New binary search tree bounds via geometric inversions. In: 28th Annual European Symposium on Algorithms (ESA 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2020)
[14] Cover, T.M., Thomas, J.A.: Elements of information theory. John Wiley & Sons (2012)
[15] Demaine, E.D., Harmon, D., Iacono, J., Kane, D., Pătraşcu, M.: The geometry of binary search trees. In: Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithms. pp. 496–505. SIAM (2009)
[16] Demaine, E.D., Harmon, D., Iacono, J., Pătraşcu, M.: Dynamic optimality—almost. SIAM Journal on Computing 37(1), 240–251 (2007)
[17] Farrington, N., Porter, G., Radhakrishnan, S., Bazzaz, H.H., Subramanya, V., Fainman, Y., Papen, G., Vahdat, A.: Helios: a hybrid electrical/optical switch architecture for modular data centers. In: Proceedings of the ACM SIGCOMM 2010 conference. pp. 339–350 (2010)
[18] Ghobadi, M., Mahajan, R., Phanishayee, A., Devanur, N., Kulkarni, J., Ranade, G., Blanche, P.A., Rastegarfar, H., Glick, M., Kilper, D.: Projector: Agile reconfigurable data center interconnect. In: Proceedings of the 2016 ACM SIGCOMM Conference. pp. 216–229 (2016)
[19] Golin, M., Iacono, J., Langerman, S., Munro, J.I., Nekrich, Y.: Dynamic Trees with Almost-Optimal Access Cost. In: Azar, Y., Bast, H., Herman, G. (eds.) 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), vol. 112, pp. 38:1–38:14. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2018). https://doi.org/10.4230/LIPIcs.ESA.2018.38, http://drops.dagstuhl.de/opus/volltexte/2018/9501
[20] Griner, C., Zerwas, J., Blenk, A., Ghobadi, M., Schmid, S., Avin, C.: Performance analysis of demand-oblivious and demand-aware optical datacenter network designs. arXiv preprint arXiv:2010.13081. (2020)
[21] Hamedazimi, N., Qazi, Z., Gupta, H., Sekar, V., Das, S.R., Longtin, J.P., Shah, H., Tanwer, A.: Firefly: A reconfigurable wireless data center fabric using free-space optics. In: Proceedings of the 2014 ACM conference on SIGCOMM. pp. 319–330 (2014)
[22] Huffman, D.A.: A method for the construction of minimum-redundancy codes. Proceedings of the IRE 40(9), 1098–1101 (1952)
[23] Kassing, S., Valadarsky, A., Shahaf, G., Schapira, M., Singla, A.: Beyond fat-trees without antennae, mirrors, and disco-balls. In: Proceedings of the 2017 ACM conference on SIGCOMM. pp. 281–294. ACM (2017)
[24] Knuth, D.E.: Optimum binary search trees. Acta informatica 1(1), 14–25 (1971)
[25] Levy, C., Tarjan, R.: A new path from splay to dynamic optimality. In: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 1311–1330. SIAM (2019)
[26] Mehlhorn, K.: Nearly optimal binary search trees. Acta Informatica 5(4), 287–295 (1975)
[27] Mehlhorn, K.: Dynamic binary search. SIAM Journal on Computing 8(2), 175–198 (1979)
[28] Mellette, W.M., McGuinness, R., Roy, A., Forencich, A., Papen, G., Snoeren, A.C., Porter, G.: Rotornet: A scalable, low-complexity, optical datacenter network. In: Proceedings of the 2016 ACM SIGCOMM Conference. pp. 267–280 (2017)
[29] Roy, A., Zeng, H., Bagga, J., Porter, G., Snoeren, A.C.: Inside the social network’s (datacenter) network. In: Proc. ACM SIGCOMM Computer Communication Review (CCR). vol. 45, pp. 123–137. ACM (2015)
[30] Schmid, S., Avin, C., Scheideler, C., Borokhovich, M., Haeupler, B., Lotker, Z.: Splaynet: Towards locally self-adjusting networks. IEEE/ACM Transactions on Networking 24(3), 1421–1433 (2015)
[31] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948)
[32] Sleator, D.D., Tarjan, R.E.: Self-adjusting binary search trees. Journal of the ACM (JACM) 32(3), 652–686 (1985)
[33] Vitter, J.S.: Design and analysis of dynamic huffman codes. Journal of the ACM (JACM) 34(4), 825–845 (1987)
[34] Witten, I.H., Neal, R.M., Cleary, J.G.: Arithmetic coding for data compression. Commun. ACM 30(6), 520–540 (Jun 1987). https://doi.org/10.1145/214762.214771, http://doi.acm.org/10.1145/214762.214771
[35] Yeung, R.W.: Alphabetic codes revisited. IEEE Transactions on Information Theory 37(3), 564–572 (1991)
