
School of Electrical and Computer Engineering,
Ben Gurion University of the Negev, Israel
Email: [email protected]

Arithmetic Binary Search Trees:
Static Optimality in the Matching Model

Chen Avin
Abstract

Motivated by recent developments in optical switching and reconfigurable network design, we study dynamic binary search trees (BSTs) in the matching model. In the classical dynamic BST model, the cost of both link traversal and basic reconfiguration (rotation) is O(1). However, in the matching model, the BST is defined by two optical switches (that represent two matchings in an abstract way), and each switch (or matching) reconfiguration costs α, while a link traversal still costs O(1). In this work, we propose Arithmetic BST (A-BST), a simple dynamic BST algorithm that is based on dynamic Shannon-Fano-Elias coding, and show that A-BST is statically optimal for sequences of length Ω(nα log α), where n is the number of nodes (keys) in the tree.

Keywords:
binary search trees, static optimality, matching model, arithmetic coding, entropy.

1 Introduction

In this paper, we study one of the most classical problems in computer science, the design of efficient binary search trees (BSTs) [24]. More concretely, we are interested in Dynamic [27] or Self-Adjusting [32] binary search trees that need to serve a sequence of requests. While binary search trees are traditionally studied in the context of data structures, with a focus on memory and running-time optimization, we are motivated by physical-world implementations of binary search trees and, in particular, by self-adjusting networks [7].

The Matching model and Self-adjusting networks.

Self-adjusting networks, or reconfigurable networks, are communication networks that enable dynamic physical-layer topologies [17, 18, 28, 9], i.e., topologies that can change over time. Recently, reconfigurable optical switching technologies have introduced an intriguing alternative to the traditional design of datacenter networks, allowing direct shortcuts between communicating partners to be established dynamically, depending on the demand [18, 21, 28, 1]. For example, such reconfigurable links could be established to support elephant flows or traffic between two racks with significant communication demands. The potential for such demand-aware optimizations is high: empirical studies show that communication traffic features both spatial and temporal locality, i.e., traffic matrices are indeed sparse and a small number of elephant flows can constitute a significant fraction of the datacenter traffic [29, 10, 3]. The main metric of interest in these networks is the flow completion time or the average packet delay, where the reconfiguration delay (also known as the latency tax [20]) can be several orders of magnitude larger than the forwarding delay, i.e., milliseconds or microseconds vs. nanoseconds or less. This stands in striking contrast to the common data-structures approach, where pointer forwarding and pointer changing (e.g., rotations) are of the same time order and are considered unit cost.

Previous work [30, 6, 5] has shown that binary search trees (and other types of trees) can be used as an important building block in the design of self-adjusting networks, since they carry nice properties like local routing and low degree. Moreover, to capture many recent designs of demand-aware networks like [18, 28, 9, 17, 23], a simple leaf-spine datacenter network model called ToR-Matching-ToR (TMT) was recently proposed [4]. In the TMT model, n Top-of-Rack (ToR) switches (leaves) are connected via optical switches (spines), and each spine switch can have a dynamically changing matching between its n input-output ports. See Figure 1 for an illustration. The exact networking and technical details of the model are beyond the scope of this paper, but abstractly, with only two spine switches (i.e., two matchings), the model supports a simple implementation of a dynamic binary search tree that enables greedy routing from the root (a ToR) to any other node (ToR).

Figure 1: Overview of the ToR-Matching-ToR (TMT) network model [4]. The n nodes of the BST are the leaf switches, which are interconnected via n-port spine switches, each consisting of a (dynamic) matching. A BST can be implemented in this model using two spine switches. The cost (reconfiguration delay) of updating a matching is α.

This leads to the matching model for dynamic binary search trees that we study in this paper, where the tree can change at any time to any other tree, but at a reconfiguration cost given by a parameter denoted α.

1.1 Formal Model and Main Result

We consider a dynamic binary search tree (BST) that needs to serve a sequence of requests from a set V of n unique keys, where the value of the i'th key is v_i and the keys impose a complete order. For simplicity and w.l.o.g. we assume that the keys are sorted by their value, i.e., for i < j, v_i ≤ v_j (otherwise we can just rename the keys). Let σ = {σ_1, σ_2, …, σ_m} be a sequence of m requests for keys, where σ_j ∈ V.

A binary tree T has a finite set of nodes with one node designated as the root. Each node has a left and a right child, each of which can be either a different node or a null object. Every node in T, except the root, has a single parent node for which it is a child. We denote the root of a tree T by T.root(), and for a node v, let v.left() and v.right() denote its left and right children. The depth of v in T, denoted depth(T, v), is defined as the number of nodes on the path from the root to v. The depth of the root is therefore one.

In a binary search tree B, each node has a unique key. We assume the set of nodes to be V and use nodes and keys interchangeably. The tree satisfies the symmetric order condition: every node's key is greater than the keys in its left subtree and smaller than the keys in its right subtree.

For a given set of keys V and a sequence σ, the optimal static BST is the BST with minimal cost to serve σ; its cost is defined formally as

STAT(σ) = min_{B ∈ 𝓑} ∑_{t=1}^{m} depth(B, σ_t)    (1)

where 𝓑 is the set of all binary search trees.
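The minimum in Eq. (1) can be computed exactly with the classical dynamic program for optimal BSTs [24]. The following Python sketch (ours, for illustration; the function name and representation are not from the paper) computes STAT(σ) from the per-key access counts w_i:

from functools import lru_cache

def stat_cost(weights):
    # Cost of the optimal static BST (Eq. (1)) for the given access counts,
    # where serving a key at depth d costs d. Classic O(n^3) dynamic program.
    prefix = [0]
    for w in weights:
        prefix.append(prefix[-1] + w)

    @lru_cache(maxsize=None)
    def opt(i, j):  # optimal cost for keys i..j (0-indexed, inclusive)
        if i > j:
            return 0
        total = prefix[j + 1] - prefix[i]  # every key in this subtree gains one depth level
        return total + min(opt(i, r - 1) + opt(r + 1, j) for r in range(i, j + 1))

    return opt(0, len(weights) - 1)

print(stat_cost([1, 2, 4, 2, 1]))  # -> 18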

A dynamic BST is a sequence of BSTs, B_t, each with V as its set of nodes. At each time step t, the cost of searching for (or serving) a key σ_t is depth(B_t, σ_t); after the request is served, the BST B_t can adjust its structure (while maintaining the symmetric order property).

The key point of this paper and of our model is that the cost assumptions differ. In previous studies, classical [8, 2, 27, 32] as well as more recent ones [16, 25, 13], to name a few, the cost of both a single link traversal and a single pointer change (usually called a rotation) is O(1) units. In most of these algorithms, the cost of reconfiguring the tree is kept proportional to the cost of accessing the requested key. In contrast, the matching model assumes that any change from B_t to B_{t+1} is possible (it could be a single rotation or a completely new tree), but at a cost of α units.

For a dynamic BST algorithm 𝒜, the total cost of serving σ starting from a BST B = B_1 is defined in the matching model as follows:

cost(𝒜, σ, B) = ∑_{t=1}^{m} ( depth(B_t, σ_t) + α·𝕀[B_t ≠ B_{t+1}] )    (2)

where 𝕀[e] is the indicator function, i.e., 𝕀[e] = 1 if the event e is true and 0 otherwise. From a networking perspective, the cost of a dynamic BST algorithm can be seen as the total delay to serve (i.e., route) the requests in σ from a single source connected to the tree's root.
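For concreteness, Eq. (2) can be evaluated directly from a schedule of trees. A minimal sketch (ours; it assumes trees are compared by object identity and that depth(B, key) returns the key's depth in B):

def matching_model_cost(trees, sigma, alpha, depth):
    # Eq. (2): pay the depth of each served key, plus alpha whenever the
    # tree used at time t differs from the tree used at time t+1.
    cost = 0
    for t, key in enumerate(sigma):
        cost += depth(trees[t], key)
        if t + 1 < len(trees) and trees[t] is not trees[t + 1]:
            cost += alpha
    return cost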

In static optimality, which we define next, we want the dynamic BST algorithm to (asymptotically) perform as well as the best static tree chosen in hindsight.

Definition 1 (Static Optimality)

Let STAT be the cost of an optimal static BST with perfect knowledge of the demand σ, and let On be an online algorithm for dynamic BST. We say that On is statically optimal if there exists a constant ρ ≥ 1 such that, for any sufficiently long sequence of keys σ, we have

cost(On, σ, B) ≤ ρ·STAT(σ)

where B is the initial BST from which On starts. In other words, On's cost is at most a constant factor higher than STAT in the worst case.

Note that when α = O(1), known statically optimal dynamic BST algorithms, e.g., [8, 27, 32, 16, 15], can be used to achieve static optimality in our model. The more interesting scenario is α = ω(1), where naive adaptations of current algorithms, with a rotation (or splay-to-root, or move-to-root) cost of α, are not statically optimal. In this work, we assume for simplicity that α ≥ 2.

Let w_i(t) be the number of appearances of key i in σ up to time t. Note that ∑_{i=1}^{n} w_i(t) = t. For short, let w_i = w_i(m). Let W̄ = W̄(σ) be the frequency (or empirical) distribution of σ, i.e., W̄ = {w_1/m, w_2/m, …, w_n/m}, and let W̄(t) be the frequency distribution up to time t. It is a classical result that the optimal static BST for σ has amortized access cost Θ(H(W̄)), where H is the entropy function [26], i.e., STAT(σ) = Θ(m(1 + H(W̄(σ)))).

The main result of this paper is a simple dynamic BST algorithm, A-BST, that is based on arithmetic coding [34], and in particular on dynamic Shannon-Fano-Elias coding [14], and is statically optimal for sequences of length Ω(nα log α). Formally,

Theorem 1

A-BST (Algorithm 3) is a statically optimal dynamic BST for sequences of length at least 2nα log α.

The rest of the paper is organized as follows. In Section 2 we review related concepts, such as entropy and Shannon-Fano-Elias (SFE) coding, with a detailed example. Section 3 presents A-BST, including the related algorithms and an example of its dynamic operation. In Section 4 we prove Theorem 1. Finally, we conclude with a discussion and open questions in Section 5.

2 Preliminaries

Entropy and Shannon-Fano-Elias (SFE) Coding.

Entropy is a well-known measure of the unpredictability of information content [31]. For a discrete random variable X with possible values {x_1, …, x_n}, the (binary) entropy H(X) of X is defined as H(X) = ∑_{i=1}^{n} p_i·log₂(1/p_i), where p_i is the probability that X takes the value x_i. By convention, 0·log₂(1/0) = 0, and we usually assume that p_i > 0 for all i. Let p̄ denote X's probability distribution; then we may write H(p̄) instead of H(X).
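As a quick illustration (ours, not from the paper), the entropy of a distribution can be computed in a few lines of Python, using the convention above for zero probabilities:

import math

def entropy(p):
    # H(p̄) = Σ_i p_i · log2(1/p_i); terms with p_i = 0 contribute nothing
    return sum(pi * math.log2(1.0 / pi) for pi in p if pi > 0)

print(entropy([0.1, 0.2, 0.4, 0.2, 0.1]))  # ≈ 2.122 bits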

Shannon-Fano-Elias (SFE) coding [14] is a well-known symbol-by-symbol prefix code for lossless data compression and is the predecessor of the more famous and widely used (variable block length) arithmetic coding [34]. The Shannon-Fano-Elias code produces variable-length code-words based on the probability of each possible symbol. As in other entropy encoding methods (e.g., Huffman), higher-probability symbols are represented using fewer bits, to reduce the expected code length. Unlike Huffman coding, the method works for any order of the symbols, and the probabilities do not need to be sorted; this fits better with the search property we need in the tree later on.

The encoding is based on the cumulative distribution function (CDF) of the probability distribution,

F(i) = ∑_{j ≤ i} p_j,    (3)

and encodes symbols using the function F̄, where

F̄(i) = ∑_{j < i} p_j + p_i/2 = F(i−1) + p_i/2.    (4)

Denote by B(i) the binary representation of F̄(i). The code-word C(i) for the i'th symbol x_i consists of the first ℓ(i) bits of the fractional part of B(i), denoted as

C(i) = ⌊B(i)⌋_{ℓ(i)},    (5)

where the code length ℓ(i) is defined as

ℓ(i) = ⌈log(1/p_i)⌉ + 1.    (6)

The above construction guarantees (i) that the code-words C(i) are prefix-free, and therefore appear as leaves in the binary tree that represents the code, and (ii) that the average code length L_SFE(X),

L_SFE(X) = ∑_{i=1}^{n} p_i·ℓ(i) = ∑_{i=1}^{n} p_i·(⌈log(1/p_i)⌉ + 1),    (7)

is close to the entropy H(X) of the random variable X; in particular,

H(X) + 1 ≤ L_SFE(X) < H(X) + 2.    (8)
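Eqs. (3)-(6) translate directly into code. The following Python sketch (ours) emits the code-word C(i) for each symbol by taking the first ℓ(i) bits of the binary expansion of F̄(i); plain floating point suffices for short inputs, but exact arithmetic (e.g., Python's fractions module) is safer in general:

import math

def sfe_code(p):
    # Shannon-Fano-Elias code-words for a distribution p, given in key order.
    codes, F = [], 0.0            # F accumulates the CDF value F(i-1), Eq. (3)
    for pi in p:
        fbar = F + pi / 2.0       # F̄(i) = F(i-1) + p_i/2, Eq. (4)
        length = math.ceil(math.log2(1.0 / pi)) + 1   # ℓ(i), Eq. (6)
        bits, frac = "", fbar
        for _ in range(length):   # first ℓ(i) bits of the fractional part, Eq. (5)
            frac *= 2
            bits += str(int(frac))
            frac -= int(frac)
        codes.append(bits)
        F += pi
    return codes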
[Figure 2: panels (a)-(d) show Example A; panels (e)-(h) show Example B.]
Figure 2: Our two running examples, A and B. The top row shows the SFE coding details, the second row shows the functions F(i) and F̄(i), the third row shows the SFE prefix-free code-words as a binary tree, and the bottom row presents the corresponding BST after running Algorithm 2, PrefixTree-2-BST(T).

For an illustration, consider Example A in Figure 2, which we will also use later in the paper. Let X be a random variable with five possible symbols {1, 2, 3, 4, 5} and corresponding probabilities {0.1, 0.2, 0.4, 0.2, 0.1}. Figure 2 (a) provides the detailed parameters for the encoding of each symbol i, including F(i), F̄(i), the binary representation B(i), the code-word length ℓ(i), and the (prefix-free) binary code-word C(i). Additionally, Figure 2 (b) presents F(i) and F̄(i) graphically, and Figure 2 (c) shows a binary tree whose leaves are the code-words, where the path from the root to each leaf represents its (prefix-free) code-word. Figure 2 (d) shows the conversion of the prefix tree to a binary search tree, which we discuss next.
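Assuming the sfe_code sketch above, the code-words of Figure 2 (a) follow mechanically from Eqs. (3)-(6):

for i, c in enumerate(sfe_code([0.1, 0.2, 0.4, 0.2, 0.1]), start=1):
    print(i, c)
# 1 00001
# 2 0011
# 3 100
# 4 1100
# 5 11110

Note that the code-words are prefix-free and their numeric order matches the key order, which is exactly the property the conversion to a BST relies on.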

3 Arithmetic Binary Search Trees (A-BST)

The idea of arithmetic binary search trees is simple in principle. It is composed of three phases:

  1. Use arithmetic coding, and in particular Shannon-Fano-Elias coding, to create an efficient, variable-length, prefix-free code for a given (empirical) distribution.

  2. Convert the binary tree of the prefix-free code into a biased binary search tree (with an entropy bound on its performance).

  3. Dynamically update the empirical distribution, the prefix-free code, and the corresponding binary search tree.

We first describe steps 1 and 2, which may be of independent interest, and then the crux of the method, which is the dynamic update process.

3.1 From Shannon-Fano-Elias (SFE) coding to BST

Algorithm 1 SFE-2-BST(P̄): Convert arithmetic (SFE) coding to a BST
1: Input: a probability distribution P̄, where p_i is the probability of key i, the key with rank i in the sorted keys' list.
2: Output: a BST B where key i is at distance O(log(1/p_i)) from the root.
3: Create a prefix-free code-word for each key i via SFE coding.
4: Create a binary tree T from the SFE code, with the keys as leaves (i.e., a binary trie).
5: B = PrefixTree-2-BST(T). ▷ See Algorithm 2

Algorithms 1 and 2 show how to create a near-optimal biased BST for a given distribution via SFE coding. Algorithm SFE-2-BST (Algorithm 1) is first provided with a probability distribution P̄ for the biased BST. Note that, w.l.o.g., the distribution is sorted by the keys' values. From P̄ the algorithm then creates a prefix-free binary code using SFE coding (any other coding method that preserves the key order could be used, e.g., Shannon-Fano coding [14], which splits the probabilities in half, alphabetic codes [35], or a Mehlhorn tree [26], but not, e.g., Huffman coding [22], which needs to sort the probabilities) and the corresponding binary tree T, where the keys (symbols) are at the leaves, ordered by their value (and not by their probabilities). Next, the algorithm calls Algorithm PrefixTree-2-BST (Algorithm 2), which converts T to a BST B.

Algorithm 2 PrefixTree-2-BST(T): Convert a binary prefix tree to a BST
1: Input: a prefix tree T with keys as leaves in sorted order.
2: Output: a BST B where each key in B is at least as close to the root as in T.
3: if T is of size one or NULL then
4:     return T
5: else
6:     Let pre be the rightmost leaf in the left subtree of T (the predecessor of the split)
7:     Let post be the leftmost leaf in the right subtree of T (the successor of the split)
8:     Let ℓ be the leaf of lower depth between pre and post
9:     Let B be a new binary search tree with ℓ as its root
10:    Delete ℓ from T
11:    B.root.left = PrefixTree-2-BST(T.root.left)
12:    B.root.right = PrefixTree-2-BST(T.root.right)
13:    return B
14: end if

Algorithm 2 is a recursive algorithm that receives a binary tree T with keys as leaves in increasing order. It creates a BST B whose root is whichever of the two leaves flanking T's root split (the rightmost leaf of the left subtree and the leftmost leaf of the right subtree) is closer to the root; the chosen leaf is then deleted from T. The left and right subtrees of B's root are created by calling the algorithm recursively on the corresponding subtrees of T.
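A possible Python rendering of Algorithm 2, together with a helper that builds the binary trie (step 4 of Algorithm 1) from the code-words, is sketched below; the representation and helper names are ours, and deleting a leaf here also prunes internal nodes left with no children:

class Node:
    def __init__(self, key=None, left=None, right=None):
        self.key, self.left, self.right = key, left, right  # key is set only at leaves

def trie_from_codes(codes):
    # Binary trie of the prefix-free code: bit b of code-word i selects the
    # left (0) or right (1) child on key i's root-to-leaf path (cf. Fig. 2 (c)).
    root = Node()
    for key, word in enumerate(codes, start=1):
        node = root
        for b in word:
            side = "left" if b == "0" else "right"
            if getattr(node, side) is None:
                setattr(node, side, Node())
            node = getattr(node, side)
        node.key = key
    return root

def _other(side):
    return "left" if side == "right" else "right"

def _leaf_depth(node, prefer, depth):
    # depth of the extreme (rightmost or leftmost) leaf of this subtree
    while node.key is None:
        node = getattr(node, prefer) or getattr(node, _other(prefer))
        depth += 1
    return depth

def _pop_extreme(node, prefer):
    # remove the extreme leaf; prune internal nodes left with no children
    if node.key is not None:
        return node, None
    side = prefer if getattr(node, prefer) is not None else _other(prefer)
    leaf, rest = _pop_extreme(getattr(node, side), prefer)
    setattr(node, side, rest)
    return leaf, (node if (node.left or node.right) else None)

def prefix_tree_to_bst(t):
    # Algorithm 2: the shallower of the two leaves flanking the root's split
    # (predecessor / successor) becomes the BST root; recurse on both sides.
    if t is None or t.key is not None:
        return t
    d_pre = _leaf_depth(t.left, "right", 2) if t.left else float("inf")
    d_post = _leaf_depth(t.right, "left", 2) if t.right else float("inf")
    if d_pre <= d_post:
        leaf, t.left = _pop_extreme(t.left, "right")
    else:
        leaf, t.right = _pop_extreme(t.right, "left")
    return Node(key=leaf.key,
                left=prefix_tree_to_bst(t.left),
                right=prefix_tree_to_bst(t.right))

# Example A: build the trie of Figure 2 (c) from its SFE code-words and convert.
bst = prefix_tree_to_bst(trie_from_codes(["00001", "0011", "100", "1100", "11110"]))
print(bst.key, bst.left.key, bst.right.key)  # -> 3 2 4, as in Figure 2 (d)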

Recall Figure 2 (c), which shows the binary tree T for the SFE code of the probability distribution in Example A. The keys 1, 2, 3, 4, 5 are leaves in sorted order. This tree is an intermediate result of the SFE-2-BST(P̄) algorithm. Figure 2 (d) presents the corresponding BST after calling PrefixTree-2-BST(T). Initially, 3 is selected as the root, as it is the closer to the root of the keys 2 and 3. Then (after deleting 3 from the tree), the algorithm continues recursively, building the children of B's root from the left and right subtrees of T's root. On the left, 2 is selected as the next root, and then 1 as its left child. On the right, 4 is selected as the next root, and then 5 as its right child.

We can easily bound the depth of any key in the tree generated by SFE-2-BST.

Claim 1

For a key-sorted probability distribution P̄, the algorithm SFE-2-BST(P̄) creates a biased BST B where, for each key i, it holds that depth(B, i) < log(1/p_i) + 3.

Proof

The SFE code creates code-words of length ℓ(i) = ⌈log(1/p_i)⌉ + 1 < log(1/p_i) + 2, so in the tree T of Algorithm SFE-2-BST, the leaf of key i is at depth ℓ(i) + 1 < log(1/p_i) + 3. Algorithm PrefixTree-2-BST(T) does not change the symmetric order of the keys, which are initially stored at the leaves, and can only decrease the depth of any key. ∎

We next discuss how to make the BST dynamic.

3.2 Dynamic BST

Algorithm 3 provides pseudo-code for the dynamic Arithmetic Binary Search Tree (A-BST) B_t. The algorithm starts from B_1, a balanced BST over the n possible keys (assuming no knowledge of the key distribution). The algorithm maintains two probability distribution vectors that change over time, P̄ = {p_1, …, p_n} and Q̄ = {q_1, …, q_n}. P̄ holds the probability distribution from which the current BST was built and is initially set to the uniform distribution. Q̄ holds the empirical distribution of σ up to time t and is updated on each request, i.e., q_i(t) = w_i(t)/t. (To avoid zero probabilities, in practice q_i(t) is set by the Laplace rule [12] to (w_i(t)+1)/(t+n); this does not change the main theorem, since for t ≥ n we have (w_i(t)+1)/(t+n) ≥ w_i(t)/(2t). For simplicity of exposition we assume q_i(t) = w_i(t)/t.) The crux of the algorithm is that, whenever a key i is requested, if as a result p_i < q_i/2, then P̄ is updated to equal Q̄, and a new BST is created (at cost α) from P̄. Note that upon a request to key i, only q_i increases while all other probabilities decrease, so the condition p_j < q_j/2 can only become true for j = i when key i is requested.

We demonstrate the algorithm using Example B. Assume we have 5 keys 1, 2, 3, 4, 5, and that after the first ten requests in σ we have Q̄ = P̄ = {1/10, 2/10, 4/10, 2/10, 1/10}. Note that this is the empirical distribution of Example A, so B_10 is the BST shown in Figure 2 (d). Next, assume that the 11th and 12th requests are both for key 1. After the 11th request, p_1(11) = 1/10 and q_1(11) = 2/11, so p_1(11) > q_1(11)/2 and B_11 remains B_10. After the 12th request, we have p_1(12) = 1/10 and q_1(12) = 3/12, and now p_1(12) < q_1(12)/2. So at time t = 12, a new SFE code and a new BST are created from the empirical distribution Q̄ = P̄ = {3/12, 2/12, 4/12, 2/12, 1/12}. The details of the code for this distribution are given in Figure 2 (e)-(g), and the new BST, B_12, is shown in Figure 2 (h). Note that key 1 moved closer to the root.

Next we analyse A-BST and show it is statically optimal.

Algorithm 3 A-BST: Simple Arithmetic BST
1: Input: a BST B_t; P̄, the current tree distribution; Q̄, the empirical (model) distribution; ∀i, p_i ≥ q_i/2.
2: Output: a dynamic BST B_{t+1} with ∀i, p_i ≥ q_i/2.
3: Upon request of key i at time t:
4: Update Q̄ ▷ q_i(t) = w_i(t)/t = (w_i(t−1)+1)/t
5: if p_i < q_i/2 then ▷ update P̄ if needed
6:     Set P̄ to Q̄
7:     B_{t+1} = SFE-2-BST(P̄) ▷ at cost O(α)
8: else
9:     B_{t+1} = B_t
10: end if
11: Serve key i on B_{t+1} at cost O(log(1/p_i))
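Combining the sketches above gives a compact, illustrative simulation of A-BST (ours, not an optimized implementation). It uses the Laplace rule from the footnote so that all probabilities stay positive, which slightly shifts the rebuild times relative to the simplified q_i(t) = w_i(t)/t used in the example:

from collections import Counter

def sfe_to_bst(p):
    # Algorithm 1 (SFE-2-BST), composed from the earlier sketches
    return prefix_tree_to_bst(trie_from_codes(sfe_code(p)))

def depth(b, key, d=1):
    # depth of `key` in BST `b`; the root has depth 1
    while b.key != key:
        b = b.left if key < b.key else b.right
        d += 1
    return d

def a_bst_cost(sigma, n, alpha):
    # Run Algorithm 3 on a request sequence over keys 1..n and return the
    # total matching-model cost (Eq. (2)), with Laplace-smoothed probabilities
    # q_i(t) = (w_i(t)+1)/(t+n).
    w = Counter()
    p = [1.0 / n] * n          # P: the distribution behind the current tree
    tree = sfe_to_bst(p)       # B_1 (a balanced BST would also do)
    cost = 0
    for t, key in enumerate(sigma, start=1):
        w[key] += 1
        q_key = (w[key] + 1) / (t + n)
        if p[key - 1] < q_key / 2:                            # line 5: adjust?
            p = [(w[i + 1] + 1) / (t + n) for i in range(n)]  # line 6: P <- Q
            tree = sfe_to_bst(p)                              # line 7, cost alpha
            cost += alpha
        cost += depth(tree, key)                              # line 11: serve
    return cost

print(a_bst_cost([3, 2, 4, 3, 1, 5, 3, 2, 4, 3, 1, 1], n=5, alpha=4))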

4 A-BST is Statically Optimal

We now prove the main result of this work. Recall that we assume α ≥ 2 and that the more interesting case is α = ω(1).

Theorem 4.1

A-BST (Algorithm 3) is a statically optimal dynamic BST for sequences of length at least 2nα log α.

Proof

Let B_t be the BST of the algorithm at time t, and let P̄_t = {p_1(t), …, p_n(t)} be the probability distribution from which B_t was built. Let Q̄_t = {q_1(t), …, q_n(t)} be the frequency distribution of σ up to time t; note that q_i(m) = w_i/m. Observe that Algorithm 3 guarantees that for each i and each t we have p_i(t) ≥ q_i(t)/2. We first analyse the searching cost (or access cost [11]), cost_S(σ), of Algorithm 3:

cost_S(σ) = ∑_{t=1}^{m} depth(B_t, σ_t)    (9)
 ≤ ∑_{t=1}^{m} ( log(1/p_{σ_t}(t)) + 3 )    (10)
 ≤ ∑_{t=1}^{m} ( log(2/q_{σ_t}(t)) + 3 )    (11)
 ≤ 4m + ∑_{t=1}^{m} log(1/q_{σ_t}(t)),    (12)

where Eq. (10) is due to Claim 1 and Eq. (11) follows from the invariant p_i(t) ≥ q_i(t)/2.

Next we consider the term ∑_{t=1}^{m} log(1/q_{σ_t}(t)) and analyze one key i at a time. We follow an idea mentioned in a classical paper on dynamic Huffman coding by Vitter [33], attributed to a personal communication between Vitter and B. Chazelle. Consider the last ⌊w_i/2⌋ requests for key i. For each such request, since w_i(t) ≥ w_i/2, we have q_i(t) = w_i(t)/t ≥ w_i/(2t) ≥ w_i/(2m). So these requests contribute at most (w_i/2)·log(2m/w_i) to the sum.

Similarly, for the j'th request of key i, where ⌈w_i/2^k⌉ < j ≤ ⌈w_i/2^{k-1}⌉ and k = 1, 2, …, ⌈log w_i⌉, we have q_i(t) ≥ w_i/(2^k·m), and all of these requests together contribute less than (w_i/2^k)·log(2^k·m/w_i). So, for each key i,

∑_{t: σ_t = i} log(1/q_i(t)) ≤ ∑_{k=1}^{⌈log w_i⌉} (w_i/2^k)·log(2^k·m/w_i)    (13)
 ≤ ∑_{k=1}^{⌈log w_i⌉} (w_i/2^k)·( log(m/w_i) + k )    (14)
 ≤ w_i·log(m/w_i)·∑_{k=1}^{⌈log w_i⌉} 1/2^k + w_i·∑_{k=1}^{⌈log w_i⌉} k/2^k    (15)
 ≤ w_i·log(m/w_i) + 2w_i.    (16)
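For completeness, the step from Eq. (15) to Eq. (16) uses the two standard geometric-series bounds, which hold for any upper limit K = ⌈log w_i⌉:

∑_{k=1}^{K} 1/2^k < ∑_{k=1}^{∞} 1/2^k = 1  and  ∑_{k=1}^{K} k/2^k < ∑_{k=1}^{∞} k/2^k = 2,

so the first term of Eq. (15) is less than w_i·log(m/w_i) and the second is less than 2w_i.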

From this it follows that:

∑_{t=1}^{m} log(1/q_{σ_t}(t)) ≤ ∑_{i=1}^{n} ( w_i·log(m/w_i) + 2w_i )    (17)
 = m·H(W̄) + 2m.    (18)

A result identical to Eq. (18) was recently shown in [19]. Next we consider the adjustment cost, again one key at a time. For a key i, consider all the adjustments it caused (lines 5-6 in Algorithm 3). We consider two cases: (i) adjustments when w_i(t) ≥ 2α, and (ii) adjustments when w_i(t) < 2α. To analyze these two cases we will use the following claim.

Claim 2

Let i be the key that causes an adjustment (lines 5-6 in Algorithm 3) at time t, and let t′ be the time of the previous adjustment initiated by any key. Then w_i(t′) < w_i(t)/2.

Proof

By Algorithm 3, after the adjustment at time t′, P̄ was set to Q̄, and in particular p_i(t′) was set to q_i(t′). The result follows since at time t we have

p_i(t) = p_i(t′) = w_i(t′)/t′ < q_i(t)/2 = w_i(t)/(2t) ≤ w_i(t)/(2t′),    (19)

and multiplying both sides by t′ yields w_i(t′) < w_i(t)/2. ∎

We can now analyse the two adjustment cases.

Case i: w_i(t) ≥ 2α.

First notice that once w_i(t′) ≥ 2α, we have w_i(t) ≥ 2α for every t > t′, since w_i(·) is non-decreasing. Assume a time t for which w_i(t) ≥ 2α and lines 5-6 of Algorithm 3 were executed, and consider the time t′ < t of the previous execution of lines 5-6.

From Claim 2, and using the assumption that w_i(t) ≥ 2α, the number of requests to key i since the last adjustment is

w_i(t) − w_i(t′) > w_i(t) − w_i(t)/2 = w_i(t)/2 ≥ α.    (20)

Therefore, we can amortize the adjustment at time t (of cost α) against the at least α requests to key i between t′ and t; hence the total cost of such adjustments for key i is at most w_i.

Case ii: w_i(t) < 2α.

From Claim 2, w_i(t) at least doubles between consecutive adjustments caused by key i. So while w_i(t) < 2α, key i can cause at most log α + 1 adjustments, at a total adjustment cost less than α(log α + 1) ≤ 2α·log α (using α ≥ 2).

It follows that the adjustment cost, cost_A(σ), of Algorithm 3 can be bounded as follows:

cost_A(σ) = ∑_{t: B_t ≠ B_{t+1}} α    (21)
 = ∑_{i=1}^{n} ∑_{t: p_i(t) < q_i(t)/2} α = ∑_{i=1}^{n} ( ∑_{t: p_i(t) < q_i(t)/2, w_i(t) < 2α} α + ∑_{t: p_i(t) < q_i(t)/2, w_i(t) ≥ 2α} α )    (22)
 ≤ ∑_{i=1}^{n} ( 2α·log α + w_i )    (23)
 = 2nα·log α + m ≤ 2m.    (24)

Eq. (23) follows from cases (ii) and (i) above (the first inner sum of Eq. (22) is bounded by case (ii) and the second by case (i)), and Eq. (24) follows from the assumption that m ≥ 2nα log α.

To finalize the proof, we combine the above results to bound the cost of A-BST:

cost(A-BST, σ, B_1) = ∑_{t=1}^{m} ( depth(B_t, σ_t) + α·𝕀[B_t ≠ B_{t+1}] )    (25)
 = cost_S(σ) + cost_A(σ)    (26)
 ≤ 4m + m·H(W̄) + 2m + 2m = m·(8 + H(W̄)).    (27)

Since STAT(σ) = Θ(m(1 + H(W̄))), this is at most a constant factor larger than STAT(σ), which completes the proof. ∎

5 Discussion and Open Questions

We believe that the matching model opens some interesting research directions. Our model can emulate the standard BST model when α = O(1); while dynamic optimality is a major and long-standing open question for that case [32], it may be that for some range of α the question becomes easier. Another property, possibly simpler to prove or disprove, is a working-set-like theorem [32]. A nice feature of the matching model is that it supports more complex networks than a BST; proving static optimality and other online properties for such networks is a direction for future research.

Regarding A-BST, the algorithm can be extended in several directions to be more adaptive. First, it can work with a (memory) window instead of the whole history to become more dynamic; careful attention is required when choosing the window size in order not to lose the static optimality feature. More generally, as in arithmetic coding [14], which is an extension of SFE coding, the prediction model can be independently replaced by a model more sophisticated than the current Laplace model.

References

  • [1] Alistarh, D., Ballani, H., Costa, P., Funnell, A., Benjamin, J., Watts, P., Thomsen, B.: A high-radix, low-latency optical switch for data centers. In: Proceedings of the 2015 ACM conference on SIGCOMM. pp. 367–368 (2015)
  • [2] Allen, B., Munro, I.: Self-organizing binary search trees. Journal of the ACM (JACM) 25(4), 526–535 (1978)
  • [3] Avin, C., Ghobadi, M., Griner, C., Schmid, S.: On the complexity of traffic traces and implications. Proceedings of the ACM on Measurement and Analysis of Computing Systems 4(1), 1–29 (2020)
  • [4] Avin, C., Griner, C., Salem, I., Schmid, S.: An online matching model for self-adjusting tor-to-tor networks. arXiv preprint arXiv:2006.11148 (2020)
  • [5] Avin, C., Mondal, K., Schmid, S.: Demand-aware network designs of bounded degree. Distributed Computing pp. 1–15 (2017)
  • [6] Avin, C., Schmid, S.: Renets: Toward statically optimal self-adjusting networks. arXiv preprint arXiv:1904.03263. To appear in APOCS 2020. (2019)
  • [7] Avin, C., Schmid, S.: Toward demand-aware networking: A theory for self-adjusting networks. ACM SIGCOMM Computer Communication Review 48(5), 31–40 (2019)
  • [8] Baer, J.L.: Weight-balanced trees. In: Proceedings of the May 19-22, 1975, national computer conference and exposition. pp. 467–472 (1975)
  • [9] Ballani, H., Costa, P., Behrendt, R., Cletheroe, D., Haller, I., Jozwik, K., Karinou, F., Lange, S., Shi, K., Thomsen, B., et al.: Sirius: A flat datacenter network with nanosecond optical switching. In: Proceedings of the 2016 ACM SIGCOMM Conference. pp. 782–797 (2020)
  • [10] Benson, T., Anand, A., Akella, A., Zhang, M.: Understanding data center traffic characteristics. In: Proc. 1st ACM Workshop on Research on Enterprise Networking (WREN). pp. 65–72. ACM (2009)
  • [11] Blum, A., Chawla, S., Kalai, A.T.: Static optimality and dynamic search-optimality in lists and trees. Algorithmica 36 (2016)
  • [12] Carnap, R.: On the application of inductive logic. Philosophy and phenomenological research 8(1), 133–148 (1947)
  • [13] Chalermsook, P., Jiamjitrak, W.P.: New binary search tree bounds via geometric inversions. In: 28th Annual European Symposium on Algorithms (ESA 2020). Schloss Dagstuhl-Leibniz-Zentrum für Informatik (2020)
  • [14] Cover, T.M., Thomas, J.A.: Elements of information theory. John Wiley & Sons (2012)
  • [15] Demaine, E.D., Harmon, D., Iacono, J., Kane, D., Pătraşcu, M.: The geometry of binary search trees. In: Proceedings of the twentieth annual ACM-SIAM symposium on Discrete algorithms. pp. 496–505. SIAM (2009)
  • [16] Demaine, E.D., Harmon, D., Iacono, J., Pătraşcu, M.: Dynamic optimality—almost. SIAM Journal on Computing 37(1), 240–251 (2007)
  • [17] Farrington, N., Porter, G., Radhakrishnan, S., Bazzaz, H.H., Subramanya, V., Fainman, Y., Papen, G., Vahdat, A.: Helios: a hybrid electrical/optical switch architecture for modular data centers. In: Proceedings of the ACM SIGCOMM 2010 conference. pp. 339–350 (2010)
  • [18] Ghobadi, M., Mahajan, R., Phanishayee, A., Devanur, N., Kulkarni, J., Ranade, G., Blanche, P.A., Rastegarfar, H., Glick, M., Kilper, D.: Projector: Agile reconfigurable data center interconnect. In: Proceedings of the 2016 ACM SIGCOMM Conference. pp. 216–229 (2016)
  • [19] Golin, M., Iacono, J., Langerman, S., Munro, J.I., Nekrich, Y.: Dynamic Trees with Almost-Optimal Access Cost. In: Azar, Y., Bast, H., Herman, G. (eds.) 26th Annual European Symposium on Algorithms (ESA 2018). Leibniz International Proceedings in Informatics (LIPIcs), vol. 112, pp. 38:1–38:14. Schloss Dagstuhl–Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany (2018). https://doi.org/10.4230/LIPIcs.ESA.2018.38, http://drops.dagstuhl.de/opus/volltexte/2018/9501
  • [20] Griner, C., Zerwas, J., Blenk, A., Ghobadi, M., Schmid, S., Avin, C.: Performance analysis of demand-oblivious and demand-aware optical datacenter network designs. arXiv preprint arXiv:2010.13081. (2020)
  • [21] Hamedazimi, N., Qazi, Z., Gupta, H., Sekar, V., Das, S.R., Longtin, J.P., Shah, H., Tanwer, A.: Firefly: A reconfigurable wireless data center fabric using free-space optics. In: Proceedings of the 2014 ACM conference on SIGCOMM. pp. 319–330 (2014)
  • [22] Huffman, D.A.: A method for the construction of minimum-redundancy codes. Proceedings of the IRE 40(9), 1098–1101 (1952)
  • [23] Kassing, S., Valadarsky, A., Shahaf, G., Schapira, M., Singla, A.: Beyond fat-trees without antennae, mirrors, and disco-balls. In: Proceedings of the 2017 ACM conference on SIGCOMM. pp. 281–294. ACM (2017)
  • [24] Knuth, D.E.: Optimum binary search trees. Acta informatica 1(1), 14–25 (1971)
  • [25] Levy, C., Tarjan, R.: A new path from splay to dynamic optimality. In: Proceedings of the Thirtieth Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 1311–1330. SIAM (2019)
  • [26] Mehlhorn, K.: Nearly optimal binary search trees. Acta Informatica 5(4), 287–295 (1975)
  • [27] Mehlhorn, K.: Dynamic binary search. SIAM Journal on Computing 8(2), 175–198 (1979)
  • [28] Mellette, W.M., McGuinness, R., Roy, A., Forencich, A., Papen, G., Snoeren, A.C., Porter, G.: Rotornet: A scalable, low-complexity, optical datacenter network. In: Proceedings of the 2016 ACM SIGCOMM Conference. pp. 267–280 (2017)
  • [29] Roy, A., Zeng, H., Bagga, J., Porter, G., Snoeren, A.C.: Inside the social network’s (datacenter) network. In: Proc. ACM SIGCOMM Computer Communication Review (CCR). vol. 45, pp. 123–137. ACM (2015)
  • [30] Schmid, S., Avin, C., Scheideler, C., Borokhovich, M., Haeupler, B., Lotker, Z.: Splaynet: Towards locally self-adjusting networks. IEEE/ACM Transactions on Networking 24(3), 1421–1433 (2015)
  • [31] Shannon, C.E.: A mathematical theory of communication. The Bell system technical journal 27(3), 379–423 (1948)
  • [32] Sleator, D.D., Tarjan, R.E.: Self-adjusting binary search trees. Journal of the ACM (JACM) 32(3), 652–686 (1985)
  • [33] Vitter, J.S.: Design and analysis of dynamic huffman codes. Journal of the ACM (JACM) 34(4), 825–845 (1987)
  • [34] Witten, I.H., Neal, R.M., Cleary, J.G.: Arithmetic coding for data compression. Commun. ACM 30(6), 520–540 (Jun 1987). https://doi.org/10.1145/214762.214771, http://doi.acm.org/10.1145/214762.214771
  • [35] Yeung, R.W.: Alphabetic codes revisited. IEEE Transactions on Information Theory 37(3), 564–572 (1991)