<aside> 📌 These notes are based on the following source material: Keith's Slides on Splay Trees & Beyond Worst Case analysis, These notes for Cornell's CS3110, and Berkeley's CS61B

</aside>

Beyond Worst-Case Efficiency


<aside> 🚧

The guarantees we want from our data structure depend on how we plan on using it. Worst case efficiency, the workhorse of algorithmic analysis, works well when we know nothing about the structure of the data we'll be storing (except for a few foundational assumptions, such as that the items are comparable). It also works well when the designers of the data structure do not know a priori the data access patterns that the client will have. Therefore, worst case analysis helps us understand how our data structure (or algorithm) will fare if the sequence of inputs is constructed by an adversary.

</aside>

<aside> 🤖 Sometimes though (often?) we do know a few things about the structure of our data and the manner in which we'll be querying it. Remember that when discussing sorting, we were able to beat the $\Omega(n \lg n)$ lower bound by making judicious assumptions about the nature of our data. This gave us nifty algorithms — counting sort, radix sort, and bucket sort. In the realm of data structures, if we know the data access patterns, we can often beat the $\Omega(\lg n)$ lower bound for a single lookup in a dictionary (amortized, of course).

</aside>

<aside> 📌 Before we see how we can beat this lower bound, we need to formalize a few things first.

</aside>

The Balance Property

The worst case lookup time in any BST holding $n$ keys is at least $\lg n$; that is, it's $\Omega(\lg n)$, since some key must sit at depth $\lg n$ or deeper. Mirroring this, we say that a tree is balanced (or equivalently, that it satisfies the balance property) if the amortized cost of any lookup in the tree is $O(\lg n)$.
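
To make that baseline concrete, here is a minimal Python sketch (the `Node` class and helper names are mine, not from the sources) that builds a perfectly balanced BST over $n$ keys and checks that its height — and hence the cost of any lookup — is about $\lg n$:

```python
import math

class Node:
    """A minimal BST node; later sketches reuse this class."""
    def __init__(self, key, left=None, right=None):
        self.key, self.left, self.right = key, left, right

def build_balanced(keys, lo=0, hi=None):
    """Build a perfectly balanced BST from a sorted list of keys."""
    if hi is None:
        hi = len(keys)
    if lo >= hi:
        return None
    mid = (lo + hi) // 2
    return Node(keys[mid],
                build_balanced(keys, lo, mid),
                build_balanced(keys, mid + 1, hi))

def height(node):
    """Number of nodes on the longest root-to-leaf path."""
    return 0 if node is None else 1 + max(height(node.left), height(node.right))

n = 1024
root = build_balanced(list(range(n)))
print(height(root), math.ceil(math.log2(n + 1)))  # both 11: every lookup touches O(lg n) nodes
```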

The Entropy Property

Suppose that we have a tree holding a sequence of keys $K = \left<k_1, k_2, \ldots, k_n\right>$. Suppose also that the queries we make on this tree are sampled from a known, fixed discrete probability distribution $P = \left<p_1, p_2, \ldots, p_n \right>$. That is, every time we query the tree, the probability that the query is for key $k_i$ is $p_i$. We can use this information to build an optimal binary search tree (for a more thorough treatment, see Chapter 15 — DP — of CLRS): the probabilities let us ensure that keys with high probability sit high up in the tree. The resulting tree is a weight-balanced BST.

Taking a very brief detour into information theory, recall that the entropy of a discrete probability distribution is given by $H_P = \sum_i -p_i\lg p_i$. This value equals $\lg n$ when all items have equal probability — which should remind you of worst case analysis, where we assume that each key is as likely to be queried as any other key. $H_P = 0$ if only one element is ever accessed — that element has probability $1$ and all the others have probability $0$.

A BST is said to satisfy the entropy property if the expected cost of a lookup is $O(1 + H_P)$ — assuming we know the access probabilities, of course. How should we interpret this? Basically, the cost of a lookup should be less than logarithmic whenever some elements are much more likely to be queried than others.
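
As a quick numerical check, here is a small Python sketch (the two distributions are made up for illustration) that computes $H_P$ for a uniform distribution and for a skewed one, showing how much room there is below $\lg n$:

```python
import math

def entropy(probs):
    """Shannon entropy H_P = sum_i -p_i * lg(p_i) of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

n = 8
uniform = [1 / n] * n                        # every key equally likely
skewed = [0.5, 0.25, 0.125] + [0.025] * 5    # a few hot keys dominate the queries
print(entropy(uniform))  # 3.0 == lg 8: no better than the uniform worst case
print(entropy(skewed))   # ~2.04: an entropy-optimal tree answers the average query faster
```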


Spatial Locality/Dynamic Finger Property

<aside> 🛠

You are almost certainly (vaguely?) aware of spatial locality — it's the property that makes iterating over a contiguous block of memory in order so desirable. The basic idea is that if an item at position $i$ in the space in question is queried for, it is highly likely that a query in the near future will ask for items close to $i$. Before we proceed any further, I always find it useful, when attempting to understand spatial locality, to ask, "what is the space in question?" and "what distance metric are we using to define closeness?". In the first example, the space in question is the logical address space and we use pointer arithmetic to figure out nearby locations.

As another example, consider the task of iterating over all the keys of a binary search tree in sorted order. We can do this in linear time by doing an in-order traversal of the tree. Using aggregate analysis, we see that the amortized cost of accessing each key is $\Theta(1)$. This is way below the logarithmic lower bound!! How is that possible? It is possible because that lower bound assumes that each lookup starts at the root — in this case, we do not. We have a pointer (Keith calls it a finger) telling us where we left off, and whenever we call iter.next() we simply advance this pointer (see the iterator sketch below).

Under this model, we would certainly hope that searching for keys that are spatially close in key space — close in sorted order to the key we last accessed — is faster than $\lg n$, while keys that are further away may still cost up to $\lg n$. This is called the Dynamic Finger property. A level-linked balanced tree can be used to achieve this. This is a tree in which each node has an additional pointer to its neighbor at the same level (for those familiar with $B^+$ trees, this is why the leaves are all linked together — to facilitate range queries).

</aside>
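
Here is a sketch of such a finger written as a plain Python iterator (assuming the same minimal `Node` class as in the earlier balance sketch). Each tree edge is pushed and popped exactly once over the whole traversal, so iterating over all $n$ keys costs $O(n)$ in total, i.e. amortized $\Theta(1)$ per key:

```python
class InOrderFinger:
    """A 'finger' into a BST: yields keys in sorted order without restarting at the root."""
    def __init__(self, root):
        self._stack = []              # the path from the root down to where we left off
        self._push_left_spine(root)

    def _push_left_spine(self, node):
        while node is not None:
            self._stack.append(node)
            node = node.left

    def __iter__(self):
        return self

    def __next__(self):
        if not self._stack:
            raise StopIteration
        node = self._stack.pop()               # advance the finger...
        self._push_left_spine(node.right)      # ...and remember the path to the successor
        return node.key

# e.g. list(InOrderFinger(build_balanced(list(range(10))))) == list(range(10))
```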

Temporal Locality/Working Set Property

<aside> ⏰ Temporal locality is the whole reason the caching industry exists. It is a heuristic that bets that if a key is queried, it is likely to be queried again in the near future. In data structures where the access pattern exhibits temporal locality, we often have a set of hot keys that are repeatedly accessed. This set is often much smaller than the total number of items in our tree. In a data structure with temporal locality, we would like to ensure that if the number of hot keys is $t$ (more precisely, if $t$ is the number of distinct keys accessed since the key we are querying was last accessed), it only takes $O(\lg t)$ time to access that hot item. A data structure with this guarantee is said to have the working set property. This property cannot be achieved without changing the structure of the tree, any more than a cache can be constructed without evictions. Iacono's working set structure is an example implementation of a data structure with the working set property.

</aside>
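
To see what $t$ measures, here is a small self-contained Python sketch (the access sequence is invented) that computes, for each access, the number of distinct keys touched since the queried key was last accessed — exactly the quantity the working set property says the lookup cost should be logarithmic in:

```python
def working_set_sizes(accesses):
    """For each access, count the distinct keys touched since that key was last accessed."""
    last_seen = {}   # key -> index of its previous access
    sizes = []
    for i, key in enumerate(accesses):
        if key in last_seen:
            sizes.append(len(set(accesses[last_seen[key] + 1 : i])))
        else:
            sizes.append(None)   # first access: no previous occurrence to measure from
        last_seen[key] = i
    return sizes

print(working_set_sizes(list("abababaz")))
# [None, None, 1, 1, 1, 1, 1, None]: 'a' and 'b' stay hot, so their working sets stay tiny
```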


<aside> 💍 Now, you may wonder, is it possible to have a single data structure that satisfies all these properties? That is, the amortized cost of any lookup is logarithmic in the number of keys in the data structure. That when our keys have different access probabilities, the cost to lookup those with high access probabilities is less than logarithmic. That once we have queried for some key $\alpha$, the cost of accessing nearby keys such as $\beta \text { and } \gamma$ is much smaller than the cost of accessing keys further away such as $\chi \text { and } \psi$. And finally, that if the access pattern exhibits temporal locality, the cost of accessing the hot elements is logarithmic in the number of hot elements.

</aside>

<aside> 💍 Can we have one tree ...

</aside>

<aside> 💍 The answer is yes!! The Splay Tree is one such tree.

</aside>

Unico_Anello.png

<aside> 💍 To rule them all??

</aside>

Splay Trees

Tree Rotations

<aside> 🌳 A splay tree achieves all four properties discussed above by moving nodes around the tree after each operation. In particular, whenever some element $x$ is accessed — through any of the tree's dictionary operations — we move it to the root of the tree in a process called splaying. In a splay tree, any sequence of $k$ operations takes a total runtime of $O(k \lg n)$. This means that on average, each operation takes $O(\lg n)$ — that is, the amortized runtime of each operation is logarithmic.

</aside>

And I Splay!

<aside> 🎼 'Cause I splay (splay), I splay (Hey), I splay (Okay), I splay (Okay) All day (Okay), I splay (Okay), I splay (Okay), I splay (Okay) We gon' splay (splay), gon' splay (Okay), we splay (Okay), I splay (Okay) I splay (Okay), okay (Okay), I splay (Okay), okay, okay, okay, okay Okay, okay, nodes, now let's get in rotation, 'cause I splay Okay, nodes, now let's get in rotation, 'cause I splay Prove to me you got some coordination, 'cause I splay

</aside>

<aside> 🦜 Splaying relies on tree rotations. Therefore, to understand it, we have to understand rotations. Rotations are structural changes to the tree that preserve the search tree invariant. There are two types — Left Rotations and Right Rotations.

In Left Rotations, we are given some node $x$ and we'd like to switch this node with its right child $y$. This child has a left child $\chi$ and a right child $\psi$, either of which could be None. Note that $y$ cannot be None if a rotation is to proceed. $x$ also has a left child $\alpha$, which may be None as well. Because of the search tree property we know that: $\alpha < x < \chi < y < \psi$. Our goal is to make $y$ the root of this subtree while preserving that ordering. We start by switching the two nodes — $y$ becomes the root of the local subtree and $x$ becomes its left child. Note that $\psi$ remains the right child of $y$ since it is the only subtree greater than $y$. Similarly, $\alpha$ remains the left child of $x$. Since all the slots in $y$ are now filled, we need to find a new home for $\chi$. We place it at the only remaining spot — as the right child of $x$ — which is consistent with the ordering since $x < \chi < y$.

Right rotations proceed symmetrically. We have some node $y$ and we wish to exchange it with its left child $x$, making $x$ the root of the local subtree and $y$ its right child.

</aside>
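
Here is a minimal Python sketch of the two rotations (my own helper names, matching the left_rotate/right_rotate calls used below; nodes only need key, left, and right fields). Each function rotates around the given node and returns the new root of the local subtree:

```python
def left_rotate(x):
    """Rotate x's right child y above x; returns the new root of the local subtree."""
    y = x.right
    assert y is not None, "cannot left rotate a node with no right child"
    x.right = y.left    # chi, the subtree between x and y, becomes x's right child
    y.left = x          # x becomes y's left child; alpha stays under x, psi stays under y
    return y

def right_rotate(y):
    """Mirror image: rotate y's left child x above y; returns the new root of the subtree."""
    x = y.left
    assert x is not None, "cannot right rotate a node with no left child"
    y.left = x.right
    x.right = y
    return x
```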

Splaying: The Three Musketeers

To splay some node $\lambda$, we repeatedly apply the three operations below until $\lambda$ becomes the root. We splay the node that was last touched by an operation. This means that we splay even when an access finds that the key queried for is not in the tree. This ensures that the tree's balance improves on each access. To that effect, each of the tree's operations returns the position at which it ended, and we splay starting from that position. Below, we discuss the three operations that together constitute a splay (a code sketch that pulls the three cases together follows the Zig! discussion).

Zig-Zig!

<aside> 👘 This applies when $\lambda$ is a left child of a left child or a right child of a right child.

When both the child and the parent are left children, we have $\lambda < \chi < \psi$, where $\chi$ is the parent of $\lambda$ and $\psi$ is the parent of $\chi$. In this case we apply two right rotations. We first right rotate the parent with the grandparent — right_rotate($\psi$). Then we right rotate $\lambda$ with $\chi$, which is still its parent after the first rotation — right_rotate($\chi$).

When both the child and the parent are right children, we have $\lambda > \chi > \psi$, where again $\chi$ is the parent of $\lambda$ and $\psi$ is the parent of $\chi$. In this case we apply two left rotations. We first left rotate the parent with the grandparent — left_rotate($\psi$). Then we left rotate $\lambda$ with its parent $\chi$ — left_rotate($\chi$).

</aside>

Zig-Zag!

<aside> 🏺 This applies when $\lambda$ is a right child of a left child or it is a left child of a right child.

When $\lambda$ is a right child of a left child, we have $\chi < \lambda < \psi$. To splay, we begin by applying a left rotation to exchange $\lambda \text { and } \chi$ — left_rotate($\chi$). We then apply a right rotation to exchange $\lambda$ with its grandparent, which after the first rotation is its parent. That is, we run right_rotate($\psi$).

When $\lambda$ is a left child of a right child, we have $\chi > \lambda > \psi$. To splay, we begin by applying a right rotation to exchange $\lambda \text { and } \chi$ — right_rotate($\chi$). We then apply a left rotation to exchange $\lambda$ with its grandparent, which after the first rotation is its parent. That is, we run left_rotate($\psi$).

</aside>

Zig!

<aside> ⚙ This is the last operation applied, and it is used at most once per splay. When $\lambda$ becomes a child of the root, we wish to exchange it with the root. A zig is a simple left or right rotation. If $\lambda$ is a right child, we run left_rotate(root). If it is a left child, we run right_rotate(root).

</aside>
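
Pulling the three cases together, here is one compact recursive formulation in Python — a sketch of my own, not the sources' code — that reuses left_rotate and right_rotate from the rotation sketch above. It splays the last node touched while searching for key to the root of the subtree and returns the new root:

```python
def splay(root, key):
    """Splay the last node touched while searching for key; returns the new subtree root."""
    if root is None or root.key == key:
        return root
    if key < root.key:
        if root.left is None:
            return root                              # key absent: splay the last node touched
        if key < root.left.key:                      # zig-zig (left-left)
            root.left.left = splay(root.left.left, key)
            root = right_rotate(root)                # first rotation: parent with grandparent
        elif key > root.left.key:                    # zig-zag (left-right)
            root.left.right = splay(root.left.right, key)
            if root.left.right is not None:
                root.left = left_rotate(root.left)   # first rotation: node with parent
        return root if root.left is None else right_rotate(root)  # zig / second rotation
    else:
        if root.right is None:
            return root
        if key > root.right.key:                     # zig-zig (right-right)
            root.right.right = splay(root.right.right, key)
            root = left_rotate(root)
        elif key < root.right.key:                   # zig-zag (right-left)
            root.right.left = splay(root.right.left, key)
            if root.right.left is not None:
                root.right = right_rotate(root.right)
        return root if root.right is None else left_rotate(root)
```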

Bonus

<aside> 🦒 Splay operations allow us to efficiently split a tree at a given split point, and to join two trees back together.

To split a tree at $x$, we start by finding $s = \text{succ}(x)$. We then splay($s$), which makes it the root of the tree. Now we can cut the left link — the edge between $s$ and its left child. This gives us two trees: a left tree in which $x$ is the largest value and a right tree in which $s$ is the smallest value.

To join a left tree and a right tree (where every key in the left tree is smaller than every key in the right tree), we begin by finding the largest value in the left tree and splaying it to the root. Note that this largest value has no right subtree (duh!). We then join the two trees by making the root of the right tree the right child of this new root.

</aside>
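
Using that splay routine, split and join become a few lines each. A small sketch follows (helper names are mine): note that it splays $x$ itself and cuts the right link, the mirror image of the succ($x$)-based description above, which yields the same two trees:

```python
def split(root, x):
    """Split into (keys <= x, keys > x) by splaying x and cutting the right link."""
    root = splay(root, x)   # the node holding x, or its neighbour if x is absent, is now the root
    if root is None:
        return None, None
    if root.key <= x:
        left, right = root, root.right
        root.right = None
    else:
        left, right = root.left, root
        root.left = None
    return left, right

def join(left, right):
    """Join two splay trees, assuming every key in left is smaller than every key in right."""
    if left is None:
        return right
    node = left
    while node.right is not None:   # find the largest key in the left tree
        node = node.right
    left = splay(left, node.key)    # splay it to the root; it now has no right subtree
    left.right = right              # hang the entire right tree there
    return left
```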

Final Note

<aside> 📌 Do note that the version of the tree discussed thus far is a bottom up splay tree. There is another variant called the top down splay tree. The difference between the two stems from how they implement the splay operation. As discussed above, the bottom up tree implements splay by moving the accessed node to the root of the tree using a series of rotations. The top down splay tree, on the other hand, implements splay by maintaining three trees (a left tree, a middle tree, and a right tree) and moving nodes between them as it walks down from the root. In practice, the top down procedure is preferred to the bottom up procedure since it does its work in a single pass down the tree.

</aside>