Treaps: Randomized search trees
A treap stores items in sorted order and offers efficient lookup, addition and removal of items.
If you could use only one data structure, which one would you choose? A hash table? While it supports the basic lookup, addition and removal operations, it doesn’t keep the elements sorted. Therefore it can’t efficiently perform some common tasks, such as finding the minimum element or producing an ordered list of all elements.
What would you require of this ideal, sole structure? It should be easy to use (and preferably easy to implement) and it should be efficient in terms of both speed and memory.
The randomized search tree by Seidel and Aragon, described in “Randomized Search Trees” (Algorithmica, 16(4/5):464–497, 1996), fulfills all of these requirements, and it offers the functionality of a general-purpose sorting routine, priority queue, hash table, stack, or standard queue.
Data Structure
The data structure is a binary search tree that stays balanced (with high probability). Therefore, an element is never more than about log n steps away, where n is the number of elements in the treap and log n is the binary logarithm.
Being able to access the elements in the treap in log n time is a big improvement over some of the most common basic data structures.
| | Insert | Find | Min |
|---|---|---|---|
| Unsorted array | 1 | n | n |
| Sorted array | n | log n | 1 |
| Unsorted linked list | 1 | n | n |
| Sorted linked list | n | n | 1 |
| Hash table | 1 | 1 | n |
| Treap | log n | log n | log n |
The treap is reasonably space efficient as well. Each element is stored in a node in a tree, and each node contains an integer and two references, one to the left subtree and one to the right subtree. Hence, to store n elements in a treap, we need n integers and 2n references.
A typical hash-table implementation uses n integers for the hash values and n references for the linked lists. Furthermore, it uses an array with a size that is typically close to n. For example, with a load factor of 1.0, the hash table will use the same amount of extra space as a treap if the table is full, and more space if it’s only partly filled.
Implementation
A search tree consists of a number of nodes. Each node contains a key and has two references, one to the left subtree and one to the right.
    E
   / \
  B   H
 /   / \
A   F   K
Note that the tree is ordered. All keys in the left subtree are smaller than the key in the node, and all keys in the right subtree are larger. This property holds for every node in the tree. Therefore, it’s possible to search the tree in a simple manner.
For example, to check if the key F is present in the tree,
- You start by comparing F to the key in the topmost node (the root) of the tree. F is larger than E, so you know that F must be in the right subtree (if it’s in the tree at all).
- Next, compare F to H, the root of the right subtree. F is smaller, so you turn to the left, where you find the key you are looking for.
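The search walk just described can be sketched in Python. (This is an illustrative sketch, not code from the article; the `Node` class and the `contains` function are names chosen here for the example.)

```python
class Node:
    """A search-tree node: a key and two subtree references."""
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def contains(tree, key):
    """Walk down the ordered tree, turning left or right at each node."""
    while tree is not None:
        if key < tree.key:
            tree = tree.left     # key can only be in the left subtree
        elif key > tree.key:
            tree = tree.right    # key can only be in the right subtree
        else:
            return True          # found it
    return False

# The example tree from the figure:
root = Node("E",
            Node("B", Node("A")),
            Node("H", Node("F"), Node("K")))
```

Searching for F follows exactly the path in the text: right at E, then left at H.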
You can use the same technique to insert a new node. If the key of the new node isn’t already present in the tree, the search will take you to a position at the bottom of the tree where the new node can be inserted.
But you still have a potential problem. The algorithm above is efficient only if the tree is well balanced. For example, if each node in the tree has only one child, the structure will behave just like a linked list: You have to visit every node in the tree to find the one at the bottom.
There are plenty of algorithms available that try to keep a search tree balanced. Most of them are rather complicated. The treap offers a simple solution.
You will use rotations (the simplest possible rebalancing operation) to help keep the tree balanced. The idea is to change the form of the tree without disturbing the order of the keys. The left tree in this figure is transformed into the right one by a right rotation.
      B                  A
     / \                / \
    A   c      <->     a   B
   / \                    / \
  a   b                  b   c

(Here a, b, and c stand for arbitrary subtrees.)
This operation is easy to implement – only two references need to be updated, and the order of the keys is preserved. The corresponding balancing operation that transforms the right tree into the left one is called a left rotation.
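The two rotations can be sketched in Python like this (a minimal sketch; the `Node` class and function names are chosen here for the example):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right

def rotate_right(b):
    """Lift b's left child above b; only two references change.
    Returns the new root of the subtree."""
    a = b.left
    b.left = a.right    # a's right subtree becomes b's left subtree
    a.right = b         # b becomes a's right child
    return a

def rotate_left(a):
    """The mirror image: lift a's right child above a."""
    b = a.right
    a.right = b.left
    b.left = a
    return b

# Right rotation on a two-node example: B with left child A.
tree = Node("B", Node("A"))
tree = rotate_right(tree)   # now A is the root, with B to its right
```

Applying `rotate_left` to the result restores the original shape, since the two rotations are inverses.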
Now comes the stroke of genius. You want the tree to look as if it had been constructed from a random sequence of updates. (Random trees are nicely balanced: the expected length of an average search path is roughly 1.4 log2 n - 2, and the expected length of the longest search path is about 4.3 log2 n.)
To achieve this,
- you assign a random priority number to each node before you insert it into the tree,
- and you make sure that each node in the tree has higher priority than its children. (In the figures that follow, a smaller number means a higher priority, so the root carries the smallest priority number.)
If you do this, the tree will have the same form as if the nodes had been added in priority order, that is, in random order.
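A treap node, then, is just a search-tree node that also carries a random priority. A sketch in Python (the `TreapNode` name is chosen here for the example; following the figures in this article, a smaller number means a higher priority):

```python
import random

class TreapNode:
    """A search-tree node extended with a random priority.
    The priority is drawn once, at creation, and never changed;
    a smaller number means a higher priority."""
    def __init__(self, key):
        self.key = key
        self.prio = random.random()
        self.left = None
        self.right = None

node = TreapNode("G")
```

Because the priorities are independent random numbers, sorting the nodes by priority yields a uniformly random insertion order.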
Insert
The insertion algorithm is best explained by an example.
(a)        E3               (b)        E3
          /  \                        /  \
        B5    H7                    B5    H7
       /     /  \                  /     /  \
     A6    F9    K8              A6    G2    K8
             \                        /
              G2                    F9

(c)        E3               (d)         G2
          /  \                         /  \
        B5    G2                     E3    H7
       /     /  \                   /  \     \
     A6    F9    H7               B5    F9    K8
                   \             /
                    K8         A6
In Fig. (a) the node G (with priority 2) has just been added to the treap. It has been placed in the correct position with respect to the ordering of the keys, but it violates the priority requirement: It has higher priority than its parent.
- To fix this, you perform a left rotation to move it higher up in the tree (Fig. (b)).
- G still has higher priority than its parent, so you perform a right rotation (Fig. (c)).
- Finally, G is rotated up to the root of the tree (Fig. (d)).
Now the tree fulfills both the key order and the priority order requirements.
At first this algorithm may seem hard to implement. It looks as if you would have to walk up the tree, which is hard since the references point in the other direction. However, this problem is easily overcome if you formulate the insertion algorithm recursively.
// Insert a new node in the tree and return the updated tree.
Algorithm insert(node, tree)
    if tree is empty
        return node
    if node.key < tree.key
        tree.left ← insert(node, tree.left)
        if tree.prio > tree.left.prio
            rotate tree to the right
    else
        tree.right ← insert(node, tree.right)
        if tree.prio > tree.right.prio
            rotate tree to the left
    return tree
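Translated almost line for line into Python, the recursive insertion might look as follows (a sketch under the same convention as the figures, where a smaller priority number means a higher priority; the class and function names are chosen here for the example):

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.prio = random.random()  # smaller number = higher priority
        self.left = None
        self.right = None

def rotate_right(b):
    a = b.left
    b.left, a.right = a.right, b
    return a

def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    return b

def insert(node, tree):
    """Insert node into tree; return the (possibly new) root."""
    if tree is None:
        return node
    if node.key < tree.key:
        tree.left = insert(node, tree.left)
        if tree.prio > tree.left.prio:   # the child outranks the parent,
            tree = rotate_right(tree)    # so rotate it up
    else:
        tree.right = insert(node, tree.right)
        if tree.prio > tree.right.prio:
            tree = rotate_left(tree)
    return tree

def inorder(tree):
    """The keys in sorted order."""
    if tree is None:
        return []
    return inorder(tree.left) + [tree.key] + inorder(tree.right)

def heap_ok(tree):
    """Check that no node has a larger priority number than a child."""
    if tree is None:
        return True
    for child in (tree.left, tree.right):
        if child is not None and tree.prio > child.prio:
            return False
    return heap_ok(tree.left) and heap_ok(tree.right)

root = None
for key in "EBHAFKG":
    root = insert(Node(key), root)
```

Note that the rotations are performed on the way back up the recursion, which is what makes the upward walk unnecessary.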
Delete
Compared with other balanced search tree schemes, the delete operation is also quite simple.
- If the node to be deleted is at the very bottom of the tree, you remove it directly and no rebalancing is necessary.
- If the node has only one nonempty subtree, you remove the node and replace it with the root of the subtree.
- If the node has two nonempty subtrees, you look at the priorities of the roots of the two subtrees. If the left subtree’s root has the higher priority, you perform a right rotation, lifting the left child above the node; otherwise, you perform a left rotation, lifting the right child. Either way, the node to be deleted moves further down the tree, and you have reduced the original problem to a smaller one that can be solved by calling the delete algorithm recursively.
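The three cases can be sketched in Python as follows. (A self-contained sketch with names chosen here for the example; nodes carry `key`, `prio`, `left`, and `right` fields, and a smaller priority number means a higher priority, as in the figures.)

```python
import random

class Node:
    def __init__(self, key):
        self.key = key
        self.prio = random.random()  # smaller number = higher priority
        self.left = None
        self.right = None

def rotate_right(b):
    a = b.left
    b.left, a.right = a.right, b
    return a

def rotate_left(a):
    b = a.right
    a.right, b.left = b.left, a
    return b

def insert(node, tree):
    if tree is None:
        return node
    if node.key < tree.key:
        tree.left = insert(node, tree.left)
        if tree.prio > tree.left.prio:
            tree = rotate_right(tree)
    else:
        tree.right = insert(node, tree.right)
        if tree.prio > tree.right.prio:
            tree = rotate_left(tree)
    return tree

def delete(key, tree):
    """Remove key from tree (if present); return the updated root."""
    if tree is None:
        return None
    if key < tree.key:
        tree.left = delete(key, tree.left)
    elif key > tree.key:
        tree.right = delete(key, tree.right)
    elif tree.left is None:
        return tree.right            # at most one child: splice it out
    elif tree.right is None:
        return tree.left
    else:
        # Two children: lift the higher-priority child above the node,
        # then keep deleting in the subtree the node moved into.
        if tree.left.prio < tree.right.prio:
            tree = rotate_right(tree)
            tree.right = delete(key, tree.right)
        else:
            tree = rotate_left(tree)
            tree.left = delete(key, tree.left)
    return tree

def inorder(tree):
    if tree is None:
        return []
    return inorder(tree.left) + [tree.key] + inorder(tree.right)

root = None
for key in "EBHAFKG":
    root = insert(Node(key), root)
root = delete("E", root)
```

Because the rotation always lifts the higher-priority child, the heap order among the remaining nodes is preserved while the doomed node sinks to the bottom.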
In Dr. Dobb’s Journal, pp. 40-44, Vol. 267, July 1997.