Thursday, February 18, 2010

Treaps - Best data structure since sliced bread

I will be the first to admit that data structures are not my strong point. The wonderful thing about data structures is that once someone shows you how one works - its easy to understand, but trying to do something with the wrong data structure is a performance killer.

While updating the AFF4 map implementation I had a requirement for a unique data structure. First a quick recap of the AFF4 map implementation. AFF4 maps are a list of points which specify a linear transormation between the map stream and one or more backing streams (which may be maps or images or whatever). For example:

map stream offset, target offset, target URL

This means that data in the map stream from offset 0 to 500 will be take from offset 100 to 600 in file://file1.dd. From byte 500 to 800, the data will be taken from offset 1000 to 1300 in file2.dd etc.

In order to satisfy a read request at a specific offset we need to quickly search the map for the two elements immediately before and after the offset. For example if I want to read 100 bytes from offset 450 in the map stream, I need to find row 1 and row 2 above:



I then extrapolate the target offset and realise that I only have 50 bytes available from offset 100 + 450 = 550 in file1.dd.

When the map is huge the speed of this search is very important. The interesting thing is that I will rarely need an exact match - I really need a range.

My initial implementation used a sorted list for the map, sorted by the stream offset. I then did a binary search for the stream offset and retreived the index before and after the final point. Binary searching is O(lg N) and storage efficiency is excellent - no overhead is requires as the array can simply be allocated at once.

The biggest problem with this approach is that it is not possible to add points to the map at the same time as reading them. This is because the array needs to be sorted before you can binary search it. If we just add all the elements in batch and then sort we can get away with O(1) on adding and then O(n) for the sort. There are a number of applications, however, where we need to be able to read the map while we are writing it. In particular whenever we need to know if a point is already present in the map, before adding a new one we need to sort it first. Sorting upon each insertion will result in O(n lg n) for insertion which is very poor.

So I needed a data structure which:

  • At least O(lg N) on retrieval
  • Better than O(n log n) on insertion.
  • Reasonably efficient for sorting.
  • Most importantly it has to be a data structure which can retrieve the nearest match rather than an exact match. We need to be able to identify the highest offset below the query point, and the lowest offset above the query point.

A treap is the perfect solution:

  • It has O(lg n) on insertion and retrieval
  • Its possible to do a previous search and next search to retrieve the elements which are before and after the query point. This gives us direct ranging.
  • The treap is already sorted - to dump a sorted list, just traverse the treap in forward order.

Reading the wikipedia page above made my head spin. Implementing one would be tricky. Luckily I found this - Awesome.

How to actually use it?

The .h file defines a bunch of macros (huge macros I might add) which implement the basic functions required for a treap implementation. I guess this is kind of like c++ templates - the idea is to have the macros define the algorithm and fit it to any struct you need.

So there are two major structures a tree_t struct (the names are settable as args to the macro), and a node_t struct. The node carries whatever information you want to store in each node, while the tree just stores the head of the treap (and is used in all operations on the treap).

You start off by defining the node:

typedef struct map_point_node_s map_point_node_t;
struct map_point_node_s {
uint64_t image_offset;
uint64_t target_offset;
// This is a pointer into the target list (i.e. its not unique to
// this node).
RDFURN target;

// The link is what makes this node part of the tree.
trp_node(map_point_node_t) link;

// This just defines the tree type.
typedef trp(map_point_node_t) map_point_tree_t;

Inside our C file we now just generate all the functions we need to manipulate the tree. We need to define a comparison function so the implementation can order elements in the tree. The comparison function is called on two nodes to decide if they are the same or one is bigger than the other. In our case we only care about the stream offsets:

static int map_point_cmp(map_point_node_t *a, map_point_node_t *b) {
int rVal = (a->image_offset > b->image_offset) - \
(a->image_offset <>image_offset);

return rVal;

/* This huge macro generates all the tree tranversal and searching
trp_gen(static, tree_, map_point_tree_t, map_point_node_t, \
link, map_point_cmp, 1297, 1301);

In other words it will make static functions like tree_insert(), tree_remove(), tree_search() etc. Note that the macros do not manage memory at all, they just maintain the pointers to each node. This means you still need to worry about allocating and deleting individual nodes - you might want to allocate a slab for a bunch of nodes or allocate each node separately.

Now to use the functions:

// Statically allocate a tree
map_point_tree_t tree;

// Initialise it
tree_new(&tree, 42);

// Make a new node
map_point_node_t *node = calloc(sizeof(map_point_node_t));

//set the key:
node->stream_offset = 5;

// add it
tree_add(&tree, node);

// retrieve it - we make a static node, fill it with the key and then search for the node in the tree which matches it:
map_point_node_t query;

query.stream_offset = 5;
// This will be NULL if the query is not in the tree
node = tree_search(&tree, &query);

You can also do tree_nsearch() and tree_psearch() to retrieve the nodes before and after this one.

To dump the tree in sorted order we define an iterator function callback and run it on all the tree:

static map_point_node_t *inline_map_iterate_cb(map_point_tree_t *tree, map_point_node_t *node, void *data) {
return NULL;

then (data is a pointer passed to the callback):

tree_iter(&tree, NULL, inline_map_iterate_cb, (void *)data);

Thats it.

This can be used very easily to build a fast data tree as well - making it an awesome substitute for a hash table or a dictionary. Even when we just want to store a bunch of items by a key.

No comments: