Database Architects: The Case for B-Tree Index Structures

Saturday, December 23, 2017

The Case for B-Tree Index Structures

Recently a very interesting paper made a Case for Learned Index Structures. It argued that we could, and perhaps should, replace traditional index structures with machine learning, using the following reasoning: If we consider the leaf pages of an index as a sorted array, the inner pages of the index point towards a (bucketized) position within that array. This means that the index essentially describes the cumulative distribution function (CDF), mapping from keys to array positions.
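To make the CDF view concrete, here is a minimal Python sketch on a toy key set. The function name `empirical_cdf` and the data are illustrative assumptions, not taken from the paper: the position of a key in the sorted array is simply the empirical CDF of that key multiplied by the number of keys.

```python
import bisect

def empirical_cdf(sorted_keys, key):
    """Fraction of keys that are strictly smaller than `key`."""
    return bisect.bisect_left(sorted_keys, key) / len(sorted_keys)

# Toy "leaf level": a sorted array of keys (illustrative data only).
keys = [2, 3, 5, 7, 11, 13, 17, 19]

# Array position of a key = CDF(key) * number of keys.
pos = round(empirical_cdf(keys, 11) * len(keys))
print(pos, keys[pos])
```

Any structure that approximates this function well, whether a tree, a spline, or a model, can serve as the index.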

And the argument of that paper was that using machine learning we can do that mapping much better because a) the learned model (in this case a neural network) is much smaller than a traditional b-tree, and b) the learned model can predict the CDF value much more accurately than a simple b-tree, which improves performance.

Now I am all in favor of trying out new ideas, and adapting to the data distribution is clearly a good idea, but do we really need a neural network for that? After all, the neural network is just an approximation of the CDF. There are many other ways to approximate a function, for example spline interpolation: we define a few knots of the spline, and then interpolate between the knots.

[Figure: example of a spline through its knots (picture by D.J. Graham)]

Thus, what we need for a spline is a sequence of knots between which we can interpolate, i.e., a sequence of (x,y) values.
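As a rough sketch of that idea, the following Python snippet uses the simplest possible spline, piecewise linear interpolation, with every `step`-th key as a knot. The helper names `build_knots` and `estimate_position`, the knot spacing, and the toy data are all assumptions made for illustration, and unique keys are assumed.

```python
import bisect

def build_knots(sorted_keys, step):
    """Use every `step`-th key as a knot: x = key, y = array position."""
    knots = [(sorted_keys[i], i) for i in range(0, len(sorted_keys), step)]
    last = len(sorted_keys) - 1
    if knots[-1][1] != last:
        knots.append((sorted_keys[last], last))  # always end on the last key
    return knots

def estimate_position(knots, key):
    """Linearly interpolate the array position of `key` between two knots."""
    xs = [x for x, _ in knots]
    i = bisect.bisect_right(xs, key) - 1
    i = max(0, min(i, len(knots) - 2))          # stay inside the knot list
    (x0, y0), (x1, y1) = knots[i], knots[i + 1]
    t = (key - x0) / (x1 - x0)
    t = min(1.0, max(0.0, t))                   # no extrapolation
    return y0 + t * (y1 - y0)

keys = [k * k for k in range(100)]   # skewed toy key set
knots = build_knots(keys, 10)        # 11 knots describe 100 keys
est = estimate_position(knots, keys[42])
print(est)                           # close to, but not exactly, 42
```

The estimate is exact at the knots and slightly off in between, which is why a short local search is still needed afterwards.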

Now, if we think back to traditional index structures, in particular B-trees, we see that they have something similar:

The inner pages consist of separator values, and offsets to the next lower level. Well, we can interpret that as a spline. Instead of just going down to the next level and then doing binary search in the next node, we can interpret our search key as a position between the two separators, and then interpolate the position of our search key on the next level. This estimate will be slightly off, of course, but the same is true for the machine learning approach, and we can use the same binary search strategy starting from our estimated position. We can use that interpolation strategy on all levels, both when navigating the inner pages and then going down to the leaf nodes.
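The lookup described above might be sketched as follows for a two-level structure (one inner node over a sorted leaf array). This is not the proof-of-concept code mentioned in this post; the names `interpolate`, `search_from`, and `lookup`, the page size, and the exponentially widening search window are assumptions chosen for illustration, and keys are assumed to be unique.

```python
import bisect

def interpolate(key, lo_key, hi_key, lo_pos, hi_pos):
    """Estimate where `key` falls between two separators."""
    if hi_key == lo_key:
        return lo_pos
    frac = (key - lo_key) / (hi_key - lo_key)
    frac = min(1.0, max(0.0, frac))
    return lo_pos + int(frac * (hi_pos - lo_pos))

def search_from(arr, key, est):
    """Index of the first entry >= key, searching outward from `est`."""
    n = len(arr)
    est = min(max(est, 0), n - 1)
    step = 1
    if arr[est] < key:                  # answer lies to the right
        lo, hi = est, est + step
        while hi < n and arr[hi] < key:
            lo, step = hi, step * 2
            hi = est + step
        return bisect.bisect_left(arr, key, lo + 1, min(hi, n))
    hi, lo = est, est - step            # answer lies at or left of est
    while lo >= 0 and arr[lo] >= key:
        hi, step = lo, step * 2
        lo = est - step
    return bisect.bisect_left(arr, key, max(lo, 0), hi)

def lookup(leaf, separators, offsets, key):
    """Interpolate in the inner node, then in the leaf level."""
    m = len(separators)
    guess = interpolate(key, separators[0], separators[-1], 0, m - 1)
    j = search_from(separators, key, guess)
    slot = j if j < m and separators[j] == key else max(j - 1, 0)
    if slot + 1 < m:
        hi_key, hi_pos = separators[slot + 1], offsets[slot + 1]
    else:
        hi_key, hi_pos = leaf[-1], len(leaf) - 1
    est = interpolate(key, separators[slot], hi_key, offsets[slot], hi_pos)
    return search_from(leaf, key, est)

leaf = [k * k for k in range(1000)]          # toy sorted leaf level
PAGE = 32                                    # hypothetical entries per page
offsets = list(range(0, len(leaf), PAGE))    # child offsets
separators = [leaf[o] for o in offsets]      # separator keys
idx = lookup(leaf, separators, offsets, leaf[617])
print(idx)
```

Because the final search starts at the estimate and only widens as far as needed, its cost depends on the estimation error rather than on the page size.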

How well does that work in practice? The learned indexes paper gives accuracy results and performance results for different sizes of neural network models. In the paper the b-trees are depicted as being very large, but in reality that is a parameter, of course. We can get arbitrarily sized b-trees by modifying the page size of the b-tree. For comparisons we chose the b-trees to have the same size (in KB) as the neural networks reported in the paper. The source code of the learned indexes approach is not available, thus we only report the original numbers for the neural networks. Our own proof of concept code is available upon request. As data sets we used the map set and the lognormal set mentioned in the paper, as we could not obtain the other data sets.

If we just look at the accuracy of the prediction of the final tuple, we get as average error the numbers shown below. For the b-trees we report the distance between the estimated position and the real tuple position, averaged over all elements in the data set. For the neural networks the wording in the paper is a bit unclear; we think the numbers are the average of the average errors of the second level models, which might be slightly different.
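The b-tree error metric can be illustrated with a small Python sketch. It measures the mean absolute distance between the interpolated position and the true position over all keys, for a coarse and a fine knot spacing. The function name `mean_abs_error`, the spacings, and the toy data are assumptions; the numbers it produces are unrelated to the tables below.

```python
import bisect

def mean_abs_error(keys, step):
    """Average |estimated - true position| with a knot every `step` entries."""
    n = len(keys)
    knot_pos = list(range(0, n, step))
    if knot_pos[-1] != n - 1:
        knot_pos.append(n - 1)                 # always end on the last key
    knot_key = [keys[p] for p in knot_pos]
    total = 0.0
    for true_pos, key in enumerate(keys):
        # Find the knot interval containing the key, then interpolate.
        i = min(bisect.bisect_right(knot_key, key) - 1, len(knot_pos) - 2)
        x0, x1 = knot_key[i], knot_key[i + 1]
        y0, y1 = knot_pos[i], knot_pos[i + 1]
        est = y0 + (key - x0) / (x1 - x0) * (y1 - y0)
        total += abs(est - true_pos)
    return total / n

keys = [k * k for k in range(10_000)]   # toy data, unique sorted keys
coarse = mean_abs_error(keys, 1000)     # few knots, small index
fine = mean_abs_error(keys, 10)         # many knots, larger index
print(coarse, fine)
```

More knots mean a larger index and a smaller error, which is the same size versus accuracy trade-off the tables report.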

Map data	size (MB)	avg error
Learned Index (10,000)	0.15	8 ± 45
Learned Index (100,000)	1.53	2 ± 36
Complex Learned Index	1.53	2 ± 30
B-tree (10,000)	0.15	225
B-tree (100,000)	1.53	22

Log normal data	size (MB)	avg error
Learned Index (10,000)	0.15	17,060 ± 61,072
Learned Index (100,000)	1.53	17,005 ± 60,959
Complex Learned Index	1.53	8 ± 33
B-tree (10,000)	0.15	1,330
B-tree (100,000)	1.53	3

If we look at the numbers, the interpolating b-tree doesn't perform that badly. For the map data the learned index is a bit more accurate, but the difference is small. For the log normal data the interpolating b-tree is in fact much more accurate than the learned index, being able to predict the final position very accurately.

What does that mean for index performance? That is a complicated topic, as we do not have the source code of the learned index and we do not even know precisely on which hardware the experiments were run. We thus only give some indicative numbers, being fully aware that we might be comparing apples with oranges due to various differences in hardware and implementation. If we compare the reported numbers from the paper for lognormal with our proof of concept implementation (running on an i7-5820K @ 3.30GHz, searching for every element in the data set in shuffled order) we get

Log normal data	Total (ns)	Model (ns)	Search (ns)
Learned Index (10,000)	178	26	152
Learned Index (100,000)	152	36	127
Complex Learned Index	178	110	67
B-tree (10,000)	156	101	54
B-tree (100,000)	171	159	12

Again, the b-tree does not perform that badly, being virtually identical to the reported learned index performance (remember the caveat about hardware differences!). And the b-tree is a very well understood data structure, well tested, with efficient update support etc., while the machine learning model will have great difficulties if the data is updated later on. Thus, I would argue that traditional index structures, in particular b-trees, are still the method of choice, and will probably remain so in the foreseeable future.

Does this mean we should not consider machine learning for indexing? No, we should consider everything that helps. It is just that "everything that helps" does not just include fashionable trends like machine learning, but also efficient implementations of well known data structures.

21 comments:

Todd Hoff, December 23, 2017 at 9:33 PM
Why do I get a feeling this is the modern version of John Henry? Instead of man against machine, we have the old machines competing against the new AIs.

Tim Kraska, December 24, 2017 at 5:12 AM
Hi Thomas,

Great to see your interest in learned indexes. Yet, we would like to clarify a few things:

- Why not use other models than NN: We could not agree more. The main point of the paper is to offer a new view on how to design data structures and algorithms, and we make the case that machine learning can help. We just use neural nets because of their generality and potential for TPUs. At the same time, many other types of models can work and might be better. Ideally, the system would automatically try different types of models, from B-Trees to splines to neural nets. What works best always depends on the use case. For example, for the log-normal data set, the log-normal CDF would probably be the smallest and fastest index structure available.

- Performance results: we tried your described approach in one of the first iterations and it had comparable performance to our B-Tree implementation. In fact, there is another paper under submission from Brown, which studies how the leaf nodes of a BTree can be merged using linear functions. However, we did find that the search between the layers of the BTree (even with interpolation search) has a negative impact on performance. In our experiments your described technique was roughly 2x slower than the best learned indexes.

The best indicator that it is an apples-to-oranges comparison can be seen in your B-Tree(10,000) case vs our B-Tree implementation. The avg. error for your B-Tree(10k) case is 225 but the search takes only 54ns. In contrast, our most fine-grained B-Tree with an average error of 4 takes 52ns to find the data. With an average error of 128 (page size 512) it takes 154ns in our paper, so 3x longer than your implementation while still having a smaller average error (I make the assumption here, that the average error between B-Trees is actually comparable.)

There might be several factors contributing to it:
(1) The hardware as you already pointed out.
(2) The record size. We always used records with a key and a payload and we already know that the payload can have a significant impact.
(3) Our general learned index framework and other implementation details.

In addition, it would be interesting to know what the performance numbers for the map data look like. Our guess is that they are worse than the log-normal performance numbers, given the higher error. At the same time we report even better numbers for them (under 100ns).

- On Inserts: Your statement that "machine learning model will have great difficulties if the data is updated later on" is not so clear to us. In fact, if the new data follows roughly the same trend/distribution as the existing data, even inserts could become faster, ideally O(1). To some degree the rebalancing of a B-Tree is nothing else than retraining a model, and there is more and more work on how to provide better guarantees for ML under changing conditions. But clearly more research is needed here to understand this better.

- Your final words that we should try everything that helps including efficient implementations. Yes and double yes! Learned indexes are just another tool and it highly depends on the use case. Our hope is that further research will continue to refine that tool and understand those use cases so that learned indexes are trusted as much as B-Trees.

Best,

Tim, Alex, Alkis, Ed, Jeff

Full Infinity Flame, December 24, 2017 at 5:33 AM
I suspect at least some of the learned index work is designed specifically for Google's tensor units and their low-accuracy but high-parallelism computations.

Tim Kraska, December 24, 2017 at 4:12 PM
Hi Thomas,

no worries! Family always comes first and I am also pretty busy right now with Christmas preparations. Just a "quick" answer to your two questions:

For updates, the difference between BTrees and learned indexes is that the available space is spread more intelligently. This allows for many more O(1) inserts. Plus it can really be O(1), whereas in the case of the BTree you still need to search for the key, which is O(log n). The idea also better separates the processes of inserting and adding space. For example, you could insert space during the night for the best performance during the day. But you are right, if the distribution shifts, this is not yet as well understood and is a great future research direction (Alkis and I had plenty of discussions about it).

On the high log-normal error: yes, this is because of a particularity of our training process, and the std err alone is not a good indicator here. That is the reason why we also included the std. err variance between buckets. However, a mean value or, better, a per-bucket-size-weighted std. err/mean would be more representative; something we can fix in the next revision of the paper. Let me send you more details on it after Christmas when I have time to dig up the numbers.
Note, however, that with small changes in the model search process, we could (easily) achieve even much better numbers for the log-normal data than for the map data, as it is not hard to learn the often simple distributions of data generators. We will also expand on this in the next revision of the paper.

Glad to hear that the paper achieved its main goal of offering a new tool and view on indexing. However, I do see a lot of potential in the idea, especially when combined with clever auto-tuning of the models and the hybrid indexing idea. The hybrid index can take advantage of the distribution where possible and degrades to a BTree where it does not make sense. So even without GPUs/TPUs it should provide significant benefits.

Merry Christmas to you and your family and let's catch up after the holidays,

Tim

Mark Callaghan, December 25, 2017 at 8:07 PM
I am interested in the topic for read-only index structures like the per-SST block indexes and bloom filters in an LSM. How can space and search efficiency be improved compared to what is currently done for RocksDB? The SILT paper has interesting results on that topic for the SortedStore - https://www.cs.cmu.edu/~dga/papers/silt-sosp2011.pdf

Anonymous, January 16, 2018 at 11:16 AM
You point out that not having any code available from the "learned index" paper makes comparison difficult, yet you decided to make your own code available "upon request" only. Both choices make the results harder to reproduce and study; they hamper experimenting with the design space.

Vinayak, November 19, 2019 at 2:56 AM
Hi Thomas,

I started reading the paper some days back.
Sure, there is great potential in the idea of replacing traditional indexes with machine learning models. While researching I came across your blog discussions with Tim and Alex, and they are definitely helpful for carrying out future research in the software systems direction.

Just a quick question. It has been almost 2 years since the paper was released; has anything changed in the database community due to this paper? Also, since you mentioned you had tried out an implementation of the paper, would it be possible for you to share it?

Thanks

Vinayak

Vinayak, November 19, 2019 at 3:44 PM
Hi Thomas,

Thank you so much for the information.
Is there any implementation available for "The Case for Learned Index Structures"?
I checked out "Learning to Search", but it seems to be unavailable for now on the above-mentioned website.

Thanks,

Vinayak

Vinayak, December 3, 2019 at 3:23 AM
Hi Thomas,

Thank you for the update. I have been following the ML for Systems accepted papers page for the last few days, but I saw that the topic "Learning to Search" was suddenly removed yesterday. Was it replaced with "SOSD: A Benchmark for Learned Indexes", and are both papers the same?

Thanks,

Vinayak