Slides presented at 2024 DCC: "Faster Wavelet Tree Queries"
Abstract
Given a text, rank and select queries return the number of occurrences of a character up to a position (rank) or the position of a character with a given rank (select). These queries have applications in, e.g., compression, computational geometry, and pattern matching in the form of the backwards search – the backbone of many compressed full-text indices. A wavelet tree is a compact data structure that for a text of length n over an alphabet of size σ requires only n⌈logσ⌉(1+o(1)) bits of space and can answer rank and select queries in Θ(logσ) time. Wavelet trees are used in the applications described above. In this paper, we show how to improve query performance of wavelet trees by using a 4-ary tree instead of a binary tree as basis of the wavelet tree. To this end, we present a space-efficient rank and select data structure for quad vectors. The 4-ary tree layout of a wavelet tree helps to halve the number of cache misses during queries and thus reduces the query latency. Our experimental evaluation shows that our 4-ary wavelet tree can improve the latency of rank and select queries by a factor of ≈2 compared to the wavelet tree implementations contained in the widely used Succinct Data Structure Library (SDSL).