Faster Block Tree Construction

Abstract

The block tree [Belazzougui et al. J. Comput. Syst. Sci. ’21] is a compressed text index that can answer access (extract a character at a position), rank (number of occurrences of a specified character in a prefix of the text), and select (size of smallest prefix such that a specified character has a specified rank) queries. It requires O(z log(n/z)) words of space, where z is the number of Lempel-Ziv factors of the text. For some highly repetitive inputs, a block tree can require as little as 0.015 bits per character of the text. Small values of z make the block tree a space-efficient alternative to the wavelet tree, which is another index for these three types of queries. While wavelet trees can be constructed fast in practice, up so far compressed versions of the wavelet tree only leverage statistical compression, meaning that they are blind to spaced repetitions.

To make block trees usable in practice, a first step is to find ways in constructing them efficiently. We address this problem by presenting a practically efficient construction algorithm for block trees, which is up to an order of magnitude faster than previous implementations. Additionally, we parallelize our implementation, making it the first block tree construction implementation that works in parallel in shared memory.

BibTeX

@inproceedings{Kurpicz2023ESA,
  author    = {Dominik Köppl and Florian Kurpicz and Daniel Meyer},
  title     = {Faster Block Tree Construction},
  booktitle = {ESA},
  series    = {LIPIcs},
  volume    = {274},
  pages     = {74:1--74:20},
  publisher = {Schloss Dagstuhl - Leibniz-Zentrum f{"{u}}r Informatik},
  doi       = {10.4230/LIPIcs.ESA.2023.74},
  year      = {2023}
}

Faster Block Tree Construction

Abstract

BibTeX

PDF