Skip to main content Link Search Menu Expand Document (external link)

Parallel Text Index Construction

DOI pdf

Abstract

The focus of this dissertation is the parallel construction of text indices. Text indices provide additional information about a text that allow to answer queries faster. Full-text indices for example are used to efficiently answer phrase queries, i.e., if and where a phrase occurs in a text. The research in this dissertation is focused on but not limited to parallel construction algorithms for text indices in both shared and distributed memory. In the first part, we look at wavelet trees: a compact index that generalizes rank and select queries from binary alphabets to alphabets of arbitrary size. In the second part of this dissertation, we consider the suffix array—one of the most researched text indices.The suffix array of a text contains the starting positions of the text’s lexicographically sorted suffixes, i.e., we want to sort all its suffixes. Finally, we use the distributed suffix arrays (and LCP arrays) to compute distributed Patricia tries. This allows us to answer different phrase queries more efficiently than using only the suffix array.

BibTeX

@phdthesis{Kurpicz2020,
  author    = {Florian Kurpicz},
  title     = {Parallel Text Index Construction},
  doi       = {10.17877/DE290R-21114},
  school    = {Technical University of Dortmund, Germany},
  year      = {2020}
}

PDF