Skip to main content Link Search Menu Expand Document (external link)

On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

DOI arXiv slides pdf

Abstract

We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with pp processors. Given a static text of length nn, we first show how to compute the suffix array interval of a given pattern of length mm in O(mp+lgp+lglgplglgn)O(\frac{m}{p} + \lg p + \lg\lg p \cdot \lg\lg n) time for pmp \leq m. For approximate pattern matching with kk differences or mismatches, we show how to compute all occurrences of a given pattern in O(mkσkpmax{k,lglgn}+(1+m/p)lgplglgn+occ)O(\frac{m^k\sigma^k}{p} \max\{k, \lg\lg n\} + (1+m/p)\lg p \cdot \lg\lg n + occ) time, where σ\sigma is the size of the alphabet and pσkmkp \leq \sigma^k m^k. The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: Given the suffix array intervals for two patterns PP and PP', we present a data structure for computing the interval of PPPP' in O(lglgn)O(\lg\lg n) sequential time, or in O(1+lgplgn)O(1 + \lg_p\lg n) parallel time. All our data structures are of size O(n)O(n) bits (in addition to the suffix array).

BibTeX

@inproceedings{Kurpicz2016CPM,
  author    = {Johannes Fischer and Dominik Köppl and Florian Kurpicz},
  title     = {On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching},
  booktitle = {CPM},
  series    = {LIPIcs},
  volume    = {54},
  pages     = {26:1–26:11},
  publisher = {Schloss Dagstuhl - Leibniz-Zentrum für Informatik},
  doi       = {10.4230/LIPIcs.CPM.2016.26},
  year      = {2016}
}

PDF