On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

Abstract

We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with $p$ processors. Given a static text of length $n$ , we first show how to compute the suffix array interval of a given pattern of length $m$ in $O(\frac{m}{p} + \lg p + \lg\lg p \cdot \lg\lg n)$ time for $p \leq m$ . For approximate pattern matching with $k$ differences or mismatches, we show how to compute all occurrences of a given pattern in $O(\frac{m^k\sigma^k}{p} \max\{k, \lg\lg n\} + (1+m/p)\lg p \cdot \lg\lg n + occ)$ time, where $\sigma$ is the size of the alphabet and $p \leq \sigma^k m^k$ . The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: Given the suffix array intervals for two patterns $P$ and $P'$ , we present a data structure for computing the interval of $PP'$ in $O(\lg\lg n)$ sequential time, or in $O(1 + \lg_p\lg n)$ parallel time. All our data structures are of size $O(n)$ bits (in addition to the suffix array).

BibTeX

@inproceedings{Kurpicz2016CPM,
  author    = {Johannes Fischer and Dominik Köppl and Florian Kurpicz},
  title     = {On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching},
  booktitle = {CPM},
  series    = {LIPIcs},
  volume    = {54},
  pages     = {26:1–26:11},
  publisher = {Schloss Dagstuhl - Leibniz-Zentrum für Informatik},
  doi       = {10.4230/LIPIcs.CPM.2016.26},
  year      = {2016}
}

On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

Abstract

BibTeX

PDF