Florian Kurpicz
Presentation at CPM 2016

On the Benefit of Merging Suffix Array Intervals for Parallel Pattern Matching

Today, I presented our work on maximum common subgraph problems in series-parallel graphs at CPM 2016. We presented a new approach to parallelize pattern matching based on merging intervals in the suffix array that contain the results of subqueries. Our algorithm can also answer pattern matching with k errors. The slides are available as handout and heavily animated. You can finde the paper here (DOI) and here (arXiv).

Slides in Handout Mode

Slide image cpm_2016_slides_handout-01.jpg
Slide image cpm_2016_slides_handout-02.jpg
Slide image cpm_2016_slides_handout-03.jpg
Slide image cpm_2016_slides_handout-04.jpg
Slide image cpm_2016_slides_handout-05.jpg
Slide image cpm_2016_slides_handout-06.jpg
Slide image cpm_2016_slides_handout-07.jpg
Slide image cpm_2016_slides_handout-08.jpg
Slide image cpm_2016_slides_handout-09.jpg
Slide image cpm_2016_slides_handout-10.jpg
Slide image cpm_2016_slides_handout-11.jpg
Slide image cpm_2016_slides_handout-12.jpg
Slide image cpm_2016_slides_handout-13.jpg
Slide image cpm_2016_slides_handout-14.jpg
Slide image cpm_2016_slides_handout-15.jpg
Slide image cpm_2016_slides_handout-16.jpg
Slide image cpm_2016_slides_handout-17.jpg
Slide image cpm_2016_slides_handout-18.jpg
Slide image cpm_2016_slides_handout-19.jpg
Slide image cpm_2016_slides_handout-20.jpg

Abstract

We present parallel algorithms for exact and approximate pattern matching with suffix arrays, using a CREW-PRAM with p processors. Given a static text of length n, we first show how to compute the suffix array interval of a given pattern of length m in O(m/p + lg p + lg lg p * lg lg n) time for p <= m. For approximate pattern matching with k differences or mismatches, we show how to compute all occurrences of a given pattern in O((m^k sigma^k)/p max (k, lg lg n) + (1+m/p) lg p * lg lg n + occ} time, where sigma is the size of the alphabet and p <= sigma^k m^k. The workhorse of our algorithms is a data structure for merging suffix array intervals quickly: Given the suffix array intervals for two patterns P and P', we present a data structure for computing the interval of PP' in O(lg lg n) sequential time, or in O(1 + lg_p lg n) parallel time. All our data structures are of size O(n) bits (in addition to the suffix array).