High Performance Multiple Sequence Alignment System for Pyrosequencing Reads from Multiple Reference Genomes

Bookmark or cite this item: http://hdl.handle.net/10027/8610

Files in this item

File: mypaper12.pdf (782KB); Description: (no description provided); Format: PDF
Title: High Performance Multiple Sequence Alignment System for Pyrosequencing Reads from Multiple Reference Genomes
Author(s): Saeed, Fahad; Perez-Rathke, Alan; Gwarnicki, Jaroslaw; Berger-Wolf, Tanya; Khokhar, Ashfaq
Subject(s): Multiple sequence alignment Pyrosequencing Parallel algorithms Computational biology Genome alignment and mapping
Abstract: Genome resequencing with short reads generated from pyrosequencing generally relies on mapping the short reads against a single reference genome. However, mapping of reads from multiple reference genomes is not possible using a pairwise mapping algorithm. Existing multiple sequence alignment (MSA) methods cannot be used to align the reads with respect to each other and the reference genomes, because they do not take into account the position of these short reads with respect to the genome and are highly inefficient for large numbers of sequences. In this paper, we develop a highly scalable parallel algorithm based on domain decomposition, referred to as PPyro-Align, to align such large numbers of reads from single or multiple reference genomes. The proposed alignment algorithm accurately aligns the erroneous reads and has been implemented on a cluster of workstations using the MPI library. Experimental results for different problem sizes are analyzed in terms of execution time, quality of the alignments, and the ability of the algorithm to handle reads from multiple haplotypes. We report high-quality multiple alignment of up to 0.5 million reads. The algorithm is shown to be highly scalable and exhibits superlinear speedups as the number of processors increases.
Issue Date: 2012-01
Publisher: Elsevier
Citation Info: Saeed, F., Perez-Rathke, A., Gwarnicki, J., Berger-Wolf, T., & Khokhar, A. 2012. A high performance multiple sequence alignment system for pyrosequencing reads from multiple reference genomes. Journal of Parallel and Distributed Computing, 72(1): 83-93. DOI: 10.1016/j.jpdc.2011.08.001
Type: Article
Description: NOTICE: this is the author’s version of a work that was accepted for publication in Journal of Parallel and Distributed Computing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Journal of Parallel and Distributed Computing, Vol 72, Issue 1, (JAN 2012). DOI: 10.1016/j.jpdc.2011.08.001
URI: http://hdl.handle.net/10027/8610
ISSN: 0743-7315
Date Available in INDIGO: 2012-08-21
 

Statistics

Country Code Views
United States of America 144
China 65
United Kingdom 18
Netherlands 5
Iran 4
