Long read alignment based on maximal exact match seeds
Motivation: The explosive growth of next-generation sequencing datasets poses a challenge to the mapping of reads to reference genomes in terms of alignment quality and execution speed. With the continuing progress of high-throughput sequencing technologies, read length is constantly increasing and...
Main Authors: | Liu, Yongchao, Schmidt, Bertil |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2012 |
Subjects: | Original Papers |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436841/ https://www.ncbi.nlm.nih.gov/pubmed/22962447 http://dx.doi.org/10.1093/bioinformatics/bts414 |
_version_ | 1782242710107717632 |
---|---|
author | Liu, Yongchao Schmidt, Bertil |
author_sort | Liu, Yongchao |
collection | PubMed |
description | Motivation: The explosive growth of next-generation sequencing datasets poses a challenge to the mapping of reads to reference genomes in terms of alignment quality and execution speed. With the continuing progress of high-throughput sequencing technologies, read length is constantly increasing and many existing aligners are becoming inefficient as generated reads grow larger. Results: We present CUSHAW2, a parallelized, accurate, and memory-efficient long read aligner. Our aligner is based on the seed-and-extend approach and uses maximal exact matches as seeds to find gapped alignments. We have evaluated and compared CUSHAW2 to the three other long read aligners BWA-SW, Bowtie2 and GASSST, by aligning simulated and real datasets to the human genome. The performance evaluation shows that CUSHAW2 is consistently among the highest-ranked aligners in terms of alignment quality for both single-end and paired-end alignment, while demonstrating highly competitive speed. Furthermore, our aligner shows good parallel scalability with respect to the number of CPU threads. Availability: CUSHAW2, written in C++, and all simulated datasets are available at http://cushaw2.sourceforge.net Contact: liuy@uni-mainz.de; bertil.schmidt@uni-mainz.de Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-3436841 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3436841 2012-12-12 Long read alignment based on maximal exact match seeds Liu, Yongchao Schmidt, Bertil Bioinformatics Original Papers Oxford University Press 2012-09-15 2012-09-03 /pmc/articles/PMC3436841/ /pubmed/22962447 http://dx.doi.org/10.1093/bioinformatics/bts414 Text en © The Author(s) (2012). Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Liu, Yongchao Schmidt, Bertil Long read alignment based on maximal exact match seeds |
title | Long read alignment based on maximal exact match seeds |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436841/ https://www.ncbi.nlm.nih.gov/pubmed/22962447 http://dx.doi.org/10.1093/bioinformatics/bts414 |
work_keys_str_mv | AT liuyongchao longreadalignmentbasedonmaximalexactmatchseeds AT schmidtbertil longreadalignmentbasedonmaximalexactmatchseeds |
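
The abstract states that the aligner uses maximal exact matches (MEMs) as seeds in a seed-and-extend approach. As a rough illustration of what a MEM is, the sketch below finds every exact match between a read and a reference that cannot be extended in either direction. This is not CUSHAW2's implementation: the function name `maximal_exact_matches`, the `min_len` parameter, and the example sequences are hypothetical, and the quadratic scan along alignment diagonals is an assumption made for brevity. A practical aligner would presumably query an index of the reference instead.

```python
def maximal_exact_matches(read, ref, min_len=4):
    """Return sorted (read_start, ref_start, length) triples, one per MEM.

    Illustrative only: scans every diagonal of the read/reference
    comparison matrix, which costs O(len(read) * len(ref)) time.
    """
    mems = []
    n, m = len(read), len(ref)
    # Diagonal d pairs read[i] with ref[i + d].
    for d in range(-(n - 1), m):
        i = max(0, -d)
        j = i + d
        run = 0  # length of the current run of matching characters
        while i < n and j < m:
            if read[i] == ref[j]:
                run += 1
            else:
                # A mismatch ends the run, so it is maximal on both sides.
                if run >= min_len:
                    mems.append((i - run, j - run, run))
                run = 0
            i += 1
            j += 1
        # A run that reaches the end of either sequence is also maximal.
        if run >= min_len:
            mems.append((i - run, j - run, run))
    return sorted(mems)


# Hypothetical toy sequences, not data from the paper.
seeds = maximal_exact_matches("ACGTTGCA", "TTACGTAGTTGCAT", min_len=4)
print(seeds)  # [(0, 2, 4), (2, 7, 6)]
```

In a seed-and-extend scheme, each reported triple would serve as an anchor around which a gapped alignment of the remaining read bases is attempted; that extension step is omitted here.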