Long read alignment based on maximal exact match seeds

Bibliographic Details
Main authors: Liu, Yongchao; Schmidt, Bertil
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436841/
https://www.ncbi.nlm.nih.gov/pubmed/22962447
http://dx.doi.org/10.1093/bioinformatics/bts414
author Liu, Yongchao
Schmidt, Bertil
collection PubMed
description Motivation: The explosive growth of next-generation sequencing datasets poses a challenge to the mapping of reads to reference genomes in terms of alignment quality and execution speed. With the continuing progress of high-throughput sequencing technologies, read length is constantly increasing and many existing aligners are becoming inefficient as generated reads grow larger. Results: We present CUSHAW2, a parallelized, accurate, and memory-efficient long read aligner. Our aligner is based on the seed-and-extend approach and uses maximal exact matches as seeds to find gapped alignments. We have evaluated and compared CUSHAW2 to the three other long read aligners BWA-SW, Bowtie2 and GASSST, by aligning simulated and real datasets to the human genome. The performance evaluation shows that CUSHAW2 is consistently among the highest-ranked aligners in terms of alignment quality for both single-end and paired-end alignment, while demonstrating highly competitive speed. Furthermore, our aligner shows good parallel scalability with respect to the number of CPU threads. Availability: CUSHAW2, written in C++, and all simulated datasets are available at http://cushaw2.sourceforge.net Contact: liuy@uni-mainz.de; bertil.schmidt@uni-mainz.de Supplementary information: Supplementary data are available at Bioinformatics online.
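The description's central idea is to seed gapped alignments with maximal exact matches (MEMs). These are exact matches between the read and the reference that cannot be extended one more base to the left or to the right. The following is a minimal sketch of what counts as a MEM. It assumes a naive quadratic scan, and `find_mems` and its `min_len` seed-length threshold are hypothetical names chosen for illustration. It is not CUSHAW2's implementation, since a practical aligner would use an index over the genome rather than brute force.

```python
from typing import List, Tuple


def find_mems(read: str, ref: str, min_len: int = 3) -> List[Tuple[int, int, int]]:
    """Return (read_pos, ref_pos, length) for every maximal exact match of
    at least min_len bases between read and ref.

    Naive O(len(read) * len(ref)) scan, for illustration only.
    """
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            # Left-maximal: skip start positions whose preceding bases also
            # match, because the match could be extended further left.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Extend rightwards as far as the bases agree; stopping at the
            # first mismatch (or sequence end) makes the match right-maximal.
            k = 0
            while i + k < len(read) and j + k < len(ref) and read[i + k] == ref[j + k]:
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return mems


if __name__ == "__main__":
    print(find_mems("ACGTTGCA", "GGACGTCCTTGCAGG", min_len=3))
```

Running the example prints `[(0, 2, 4), (3, 8, 5)]`. Read bases 0 to 3 (`ACGT`) match reference positions 2 to 5, and read bases 3 to 7 (`TTGCA`) match reference positions 8 to 12. In a seed-and-extend aligner, each such seed would then anchor a gapped extension (for example, dynamic-programming alignment) of the surrounding read and reference regions.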
format Online
Article
Text
id pubmed-3436841
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
journal Bioinformatics
dates 2012-09-15, 2012-09-03, 2012-12-12
rights © The Author(s) (2012). Published by Oxford University Press.
http://creativecommons.org/licenses/by/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Long read alignment based on maximal exact match seeds
topic Original Papers