
Long read alignment based on maximal exact match seeds

Motivation: The explosive growth of next-generation sequencing datasets poses a challenge to the mapping of reads to reference genomes in terms of alignment quality and execution speed. With the continuing progress of high-throughput sequencing technologies, read length is constantly increasing and...


Bibliographic Details
Main authors:	Liu, Yongchao, Schmidt, Bertil
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2012
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436841/ https://www.ncbi.nlm.nih.gov/pubmed/22962447 http://dx.doi.org/10.1093/bioinformatics/bts414

_version_	1782242710107717632
author	Liu, Yongchao Schmidt, Bertil
author_sort	Liu, Yongchao
collection	PubMed
description	Motivation: The explosive growth of next-generation sequencing datasets poses a challenge to the mapping of reads to reference genomes in terms of alignment quality and execution speed. With the continuing progress of high-throughput sequencing technologies, read length is constantly increasing and many existing aligners are becoming inefficient as generated reads grow larger. Results: We present CUSHAW2, a parallelized, accurate, and memory-efficient long read aligner. Our aligner is based on the seed-and-extend approach and uses maximal exact matches as seeds to find gapped alignments. We have evaluated and compared CUSHAW2 to the three other long read aligners BWA-SW, Bowtie2 and GASSST, by aligning simulated and real datasets to the human genome. The performance evaluation shows that CUSHAW2 is consistently among the highest-ranked aligners in terms of alignment quality for both single-end and paired-end alignment, while demonstrating highly competitive speed. Furthermore, our aligner shows good parallel scalability with respect to the number of CPU threads. Availability: CUSHAW2, written in C++, and all simulated datasets are available at http://cushaw2.sourceforge.net Contact: liuy@uni-mainz.de; bertil.schmidt@uni-mainz.de Supplementary information: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-3436841
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-3436841 2012-12-12 Long read alignment based on maximal exact match seeds Liu, Yongchao Schmidt, Bertil Bioinformatics Original Papers Oxford University Press 2012-09-15 2012-09-03 /pmc/articles/PMC3436841/ /pubmed/22962447 http://dx.doi.org/10.1093/bioinformatics/bts414 Text en © The Author(s) (2012). Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Original Papers Liu, Yongchao Schmidt, Bertil Long read alignment based on maximal exact match seeds
title	Long read alignment based on maximal exact match seeds
title_sort	long read alignment based on maximal exact match seeds
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3436841/ https://www.ncbi.nlm.nih.gov/pubmed/22962447 http://dx.doi.org/10.1093/bioinformatics/bts414
work_keys_str_mv	AT liuyongchao longreadalignmentbasedonmaximalexactmatchseeds AT schmidtbertil longreadalignmentbasedonmaximalexactmatchseeds
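
The abstract in this record describes a seed-and-extend aligner that uses maximal exact matches (MEMs) as seeds. As a rough illustration of the seeding idea only, the sketch below finds MEMs between a read and a reference by brute force. It is not CUSHAW2's implementation: the function name `maximal_exact_matches`, the `min_len` parameter, and the quadratic scan are assumptions made for this example, and a practical aligner would query an index of the reference rather than compare every pair of positions. The extension step that turns seeds into gapped alignments is not shown.

```python
def maximal_exact_matches(read, ref, min_len=3):
    """Return (read_start, ref_start, length) for every exact match that
    cannot be extended to the left or to the right (hypothetical helper).

    Brute-force O(len(read) * len(ref)) scan, for illustration only.
    """
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            if read[i] != ref[j]:
                continue
            # If the preceding characters also match, this position lies
            # inside a longer match that starts earlier: not left-maximal.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Extend to the right until a mismatch or the end of a string,
            # which makes the match right-maximal.
            k = 0
            while (i + k < len(read) and j + k < len(ref)
                   and read[i + k] == ref[j + k]):
                k += 1
            if k >= min_len:
                mems.append((i, j, k))
    return mems


if __name__ == "__main__":
    # Toy sequences chosen for this example, not taken from the paper.
    print(maximal_exact_matches("ACGTTT", "GGACGTAA", min_len=3))
```

In a seed-and-extend scheme, each reported triple would then anchor a gapped alignment of the whole read around that reference position; the `min_len` threshold here simply filters out short seeds.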

