
mrsFAST-Ultra: a compact, SNP-aware mapper for high performance sequencing applications

High throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for processing and downstream analysis. While tools that report the ‘best’ mapping location of each read provide a fast way to process HTS data, they are not suitable for many types of downstr...


Bibliographic Details
Main authors: Hach, Faraz; Sarrafi, Iman; Hormozdiari, Farhad; Alkan, Can; Eichler, Evan E.; Sahinalp, S. Cenk
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4086126/
https://www.ncbi.nlm.nih.gov/pubmed/24810850
http://dx.doi.org/10.1093/nar/gku370
_version_ 1782324773968150528
author Hach, Faraz
Sarrafi, Iman
Hormozdiari, Farhad
Alkan, Can
Eichler, Evan E.
Sahinalp, S. Cenk
author_sort Hach, Faraz
collection PubMed
description High throughput sequencing (HTS) platforms generate unprecedented amounts of data that introduce challenges for processing and downstream analysis. While tools that report the ‘best’ mapping location of each read provide a fast way to process HTS data, they are not suitable for many types of downstream analysis such as structural variation detection, where it is important to report multiple mapping loci for each read. For this purpose we introduce mrsFAST-Ultra, a fast, cache-oblivious, SNP-aware aligner that can handle the multi-mapping of HTS reads very efficiently. mrsFAST-Ultra improves mrsFAST, our first cache-oblivious read aligner capable of handling multi-mapping reads, through new and compact index structures that reduce not only the overall memory usage but also the number of CPU operations per alignment. In fact, the size of the index generated by mrsFAST-Ultra is 10 times smaller than that of mrsFAST. As importantly, mrsFAST-Ultra introduces new features such as being able to (i) obtain the best mapping loci for each read, and (ii) return all reads that have at most n mapping loci (within an error threshold), together with these loci, for any user-specified n. Furthermore, mrsFAST-Ultra is SNP-aware, i.e. it can map reads to the reference genome while discounting the mismatches that occur at common SNP locations provided by dbSNP; this significantly increases the number of reads that can be mapped to the reference genome. Note that all of the above features are implemented within the index structure rather than as post-processing steps, and are therefore performed highly efficiently. Finally, mrsFAST-Ultra utilizes multiple available cores and processors and can be tuned for various memory settings. Our results show that mrsFAST-Ultra is roughly five times faster than its predecessor mrsFAST. In comparison to newly enhanced popular tools such as Bowtie2, it is more sensitive (it can report 10 times or more mappings per read) and much faster (six times or more) in the multi-mapping mode. Furthermore, mrsFAST-Ultra has an index size of 2 GB for the entire human reference genome, which is roughly half of that of Bowtie2. mrsFAST-Ultra is open source and it can be accessed at http://mrsfast.sourceforge.net.
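The two features named in the abstract, SNP-aware mismatch counting and reporting reads with at most n mapping loci, can be illustrated with a deliberately naive sketch. This is not mrsFAST-Ultra's algorithm: the tool performs these steps inside a compact index, whereas the code below scans every reference position. All function and parameter names are hypothetical. Two simplifying assumptions are made: any mismatch at a listed SNP position is ignored regardless of the allele observed, and reads with zero mapping loci are treated as unmapped and not reported.

```python
from typing import Dict, List, Set


def snp_aware_mismatches(read: str, ref: str, pos: int, snp_positions: Set[int]) -> int:
    """Count mismatches between `read` and `ref` starting at `pos`.

    Mismatches at reference positions listed in `snp_positions` are not
    counted (simplification: the observed allele is not checked).
    """
    return sum(
        1
        for i, base in enumerate(read)
        if base != ref[pos + i] and (pos + i) not in snp_positions
    )


def all_mapping_loci(read: str, ref: str, max_errors: int, snp_positions: Set[int]) -> List[int]:
    """Return every reference offset where `read` aligns within `max_errors`."""
    return [
        pos
        for pos in range(len(ref) - len(read) + 1)
        if snp_aware_mismatches(read, ref, pos, snp_positions) <= max_errors
    ]


def reads_with_at_most_n_loci(
    reads: Dict[str, str],
    ref: str,
    max_errors: int,
    n: int,
    snp_positions: Set[int],
) -> Dict[str, List[int]]:
    """Report each mapped read that has at most `n` loci, together with those loci."""
    result = {}
    for name, seq in reads.items():
        loci = all_mapping_loci(seq, ref, max_errors, snp_positions)
        # Assumption: a read with no loci is unmapped and is left out.
        if 0 < len(loci) <= n:
            result[name] = loci
    return result


if __name__ == "__main__":
    reference = "ACGTACGTAC"
    reads = {"r1": "ACGT", "r2": "ACTT"}
    # Position 2 of the toy reference is treated as a known SNP site.
    print(reads_with_at_most_n_loci(reads, reference, max_errors=0, n=2, snp_positions={2}))
```

In this toy input, read `r2` differs from the reference only at the listed SNP site, so it maps with zero counted errors; without the SNP list it would not map at that threshold, which is the effect the abstract describes as increasing the number of mappable reads.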
format Online
Article
Text
id pubmed-4086126
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4086126 2014-10-28 mrsFAST-Ultra: a compact, SNP-aware mapper for high performance sequencing applications Hach, Faraz Sarrafi, Iman Hormozdiari, Farhad Alkan, Can Eichler, Evan E. Sahinalp, S. Cenk Nucleic Acids Res Article Oxford University Press 2014-07-01 2014-05-08 /pmc/articles/PMC4086126/ /pubmed/24810850 http://dx.doi.org/10.1093/nar/gku370 Text en © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title mrsFAST-Ultra: a compact, SNP-aware mapper for high performance sequencing applications
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4086126/
https://www.ncbi.nlm.nih.gov/pubmed/24810850
http://dx.doi.org/10.1093/nar/gku370