Classification of DNA sequences using Bloom filters

Motivation: New generation sequencing technologies producing increasingly complex datasets demand new efficient and specialized sequence analysis algorithms. Often, it is only the ‘novel’ sequences in a complex dataset that are of interest and the superfluous sequences need to be removed. Results: A...

Bibliographic Details
Main authors: Stranneheim, Henrik; Käller, Max; Allander, Tobias; Andersson, Björn; Arvestad, Lars; Lundeberg, Joakim
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887045/
https://www.ncbi.nlm.nih.gov/pubmed/20472541
http://dx.doi.org/10.1093/bioinformatics/btq230
author Stranneheim, Henrik
Käller, Max
Allander, Tobias
Andersson, Björn
Arvestad, Lars
Lundeberg, Joakim
collection PubMed
description Motivation: New generation sequencing technologies producing increasingly complex datasets demand new efficient and specialized sequence analysis algorithms. Often, it is only the ‘novel’ sequences in a complex dataset that are of interest, and the superfluous sequences need to be removed. Results: A novel algorithm, fast and accurate classification of sequences (FACS), is introduced that can accurately and rapidly classify sequences as belonging or not belonging to a reference sequence. FACS was first optimized and validated using a synthetic metagenome dataset. An experimental metagenome dataset was then used to show that FACS achieves accuracy comparable to that of BLAT and SSAHA2 but is at least 21 times faster in classifying sequences. Availability: Source code for FACS, Bloom filters and the MetaSim dataset used are available at http://facs.biotech.kth.se. The Bloom::Faster 1.6 Perl module can be downloaded from CPAN at http://search.cpan.org/~palvaro/Bloom-Faster-1.6/ Contacts: henrik.stranneheim@biotech.kth.se; joakiml@biotech.kth.se Supplementary information: Supplementary data are available at Bioinformatics online.
format Text
id pubmed-2887045
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2887045 2010-06-18 Classification of DNA sequences using Bloom filters Stranneheim, Henrik Käller, Max Allander, Tobias Andersson, Björn Arvestad, Lars Lundeberg, Joakim Bioinformatics Original Papers Oxford University Press 2010-07-01 2010-05-13 /pmc/articles/PMC2887045/ /pubmed/20472541 http://dx.doi.org/10.1093/bioinformatics/btq230 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Classification of DNA sequences using Bloom filters
topic Original Papers