Classification of DNA sequences using Bloom filters
Motivation: New generation sequencing technologies producing increasingly complex datasets demand new efficient and specialized sequence analysis algorithms. Often, it is only the ‘novel’ sequences in a complex dataset that are of interest and the superfluous sequences need to be removed. Results: A...
Main authors: | Stranneheim, Henrik; Käller, Max; Allander, Tobias; Andersson, Björn; Arvestad, Lars; Lundeberg, Joakim |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2010 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887045/ https://www.ncbi.nlm.nih.gov/pubmed/20472541 http://dx.doi.org/10.1093/bioinformatics/btq230 |
_version_ | 1782182498596290560 |
---|---|
author | Stranneheim, Henrik; Käller, Max; Allander, Tobias; Andersson, Björn; Arvestad, Lars; Lundeberg, Joakim |
author_facet | Stranneheim, Henrik; Käller, Max; Allander, Tobias; Andersson, Björn; Arvestad, Lars; Lundeberg, Joakim |
author_sort | Stranneheim, Henrik |
collection | PubMed |
description | Motivation: New generation sequencing technologies producing increasingly complex datasets demand new efficient and specialized sequence analysis algorithms. Often, it is only the ‘novel’ sequences in a complex dataset that are of interest and the superfluous sequences need to be removed. Results: A novel algorithm, fast and accurate classification of sequences (FACS), is introduced that can accurately and rapidly classify sequences as belonging or not belonging to a reference sequence. FACS was first optimized and validated using a synthetic metagenome dataset. An experimental metagenome dataset was then used to show that FACS achieves accuracy comparable to that of BLAT and SSAHA2 but is at least 21 times faster in classifying sequences. Availability: Source code for FACS, the Bloom filters and the MetaSim dataset used are available at http://facs.biotech.kth.se. The Bloom::Faster 1.6 Perl module can be downloaded from CPAN at http://search.cpan.org/~palvaro/Bloom-Faster-1.6/ Contacts: henrik.stranneheim@biotech.kth.se; joakiml@biotech.kth.se Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Text |
id | pubmed-2887045 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-2887045 2010-06-18 Classification of DNA sequences using Bloom filters Stranneheim, Henrik; Käller, Max; Allander, Tobias; Andersson, Björn; Arvestad, Lars; Lundeberg, Joakim Bioinformatics Original Papers Motivation: New generation sequencing technologies producing increasingly complex datasets demand new efficient and specialized sequence analysis algorithms. Often, it is only the ‘novel’ sequences in a complex dataset that are of interest and the superfluous sequences need to be removed. Results: A novel algorithm, fast and accurate classification of sequences (FACS), is introduced that can accurately and rapidly classify sequences as belonging or not belonging to a reference sequence. FACS was first optimized and validated using a synthetic metagenome dataset. An experimental metagenome dataset was then used to show that FACS achieves accuracy comparable to that of BLAT and SSAHA2 but is at least 21 times faster in classifying sequences. Availability: Source code for FACS, the Bloom filters and the MetaSim dataset used are available at http://facs.biotech.kth.se. The Bloom::Faster 1.6 Perl module can be downloaded from CPAN at http://search.cpan.org/~palvaro/Bloom-Faster-1.6/ Contacts: henrik.stranneheim@biotech.kth.se; joakiml@biotech.kth.se Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2010-07-01 2010-05-13 /pmc/articles/PMC2887045/ /pubmed/20472541 http://dx.doi.org/10.1093/bioinformatics/btq230 Text en © The Author(s) 2010. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Papers Stranneheim, Henrik Käller, Max Allander, Tobias Andersson, Björn Arvestad, Lars Lundeberg, Joakim Classification of DNA sequences using Bloom filters |
title | Classification of DNA sequences using Bloom filters |
title_full | Classification of DNA sequences using Bloom filters |
title_fullStr | Classification of DNA sequences using Bloom filters |
title_full_unstemmed | Classification of DNA sequences using Bloom filters |
title_short | Classification of DNA sequences using Bloom filters |
title_sort | classification of dna sequences using bloom filters |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2887045/ https://www.ncbi.nlm.nih.gov/pubmed/20472541 http://dx.doi.org/10.1093/bioinformatics/btq230 |
work_keys_str_mv | AT stranneheimhenrik classificationofdnasequencesusingbloomfilters AT kallermax classificationofdnasequencesusingbloomfilters AT allandertobias classificationofdnasequencesusingbloomfilters AT anderssonbjorn classificationofdnasequencesusingbloomfilters AT arvestadlars classificationofdnasequencesusingbloomfilters AT lundebergjoakim classificationofdnasequencesusingbloomfilters |
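
The description above says that sequences are classified as belonging or not belonging to a reference with the help of Bloom filters. The record gives no implementation details, so the following is only a minimal Python sketch of the general idea and not the FACS code itself (the published tool relies on the Bloom::Faster Perl module). The k-mer length, filter size, number of hash functions, SHA-256 double hashing, the match cutoff and all function names are assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """A small Bloom filter backed by a bytearray (illustrative only)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive all bit positions from two 64-bit halves
        # of one SHA-256 digest. The hash choice is an assumption here.
        digest = hashlib.sha256(item.encode("ascii")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


def kmers(sequence, k):
    """Return every overlapping word of length k in the sequence."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_reference_filter(reference, k, num_bits=1 << 16, num_hashes=4):
    """Insert all k-mers of the reference into a new Bloom filter."""
    bloom = BloomFilter(num_bits, num_hashes)
    for word in kmers(reference, k):
        bloom.add(word)
    return bloom


def match_fraction(read, bloom, k):
    """Fraction of the read's k-mers that the filter reports as present."""
    words = kmers(read, k)
    if not words:
        return 0.0  # read shorter than k: nothing to look up
    return sum(word in bloom for word in words) / len(words)


def belongs_to_reference(read, bloom, k, cutoff=0.8):
    """Classify a read; the cutoff value is a hypothetical default."""
    return match_fraction(read, bloom, k) >= cutoff


if __name__ == "__main__":
    reference = "ACGTTGCAAGCTTAGGCTAACGTTAGCCGATCGATTACGGCATCGA"
    bloom = build_reference_filter(reference, k=8)
    print(belongs_to_reference("GCTTAGGCTAACGTTAGC", bloom, k=8))
    print(belongs_to_reference("TTTTTTTTTTTTTTTT", bloom, k=8))
```

Because a Bloom filter never produces false negatives, a read copied exactly from the reference always has a match fraction of 1.0. False positives are possible, which is why this sketch classifies on a fraction of matching k-mers rather than on a single hit.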