
Scalable metagenomic taxonomy classification using a reference genome database

Bibliographic Details
Main authors: Ames, Sasha K.; Hysom, David A.; Gardner, Shea N.; Lloyd, G. Scott; Gokhale, Maya B.; Allen, Jonathan E.
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3753567/
https://www.ncbi.nlm.nih.gov/pubmed/23828782
http://dx.doi.org/10.1093/bioinformatics/btt389
collection PubMed
description Motivation: Deep metagenomic sequencing of biological samples has the potential to recover otherwise difficult-to-detect microorganisms and accurately characterize biological samples with limited prior knowledge of sample contents. Existing metagenomic taxonomic classification algorithms, however, do not scale well to analyze large metagenomic datasets, and balancing classification accuracy with computational efficiency presents a fundamental challenge. Results: A method is presented to shift computational costs to an off-line computation by creating a taxonomy/genome index that supports scalable metagenomic classification. Scalable performance is demonstrated on real and simulated data to show accurate classification in the presence of novel organisms on samples that include viruses, prokaryotes, fungi and protists. Taxonomic classification of the previously published 150-gigabase Tyrolean Iceman dataset was found to take <20 h on a single-node, 40-core, large-memory machine and to provide new insights into the metagenomic contents of the sample. Availability: Software was implemented in C++ and is freely available at http://sourceforge.net/projects/lmat Contact: allen99@llnl.gov Supplementary information: Supplementary data are available at Bioinformatics online.
id pubmed-3753567
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3753567 (2013-08-27). Bioinformatics, Original Papers. Oxford University Press, 2013-09-15; published online 2013-07-04. © The Author(s) 2013. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Original Papers