Analysis and comparison of very large metagenomes with fast clustering and functional annotation

BACKGROUND: The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely...

Bibliographic Details
Main Author: Li, Weizhong
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2774329/
https://www.ncbi.nlm.nih.gov/pubmed/19863816
http://dx.doi.org/10.1186/1471-2105-10-359
_version_ 1782173931063476224
author Li, Weizhong
author_facet Li, Weizhong
author_sort Li, Weizhong
collection PubMed
description BACKGROUND: The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. RESULTS: The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". CONCLUSION: RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from .
format Text
id pubmed-2774329
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-27743292009-11-07 Analysis and comparison of very large metagenomes with fast clustering and functional annotation Li, Weizhong BMC Bioinformatics Methodology Article BACKGROUND: The remarkable advance of metagenomics presents significant new challenges in data analysis. Metagenomic datasets (metagenomes) are large collections of sequencing reads from anonymous species within particular environments. Computational analyses for very large metagenomes are extremely time-consuming, and there are often many novel sequences in these metagenomes that are not fully utilized. The number of available metagenomes is rapidly increasing, so fast and efficient metagenome comparison methods are in great demand. RESULTS: The new metagenomic data analysis method Rapid Analysis of Multiple Metagenomes with a Clustering and Annotation Pipeline (RAMMCAP) was developed using an ultra-fast sequence clustering algorithm, fast protein family annotation tools, and a novel statistical metagenome comparison method that employs a unique graphic interface. RAMMCAP processes extremely large datasets with only moderate computational effort. It identifies raw read clusters and protein clusters that may include novel gene families, and compares metagenomes using clusters or functional annotations calculated by RAMMCAP. In this study, RAMMCAP was applied to the two largest available metagenomic collections, the "Global Ocean Sampling" and the "Metagenomic Profiling of Nine Biomes". CONCLUSION: RAMMCAP is a very fast method that can cluster and annotate one million metagenomic reads in only hundreds of CPU hours. It is available from . BioMed Central 2009-10-28 /pmc/articles/PMC2774329/ /pubmed/19863816 http://dx.doi.org/10.1186/1471-2105-10-359 Text en Copyright © 2009 Li; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Li, Weizhong
Analysis and comparison of very large metagenomes with fast clustering and functional annotation
title Analysis and comparison of very large metagenomes with fast clustering and functional annotation
title_full Analysis and comparison of very large metagenomes with fast clustering and functional annotation
title_fullStr Analysis and comparison of very large metagenomes with fast clustering and functional annotation
title_full_unstemmed Analysis and comparison of very large metagenomes with fast clustering and functional annotation
title_short Analysis and comparison of very large metagenomes with fast clustering and functional annotation
title_sort analysis and comparison of very large metagenomes with fast clustering and functional annotation
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2774329/
https://www.ncbi.nlm.nih.gov/pubmed/19863816
http://dx.doi.org/10.1186/1471-2105-10-359
work_keys_str_mv AT liweizhong analysisandcomparisonofverylargemetagenomeswithfastclusteringandfunctionalannotation