Libra: scalable k-mer–based tool for massive all-vs-all metagenome comparisons

BACKGROUND: Shotgun metagenomics provides powerful insights into microbial community biodiversity and function. Yet, inferences from metagenomic studies are often limited by dataset size and complexity and are restricted by the availability and completeness of existing databases. De novo comparative...

Bibliographic Details
Main Authors: Choi, Illyoung, Ponsero, Alise J, Bomhoff, Matthew, Youens-Clark, Ken, Hartman, John H, Hurwitz, Bonnie L
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6354030/
https://www.ncbi.nlm.nih.gov/pubmed/30597002
http://dx.doi.org/10.1093/gigascience/giy165
_version_ 1783391091177816064
author Choi, Illyoung
Ponsero, Alise J
Bomhoff, Matthew
Youens-Clark, Ken
Hartman, John H
Hurwitz, Bonnie L
author_sort Choi, Illyoung
collection PubMed
description BACKGROUND: Shotgun metagenomics provides powerful insights into microbial community biodiversity and function. Yet, inferences from metagenomic studies are often limited by dataset size and complexity and are restricted by the availability and completeness of existing databases. De novo comparative metagenomics enables the comparison of metagenomes based on their total genetic content. RESULTS: We developed a tool called Libra that performs an all-vs-all comparison of metagenomes for precise clustering based on their k-mer content. Libra uses a scalable Hadoop framework for massive metagenome comparisons, Cosine Similarity for calculating the distance using sequence composition and abundance while normalizing for sequencing depth, and a web-based implementation in iMicrobe (http://imicrobe.us) that uses the CyVerse advanced cyberinfrastructure to promote broad use of the tool by the scientific community. CONCLUSIONS: A comparison of Libra to equivalent tools using both simulated and real metagenomic datasets, ranging from 80 million to 4.2 billion reads, reveals that methods commonly implemented to reduce compute time for large datasets, such as data reduction, read count normalization, and presence/absence distance metrics, greatly diminish the resolution of large-scale comparative analyses. In contrast, Libra uses all of the reads to calculate k-mer abundance in a Hadoop architecture that can scale to any size dataset to enable global-scale analyses and link microbial signatures to biological processes.
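The description above names two ingredients: k-mer abundance profiles and Cosine Similarity, which compares profiles while normalizing for sequencing depth. The following is a minimal in-memory sketch of that idea, not Libra's Hadoop implementation. The function names `kmer_counts` and `cosine_similarity`, the toy reads, and the choice of k=3 are all assumptions made for illustration; the sketch also ignores details such as reverse complements and any k-mer weighting the tool may apply.

```python
from collections import Counter
from math import sqrt


def kmer_counts(reads, k):
    """Count every k-mer (substring of length k) across a list of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse k-mer abundance vectors."""
    # Only k-mers present in both profiles contribute to the dot product.
    dot = sum(a[kmer] * b[kmer] for kmer in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical toy samples (illustrative only, not data from the paper).
sample_a = ["ACGTACGT", "TTGACA"]
sample_b = sample_a * 3   # same composition, three times the sequencing depth
sample_c = ["GGGGCCCC"]   # shares no 3-mers with sample_a

profile_a = kmer_counts(sample_a, k=3)
profile_b = kmer_counts(sample_b, k=3)
profile_c = kmer_counts(sample_c, k=3)

sim_ab = cosine_similarity(profile_a, profile_b)
sim_ac = cosine_similarity(profile_a, profile_c)
```

Because cosine similarity depends only on the direction of the abundance vector and not its length, `sample_b` (the same reads at three times the depth) scores essentially 1 against `sample_a`, while `sample_c`, which shares no 3-mers, scores 0. An all-vs-all comparison would apply the same function to every pair of samples.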
format Online
Article
Text
id pubmed-6354030
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6354030 2019-02-05 Libra: scalable k-mer–based tool for massive all-vs-all metagenome comparisons Gigascience Technical Note Oxford University Press 2018-12-28 /pmc/articles/PMC6354030/ /pubmed/30597002 http://dx.doi.org/10.1093/gigascience/giy165 Text en © The Author(s) 2018. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Libra: scalable k-mer–based tool for massive all-vs-all metagenome comparisons
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6354030/
https://www.ncbi.nlm.nih.gov/pubmed/30597002
http://dx.doi.org/10.1093/gigascience/giy165