Comparison of k-mer-based de novo comparative metagenomic tools and approaches
Aim: Comparative metagenomic analysis requires measuring a pairwise similarity between metagenomes in the dataset. Reference-based methods that compute a beta-diversity distance between two metagenomes are highly dependent on the quality and completeness of the reference database, and their applicat...
Main Authors: | Ponsero, Alise Jany; Miller, Matthew; Hurwitz, Bonnie Louise |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | OAE Publishing Inc. 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10696585/ http://dx.doi.org/10.20517/mrr.2023.26 |
_version_ | 1785154599979581440 |
---|---|
author | Ponsero, Alise Jany Miller, Matthew Hurwitz, Bonnie Louise |
author_facet | Ponsero, Alise Jany Miller, Matthew Hurwitz, Bonnie Louise |
author_sort | Ponsero, Alise Jany |
collection | PubMed |
description | Aim: Comparative metagenomic analysis requires measuring a pairwise similarity between metagenomes in the dataset. Reference-based methods that compute a beta-diversity distance between two metagenomes are highly dependent on the quality and completeness of the reference database, and their application to less studied microbiota can be challenging. On the other hand, de novo comparative metagenomic methods rely only on the sequence composition of metagenomes to compare datasets. While each of these approaches has its strengths and limitations, their comparison is currently limited. Methods: We developed sets of simulated short-read metagenomes to (1) compare k-mer-based and taxonomy-based distances and evaluate the impact of technical and biological variables on these metrics and (2) evaluate the effect of k-mer sketching and filtering. We used a real-world metagenomic dataset to provide an overview of the currently available tools for de novo metagenomic comparative analysis. Results: Using simulated metagenomes of known composition and controlled error rate, we showed that k-mer-based distance metrics were well correlated with the taxonomic distance metric for quantitative beta-diversity metrics, but the correlation was low for presence/absence distances. The community complexity in terms of taxa richness and the sequencing depth significantly affected the quality of the k-mer-based distances, while the impact of low amounts of sequence contamination and sequencing error was limited. Finally, we benchmarked currently available de novo comparative metagenomic tools and compared their output on two datasets of fecal metagenomes and showed that most k-mer-based tools were able to recapitulate the data structure observed using taxonomic approaches.
Conclusion: This study expands our understanding of the strengths and limitations of k-mer-based de novo comparative metagenomic approaches and aims to provide concrete guidelines for researchers interested in applying these approaches to their metagenomic datasets. |
format | Online Article Text |
id | pubmed-10696585 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | OAE Publishing Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10696585 2023-12-06 Comparison of k-mer-based de novo comparative metagenomic tools and approaches Ponsero, Alise Jany Miller, Matthew Hurwitz, Bonnie Louise Microbiome Res Rep Original Article Aim: Comparative metagenomic analysis requires measuring a pairwise similarity between metagenomes in the dataset. Reference-based methods that compute a beta-diversity distance between two metagenomes are highly dependent on the quality and completeness of the reference database, and their application to less studied microbiota can be challenging. On the other hand, de novo comparative metagenomic methods rely only on the sequence composition of metagenomes to compare datasets. While each of these approaches has its strengths and limitations, their comparison is currently limited. Methods: We developed sets of simulated short-read metagenomes to (1) compare k-mer-based and taxonomy-based distances and evaluate the impact of technical and biological variables on these metrics and (2) evaluate the effect of k-mer sketching and filtering. We used a real-world metagenomic dataset to provide an overview of the currently available tools for de novo metagenomic comparative analysis. Results: Using simulated metagenomes of known composition and controlled error rate, we showed that k-mer-based distance metrics were well correlated with the taxonomic distance metric for quantitative beta-diversity metrics, but the correlation was low for presence/absence distances. The community complexity in terms of taxa richness and the sequencing depth significantly affected the quality of the k-mer-based distances, while the impact of low amounts of sequence contamination and sequencing error was limited. Finally, we benchmarked currently available de novo comparative metagenomic tools and compared their output on two datasets of fecal metagenomes and showed that most k-mer-based tools were able to recapitulate the data structure observed using taxonomic approaches.
Conclusion: This study expands our understanding of the strengths and limitations of k-mer-based de novo comparative metagenomic approaches and aims to provide concrete guidelines for researchers interested in applying these approaches to their metagenomic datasets. OAE Publishing Inc. 2023-07-20 /pmc/articles/PMC10696585/ http://dx.doi.org/10.20517/mrr.2023.26 Text en © The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, sharing, adaptation, distribution and reproduction in any medium or format, for any purpose, even commercially, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. |
spellingShingle | Original Article Ponsero, Alise Jany Miller, Matthew Hurwitz, Bonnie Louise Comparison of k-mer-based de novo comparative metagenomic tools and approaches |
title | Comparison of k-mer-based de novo comparative metagenomic tools and approaches |
title_full | Comparison of k-mer-based de novo comparative metagenomic tools and approaches |
title_fullStr | Comparison of k-mer-based de novo comparative metagenomic tools and approaches |
title_full_unstemmed | Comparison of k-mer-based de novo comparative metagenomic tools and approaches |
title_short | Comparison of k-mer-based de novo comparative metagenomic tools and approaches |
title_sort | comparison of k-mer-based de novo comparative metagenomic tools and approaches |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10696585/ http://dx.doi.org/10.20517/mrr.2023.26 |
work_keys_str_mv | AT ponseroalisejany comparisonofkmerbaseddenovocomparativemetagenomictoolsandapproaches AT millermatthew comparisonofkmerbaseddenovocomparativemetagenomictoolsandapproaches AT hurwitzbonnielouise comparisonofkmerbaseddenovocomparativemetagenomictoolsandapproaches |
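
The abstract in this record contrasts quantitative k-mer-based distances with presence/absence distances. As a rough illustration of that distinction only, the sketch below counts k-mers in two toy read sets and computes a Bray-Curtis dissimilarity (abundance-aware) and a Jaccard distance (presence/absence). It is not the authors' pipeline or any benchmarked tool: the function names, the toy reads, and the choice of k = 4 are assumptions made for the example, and it skips steps real tools typically apply, such as canonical (reverse-complement-aware) k-mers, sketching, and filtering.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer made only of A/C/G/T across a list of reads.

    Hypothetical helper for illustration; strand is ignored
    (no canonical k-mers), unlike most production tools.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers with N or other symbols
                counts[kmer] += 1
    return counts


def bray_curtis(a, b):
    """Quantitative dissimilarity: uses k-mer abundances."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(a[kmer], b[kmer]) for kmer in set(a) | set(b))
    return 1.0 - 2.0 * shared / total


def jaccard_distance(a, b):
    """Presence/absence distance: ignores k-mer abundances."""
    union = set(a) | set(b)
    if not union:
        return 0.0
    return 1.0 - len(set(a) & set(b)) / len(union)


# Toy "metagenomes" (assumed data, one read each)
sample_a = kmer_counts(["ACGTACGT"], k=4)
sample_b = kmer_counts(["ACGTTTTT"], k=4)

print("Bray-Curtis:", bray_curtis(sample_a, sample_b))
print("Jaccard:    ", jaccard_distance(sample_a, sample_b))
```

On real data, the counting step would normally be delegated to a dedicated k-mer counter or sketching tool, since a plain in-memory `Counter` does not scale to full metagenomes.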