YACHT: an ANI-based statistical test to detect microbial presence/absence in a metagenomic sample
In metagenomics, the study of environmentally associated microbial communities from their sampled DNA, one of the most fundamental computational tasks is that of determining which genomes from a reference database are present or absent in a given sample metagenome. While tools exist to answer this q...
Main authors: | Koslicki, David; White, Stephen; Ma, Chunyu; Novikov, Alexei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10153212/ https://www.ncbi.nlm.nih.gov/pubmed/37131762 http://dx.doi.org/10.1101/2023.04.18.537298 |
_version_ | 1785035890212470784 |
---|---|
author | Koslicki, David White, Stephen Ma, Chunyu Novikov, Alexei |
author_sort | Koslicki, David |
collection | PubMed |
description | In metagenomics, the study of environmentally associated microbial communities from their sampled DNA, one of the most fundamental computational tasks is that of determining which genomes from a reference database are present or absent in a given sample metagenome. While tools exist to answer this question, all existing approaches to date return point estimates with no associated confidence or uncertainty. This has led to practitioners experiencing difficulty when interpreting the results from these tools, particularly for low abundance organisms, as these often reside in the “noisy tail” of incorrect predictions. Furthermore, no tools to date account for the fact that reference databases are often incomplete and rarely, if ever, contain exact replicas of genomes present in an environmentally derived metagenome. In this work, we present solutions for these issues by introducing the algorithm YACHT: Yes/No Answers to Community membership via Hypothesis Testing. This approach introduces a statistical framework that accounts for sequence divergence between the reference and sample genomes, in terms of average nucleotide identity, as well as incomplete sequencing depth, thus providing a hypothesis test for determining the presence or absence of a reference genome in a sample. After introducing our approach, we quantify its statistical power and show theoretically how it changes with varying parameters. Subsequently, we perform extensive experiments using both simulated and real data to confirm the accuracy and scalability of this approach. Code implementing this approach, as well as all experiments performed, is available at https://github.com/KoslickiLab/YACHT. |
format | Online Article Text |
id | pubmed-10153212 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10153212 2023-05-03 YACHT: an ANI-based statistical test to detect microbial presence/absence in a metagenomic sample Koslicki, David White, Stephen Ma, Chunyu Novikov, Alexei bioRxiv Article In metagenomics, the study of environmentally associated microbial communities from their sampled DNA, one of the most fundamental computational tasks is that of determining which genomes from a reference database are present or absent in a given sample metagenome. While tools exist to answer this question, all existing approaches to date return point estimates with no associated confidence or uncertainty. This has led to practitioners experiencing difficulty when interpreting the results from these tools, particularly for low abundance organisms, as these often reside in the “noisy tail” of incorrect predictions. Furthermore, no tools to date account for the fact that reference databases are often incomplete and rarely, if ever, contain exact replicas of genomes present in an environmentally derived metagenome. In this work, we present solutions for these issues by introducing the algorithm YACHT: Yes/No Answers to Community membership via Hypothesis Testing. This approach introduces a statistical framework that accounts for sequence divergence between the reference and sample genomes, in terms of average nucleotide identity, as well as incomplete sequencing depth, thus providing a hypothesis test for determining the presence or absence of a reference genome in a sample. After introducing our approach, we quantify its statistical power and show theoretically how it changes with varying parameters. Subsequently, we perform extensive experiments using both simulated and real data to confirm the accuracy and scalability of this approach. Code implementing this approach, as well as all experiments performed, is available at https://github.com/KoslickiLab/YACHT. Cold Spring Harbor Laboratory 2023-04-20 /pmc/articles/PMC10153212/ /pubmed/37131762 http://dx.doi.org/10.1101/2023.04.18.537298 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |
spellingShingle | Article Koslicki, David White, Stephen Ma, Chunyu Novikov, Alexei YACHT: an ANI-based statistical test to detect microbial presence/absence in a metagenomic sample |
title | YACHT: an ANI-based statistical test to detect microbial presence/absence in a metagenomic sample |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10153212/ https://www.ncbi.nlm.nih.gov/pubmed/37131762 http://dx.doi.org/10.1101/2023.04.18.537298 |