
YACHT: an ANI-based statistical test to detect microbial presence/absence in a metagenomic sample

In metagenomics, the study of environmentally associated microbial communities from their sampled DNA, one of the most fundamental computational tasks is that of determining which genomes from a reference database are present or absent in a given sample metagenome. While tools exist to answer this q...


Bibliographic Details
Main authors: Koslicki, David; White, Stephen; Ma, Chunyu; Novikov, Alexei
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10153212/
https://www.ncbi.nlm.nih.gov/pubmed/37131762
http://dx.doi.org/10.1101/2023.04.18.537298
author Koslicki, David
White, Stephen
Ma, Chunyu
Novikov, Alexei
collection PubMed
description In metagenomics, the study of environmentally associated microbial communities from their sampled DNA, one of the most fundamental computational tasks is determining which genomes from a reference database are present or absent in a given sample metagenome. While tools exist to answer this question, all existing approaches to date return point estimates with no associated confidence or uncertainty. As a result, practitioners have difficulty interpreting the results from these tools, particularly for low-abundance organisms, as these often reside in the “noisy tail” of incorrect predictions. Furthermore, no tools to date account for the fact that reference databases are often incomplete and rarely, if ever, contain exact replicas of genomes present in an environmentally derived metagenome. In this work, we present solutions for these issues by introducing the algorithm YACHT: Yes/No Answers to Community membership via Hypothesis Testing. This approach introduces a statistical framework that accounts for sequence divergence between the reference and sample genomes, in terms of average nucleotide identity (ANI), as well as incomplete sequencing depth, thus providing a hypothesis test for determining the presence or absence of a reference genome in a sample. After introducing our approach, we quantify its statistical power and show theoretically how it changes with varying parameters. Subsequently, we perform extensive experiments using both simulated and real data to confirm the accuracy and scalability of this approach. Code implementing this approach, as well as all experiments performed, is available at https://github.com/KoslickiLab/YACHT.
format Online
Article
Text
id pubmed-10153212
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10153212 2023-05-03 YACHT: an ANI-based statistical test to detect microbial presence/absence in a metagenomic sample Koslicki, David White, Stephen Ma, Chunyu Novikov, Alexei bioRxiv Article
Cold Spring Harbor Laboratory 2023-04-20 /pmc/articles/PMC10153212/ /pubmed/37131762 http://dx.doi.org/10.1101/2023.04.18.537298 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title YACHT: an ANI-based statistical test to detect microbial presence/absence in a metagenomic sample
topic Article