From defaults to databases: parameter and database choice dramatically impact the performance of metagenomic taxonomic classification tools

In metagenomic analyses of microbiomes, one of the first steps is usually the taxonomic classification of reads by comparison to a database of previously taxonomically classified genomes. While different studies comparing metagenomic taxonomic classification methods have determined that different to...

Bibliographic Details
Main Authors:	Wright, Robyn J., Comeau, Andrè M., Langille, Morgan G. I.
Format:	Online Article Text
Language:	English
Published:	Microbiology Society 2023
Subjects:	Research Articles
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10132073/ https://www.ncbi.nlm.nih.gov/pubmed/36867161 http://dx.doi.org/10.1099/mgen.0.000949

_version_	1785031321377046528
author	Wright, Robyn J. Comeau, Andrè M. Langille, Morgan G. I.
author_facet	Wright, Robyn J. Comeau, Andrè M. Langille, Morgan G. I.
author_sort	Wright, Robyn J.
collection	PubMed
description	In metagenomic analyses of microbiomes, one of the first steps is usually the taxonomic classification of reads by comparison to a database of previously taxonomically classified genomes. While different studies comparing metagenomic taxonomic classification methods have determined that different tools are ‘best’, there are two tools that have been used the most to-date: Kraken (k-mer-based classification against a user-constructed database) and MetaPhlAn (classification by alignment to clade-specific marker genes), the latest versions of which are Kraken2 and MetaPhlAn 3, respectively. We found large discrepancies in both the proportion of reads that were classified as well as the number of species that were identified when we used both Kraken2 and MetaPhlAn 3 to classify reads within metagenomes from human-associated or environmental datasets. We then investigated which of these tools would give classifications closest to the real composition of metagenomic samples using a range of simulated and mock samples and examined the combined impact of tool–parameter–database choice on the taxonomic classifications given. This revealed that there may not be a one-size-fits-all ‘best’ choice. While Kraken2 can achieve better overall performance, with higher precision, recall and F1 scores, as well as alpha- and beta-diversity measures closer to the known composition than MetaPhlAn 3, the computational resources required for this may be prohibitive for many researchers, and the default database and parameters should not be used. We therefore conclude that the best tool–parameter–database choice for a particular application depends on the scientific question of interest, which performance metric is most important for this question and the limit of available computational resources.
format	Online Article Text
id	pubmed-10132073
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Microbiology Society
record_format	MEDLINE/PubMed
spelling	pubmed-10132073 2023-04-27 From defaults to databases: parameter and database choice dramatically impact the performance of metagenomic taxonomic classification tools Wright, Robyn J. Comeau, Andrè M. Langille, Morgan G. I. Microb Genom Research Articles In metagenomic analyses of microbiomes, one of the first steps is usually the taxonomic classification of reads by comparison to a database of previously taxonomically classified genomes. While different studies comparing metagenomic taxonomic classification methods have determined that different tools are ‘best’, there are two tools that have been used the most to-date: Kraken (k-mer-based classification against a user-constructed database) and MetaPhlAn (classification by alignment to clade-specific marker genes), the latest versions of which are Kraken2 and MetaPhlAn 3, respectively. We found large discrepancies in both the proportion of reads that were classified as well as the number of species that were identified when we used both Kraken2 and MetaPhlAn 3 to classify reads within metagenomes from human-associated or environmental datasets. We then investigated which of these tools would give classifications closest to the real composition of metagenomic samples using a range of simulated and mock samples and examined the combined impact of tool–parameter–database choice on the taxonomic classifications given. This revealed that there may not be a one-size-fits-all ‘best’ choice. While Kraken2 can achieve better overall performance, with higher precision, recall and F1 scores, as well as alpha- and beta-diversity measures closer to the known composition than MetaPhlAn 3, the computational resources required for this may be prohibitive for many researchers, and the default database and parameters should not be used. We therefore conclude that the best tool–parameter–database choice for a particular application depends on the scientific question of interest, which performance metric is most important for this question and the limit of available computational resources. Microbiology Society 2023-03-03 /pmc/articles/PMC10132073/ /pubmed/36867161 http://dx.doi.org/10.1099/mgen.0.000949 Text en © 2023 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License.
title	From defaults to databases: parameter and database choice dramatically impact the performance of metagenomic taxonomic classification tools
topic	Research Articles
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10132073/ https://www.ncbi.nlm.nih.gov/pubmed/36867161 http://dx.doi.org/10.1099/mgen.0.000949
