Benchmarking metagenomics classifiers on ancient viral DNA: a simulation study


Bibliographic Details
Main Authors: Arizmendi Cárdenas, Yami Ommar; Neuenschwander, Samuel; Malaspinas, Anna-Sapfo
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2022
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8958974/
https://www.ncbi.nlm.nih.gov/pubmed/35356467
http://dx.doi.org/10.7717/peerj.12784
collection PubMed
description Owing to technological advances in ancient DNA, it is now possible to sequence viruses from the past to track down their origin and evolution. However, ancient DNA data is considerably more degraded and contaminated than modern data, making the identification of ancient viral genomes particularly challenging. Several methods to characterise the modern microbiome (and, within this, the virome) have been developed; in particular, tools that assign sequenced reads to specific taxa in order to characterise the organisms present in a sample of interest. While these existing tools are routinely applied to modern data, their performance when applied to ancient microbiome data to screen for ancient viruses remains unknown. In this work, we conducted an extensive simulation study using public viral sequences to establish which tool is the most suitable for screening ancient samples for human DNA viruses. We compared the performance of four widely used classifiers, namely Centrifuge, Kraken2, DIAMOND and MetaPhlAn2, in correctly assigning sequencing reads to the corresponding viruses. To do so, we simulated reads by adding noise typical of ancient DNA to a set of publicly available human DNA viral sequences and to the human genome. We fragmented the DNA into different lengths, and added sequencing error and C-to-T and G-to-A deamination substitutions at the read termini. We then measured the resulting sensitivity and precision for all classifiers. Across most simulations, more than 228 of the 233 simulated viruses were recovered by Centrifuge, Kraken2 and DIAMOND, in contrast to MetaPhlAn2, which recovered only around one third. Overall, Centrifuge and Kraken2 performed best, with the highest values of sensitivity and precision. We found that deamination damage had little impact on the performance of the classifiers, less than sequencing error and read length.
Since Centrifuge can handle short reads (in contrast to DIAMOND and Kraken2 with default settings) and since it achieves the highest sensitivity and precision at the species level across all the simulations performed, it is our recommended tool. Regardless of the tool used, our simulations indicate that, for ancient human studies, users should apply strict filters to remove all reads of potential human origin. Finally, we recommend that users verify which species are present in the database used, as default databases may lack sequences for viruses of interest.
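The simulation pipeline summarised in the description (fragmentation, terminal C-to-T and G-to-A deamination, sequencing error, then sensitivity and precision) can be sketched in Python as follows. This is a minimal illustrative sketch, not the authors' code: the function names, the geometric decay of damage away from the read ends, the default rates, the placement of C-to-T at the 5' end and G-to-A at the 3' end (typical of double-stranded libraries), and the read-level definitions of sensitivity and precision are all assumptions.

```python
import random
from typing import Dict, Optional, Tuple

BASES = "ACGT"


def fragment(genome: str, length: int, rng: random.Random) -> str:
    """Cut one fragment of the given length at a uniformly random start position."""
    if length > len(genome):
        raise ValueError("fragment length exceeds genome length")
    start = rng.randrange(len(genome) - length + 1)
    return genome[start:start + length]


def deaminate(read: str, rng: random.Random,
              p_terminal: float = 0.3, decay: float = 0.5) -> str:
    """Add C->T near the 5' end and G->A near the 3' end.

    Assumed damage model: the substitution probability at distance d from
    the relevant end is p_terminal * decay**d (geometric decay).
    """
    bases = list(read)
    n = len(bases)
    for i in range(n):
        p5 = p_terminal * decay ** i            # distance from the 5' end
        p3 = p_terminal * decay ** (n - 1 - i)  # distance from the 3' end
        if bases[i] == "C" and rng.random() < p5:
            bases[i] = "T"
        elif bases[i] == "G" and rng.random() < p3:
            bases[i] = "A"
    return "".join(bases)


def add_sequencing_error(read: str, rng: random.Random,
                         error_rate: float = 0.01) -> str:
    """With probability error_rate, replace each base by a different random base."""
    out = []
    for b in read:
        if rng.random() < error_rate:
            out.append(rng.choice([x for x in BASES if x != b]))
        else:
            out.append(b)
    return "".join(out)


def simulate_ancient_read(genome: str, length: int, rng: random.Random,
                          p_terminal: float = 0.3, decay: float = 0.5,
                          error_rate: float = 0.01) -> str:
    """Fragment, then deaminate, then add sequencing error (hypothetical helper)."""
    read = fragment(genome, length, rng)
    read = deaminate(read, rng, p_terminal, decay)
    return add_sequencing_error(read, rng, error_rate)


def sensitivity_precision(truth: Dict[str, str],
                          calls: Dict[str, Optional[str]]) -> Tuple[float, float]:
    """Assumed read-level metrics.

    sensitivity = correctly assigned reads / all simulated reads
    precision   = correctly assigned reads / reads that received any assignment
    (None in `calls` means the read was left unclassified.)
    """
    assigned = [r for r, taxon in calls.items() if taxon is not None]
    correct = sum(1 for r in assigned if calls[r] == truth.get(r))
    sensitivity = correct / len(truth) if truth else 0.0
    precision = correct / len(assigned) if assigned else 0.0
    return sensitivity, precision
```

For example, with four simulated reads of which two are correctly assigned, one is misassigned and one is left unclassified, this read-level definition gives a sensitivity of 0.5 and a precision of 2/3. Separating the damage and error steps makes it easy to vary one noise source at a time, which is the kind of comparison the description reports (deamination versus sequencing error versus read length).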
id pubmed-8958974
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8958974 2022-03-29 Benchmarking metagenomics classifiers on ancient viral DNA: a simulation study. Arizmendi Cárdenas, Yami Ommar; Neuenschwander, Samuel; Malaspinas, Anna-Sapfo. PeerJ, Bioinformatics. PeerJ Inc. 2022-03-24 /pmc/articles/PMC8958974/ /pubmed/35356467 http://dx.doi.org/10.7717/peerj.12784 Text en ©2022 Arizmendi Cárdenas et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
topic Bioinformatics