A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes

Bibliographic Details
Main Authors: Garcia, Benjamin J., Simha, Ramanuja, Garvin, Michael, Furches, Anna, Jones, Piet, Gazolla, Joao G.F.M., Hyatt, P. Doug, Schadt, Christopher W., Pelletier, Dale, Jacobson, Daniel
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8605058/
https://www.ncbi.nlm.nih.gov/pubmed/34849195
http://dx.doi.org/10.1016/j.csbj.2021.10.029
collection PubMed
description Viruses are an underrepresented taxon in the study and identification of microbiome constituents, yet they play an essential role in health, microbiome regulation, and the transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which limits the ability to identify and quantify viruses in the microbiome. The vast diversity of viruses also complicates classification, both in constructing a viral taxonomy and in relating a virus's genotype to its phenotype. However, this same sequence diversity can be leveraged to classify viral sequences in metagenomic and metatranscriptomic samples, even when they lack a taxonomy. To identify and quantify viruses in transcriptomic and genomic samples, we developed a dynamic programming algorithm that builds a classification tree from 715,672 metagenome viruses. To build the tree, we clustered proportional similarity scores computed from the k-mer profiles of each metagenome virus, producing a database of metagenomic viruses. The resulting database is compatible with Kraken2 and is available at https://www.osti.gov/biblio/1615774. We then integrated this viral classification database with databases built from NCBI genomes for use with ParaKraken, a parallelized version of the Kraken metagenomic/transcriptomic classifier (provided in Supplemental Zip 1). To illustrate the utility of the approach for classifying metagenome viruses, we analyzed data from a plant metagenome study, identifying genotype- and compartment-specific differences between two Populus genotypes across three compartments. We also identified a significant increase in the abundance of eight viral sequences in post-mortem brains in a human metatranscriptome study comparing Autism Spectrum Disorder patients with controls.
We further estimate the potential classification accuracy by using both the JGI and NCBI viral databases to assess the uniqueness of viral sequences. Finally, we validate classification accuracy with NCBI databases of taxonomically assigned viruses by identifying the pathogenic viruses in samples with known COVID-19 and cassava brown streak virus infections. Our method is a necessary first step toward understanding the role of viruses in the microbiome, because it enables more complete identification of sequences that lack a taxonomy. Better viral classification will improve the identification of associations between viruses and their hosts, and between viruses and other microbiome members. Although these metagenomic viruses lack a taxonomy, the database can be used with any taxonomy-based tool, such as Kraken, to classify viruses accurately.
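As a rough illustration of the k-mer step described above, the following Python sketch computes k-mer frequency profiles, scores pairs of sequences with a proportional similarity index, and clusters them into a tree. Several things here are assumptions rather than the authors' method: the proportional similarity index is taken to be the Czekanowski form, the sum over k-mers of min(p_i, q_i) on relative frequencies; k = 4, the toy sequences, and the function names (`kmer_profile`, `proportional_similarity`, `average_linkage_tree`) are hypothetical; and the greedy average-linkage clustering is a stand-in for, not a reproduction of, the dynamic programming algorithm used in the paper.

```python
from collections import Counter
from itertools import combinations


def kmer_profile(seq: str, k: int = 4) -> dict[str, float]:
    """Relative frequency of every overlapping k-mer in `seq` (case-insensitive)."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def proportional_similarity(p: dict[str, float], q: dict[str, float]) -> float:
    """Assumed Czekanowski form: sum over k-mers of the smaller relative frequency.
    Returns 1.0 for identical profiles and 0.0 when no k-mers are shared."""
    return sum(min(share, q.get(kmer, 0.0)) for kmer, share in p.items())


def average_linkage_tree(profiles: dict[str, dict[str, float]]):
    """Greedy average-linkage clustering on proportional similarity.
    A hypothetical stand-in for the paper's dynamic programming step, not a copy of it.
    Returns the tree as nested tuples of sequence names."""
    leaf_sim = {
        frozenset(pair): proportional_similarity(profiles[pair[0]], profiles[pair[1]])
        for pair in combinations(profiles, 2)
    }

    def linkage(c1, c2) -> float:
        # Mean pairwise similarity between the leaves of two clusters.
        sims = [leaf_sim[frozenset((a, b))] for a in c1[1] for b in c2[1]]
        return sum(sims) / len(sims)

    # Each cluster is (subtree, list of leaf names it contains).
    clusters = [(name, [name]) for name in profiles]
    while len(clusters) > 1:
        # Merge the most similar pair of clusters.
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters[0][0]


# Toy, made-up sequences: A and B differ by one base, C shares no 4-mers with them.
genomes = {
    "virus_A": "ATGCGTACGTTAGCATGCGTACGTTAGC",
    "virus_B": "ATGCGTACGTTAGCATGCGTACGTTAGG",
    "virus_C": "TTTTAAAACCCCGGGGTTTTAAAACCCC",
}
profiles = {name: kmer_profile(seq, k=4) for name, seq in genomes.items()}
print(proportional_similarity(profiles["virus_A"], profiles["virus_B"]))
print(average_linkage_tree(profiles))
```

With these toy inputs, virus_A and virus_B (proportional similarity 24/25 = 0.96) are merged before virus_C is joined. The all-pairs comparison shown here would not scale to hundreds of thousands of genomes, and it ignores reverse complements; both are simplifications of this sketch.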
id pubmed-8605058
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Comput Struct Biotechnol J (Research Article), 2021-10-25
rights © 2021 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/).
topic Research Article