A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes
Viruses are underrepresented taxa in the study and identification of microbiome constituents; however, they play an essential role in health, microbiome regulation, and transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which limits t...
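Main authors: | Garcia, Benjamin J., Simha, Ramanuja, Garvin, Michael, Furches, Anna, Jones, Piet, Gazolla, Joao G.F.M., Hyatt, P. Doug, Schadt, Christopher W., Pelletier, Dale, Jacobson, Daniel |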
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Research Network of Computational and Structural Biotechnology 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8605058/ https://www.ncbi.nlm.nih.gov/pubmed/34849195 http://dx.doi.org/10.1016/j.csbj.2021.10.029 |
_version_ | 1784602092043763712 |
---|---|
author | Garcia, Benjamin J. Simha, Ramanuja Garvin, Michael Furches, Anna Jones, Piet Gazolla, Joao G.F.M. Hyatt, P. Doug Schadt, Christopher W. Pelletier, Dale Jacobson, Daniel |
collection | PubMed |
description | Viruses are underrepresented taxa in the study and identification of microbiome constituents; however, they play an essential role in health, microbiome regulation, and the transfer of genetic material. Only a few thousand viruses have been isolated, sequenced, and assigned a taxonomy, which limits the ability to identify and quantify viruses in the microbiome. Additionally, the vast diversity of viruses represents a challenge for classification, not only in constructing a viral taxonomy, but also in identifying similarities between a virus's genotype and its phenotype. However, the diversity of viral sequences can be leveraged to classify them in metagenomic and metatranscriptomic samples, even if they do not have a taxonomy. To identify and quantify viruses in transcriptomic and genomic samples, we developed a dynamic programming algorithm for creating a classification tree out of 715,672 metagenome viruses. To create the classification tree, we clustered proportional similarity scores generated from the k-mer profiles of each of the metagenome viruses to create a database of metagenomic viruses. The resulting Kraken2-compatible database of the metagenomic viruses can be found here: https://www.osti.gov/biblio/1615774. We then integrated the viral classification database with databases created with genomes from NCBI for use with ParaKraken (a parallelized version of Kraken provided in Supplemental Zip 1), a metagenomic/transcriptomic classifier. To illustrate the breadth of utility of our approach for classifying metagenome viruses, we analyzed data from a plant metagenome study identifying genotypic and compartment-specific differences between two Populus genotypes in three different compartments. We also identified a significant increase in abundance of eight viral sequences in post-mortem brains in a human metatranscriptome study comparing Autism Spectrum Disorder patients and controls. We also show the potential accuracy for classifying viruses by utilizing both the JGI and NCBI viral databases to identify the uniqueness of viral sequences. Finally, we validate the accuracy of viral classification with NCBI databases containing viruses with taxonomy to identify pathogenic viruses in known COVID-19 and cassava brown streak virus infection samples. Our method represents a necessary first step in better understanding the role of viruses in the microbiome by allowing for a more complete identification of sequences without taxonomy. Better classification of viruses will improve the identification of associations between viruses and their hosts as well as between viruses and other microbiome members. Despite the lack of taxonomy, this database of metagenomic viruses can be used with any tool that utilizes a taxonomy, such as Kraken, for accurate classification of viruses. |
format | Online Article Text |
id | pubmed-8605058 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Research Network of Computational and Structural Biotechnology |
record_format | MEDLINE/PubMed |
spelling | pubmed-8605058 2021-11-29 A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes Garcia, Benjamin J. Simha, Ramanuja Garvin, Michael Furches, Anna Jones, Piet Gazolla, Joao G.F.M. Hyatt, P. Doug Schadt, Christopher W. Pelletier, Dale Jacobson, Daniel Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2021-10-25 /pmc/articles/PMC8605058/ /pubmed/34849195 http://dx.doi.org/10.1016/j.csbj.2021.10.029 Text en © 2021 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | A k-mer based approach for classifying viruses without taxonomy identifies viral associations in human autism and plant microbiomes |
topic | Research Article |
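
The description above mentions clustering proportional similarity scores computed from k-mer profiles in order to build a classification tree. The following is a minimal sketch of that general idea, not the authors' method: it assumes proportional similarity means the sum of the minimum shared relative k-mer frequencies, and it swaps in a simple greedy single-linkage agglomerative merge in place of the dynamic programming algorithm the record refers to. The function names, the choice of `k = 4`, and the toy sequences are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def kmer_profile(seq: str, k: int = 4) -> dict[str, float]:
    """Return the relative frequency of each k-mer in a nucleotide sequence."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def proportional_similarity(p: dict[str, float], q: dict[str, float]) -> float:
    """Assumed definition: sum over shared k-mers of min(p_i, q_i), in [0, 1]."""
    return sum(min(freq, q[kmer]) for kmer, freq in p.items() if kmer in q)


def build_tree(profiles: dict[str, dict[str, float]]):
    """Greedy single-linkage clustering; returns the tree as nested tuples.

    This is a stand-in for illustration only, not the dynamic programming
    algorithm described in the record.
    """
    if not profiles:
        raise ValueError("at least one profile is required")
    # Pairwise similarity between every two named sequences.
    sim = {
        frozenset((a, b)): proportional_similarity(profiles[a], profiles[b])
        for a, b in combinations(profiles, 2)
    }
    # Each cluster is (subtree, list of member names).
    clusters = [(name, [name]) for name in profiles]
    while len(clusters) > 1:
        def linkage(pair):
            members_a, members_b = clusters[pair[0]][1], clusters[pair[1]][1]
            return max(sim[frozenset((a, b))] for a in members_a for b in members_b)

        # Merge the two clusters whose closest members are most similar.
        i, j = max(combinations(range(len(clusters)), 2), key=linkage)
        (tree_i, mem_i), (tree_j, mem_j) = clusters[i], clusters[j]
        rest = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters = rest + [((tree_i, tree_j), mem_i + mem_j)]
    return clusters[0][0]


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    toy = {
        "virus_a": "ACGTACGTACGT",
        "virus_b": "ACGTACGTACGA",
        "virus_c": "TTTTGGGGCCCC",
    }
    print(build_tree({name: kmer_profile(s) for name, s in toy.items()}))
```

A tree of this kind could in principle be mapped onto artificial taxonomy identifiers so that taxonomy-driven classifiers can consume sequences that have no formal taxonomy. How the published database does this is not stated in this record.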