Identification of RNA Virus–Derived RdRp Sequences in Publicly Available Transcriptomic Data Sets

Bibliographic Details
Main Authors: Olendraite, Ingrida; Brown, Katherine; Firth, Andrew E
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects: Discoveries
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10101049/
https://www.ncbi.nlm.nih.gov/pubmed/37014783
http://dx.doi.org/10.1093/molbev/msad060

Abstract: RNA viruses are abundant and highly diverse and infect all or most eukaryotic organisms. However, only a tiny fraction of the number and diversity of RNA virus species have been catalogued. To cost-effectively expand the diversity of known RNA virus sequences, we mined publicly available transcriptomic data sets. We developed 77 family-level Hidden Markov Model profiles for the viral RNA-dependent RNA polymerase (RdRp)—the only universal “hallmark” gene of RNA viruses. By using these to search the National Center for Biotechnology Information Transcriptome Shotgun Assembly database, we identified 5,867 contigs encoding RNA virus RdRps or fragments thereof and analyzed their diversity, taxonomic classification, phylogeny, and host associations. Our study expands the known diversity of RNA viruses, and the 77 curated RdRp Profile Hidden Markov Models provide a useful resource for the virus discovery community.

Journal: Mol Biol Evol (Discoveries), 2023-04-04
Copyright: © The Author(s) 2023. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.