RdRp-scan: A bioinformatic resource to identify and annotate divergent RNA viruses in metagenomic sequence data

Despite a rapid expansion in the number of documented viruses following the advent of metagenomic sequencing, the identification and annotation of highly divergent RNA viruses remain challenging, particularly from poorly characterized hosts and environmental samples. Protein structures are more cons...

Bibliographic Details
Main authors: Charon, Justine; Buchmann, Jan P; Sadiq, Sabrina; Holmes, Edward C
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9752661/
https://www.ncbi.nlm.nih.gov/pubmed/36533143
http://dx.doi.org/10.1093/ve/veac082
author Charon, Justine
Buchmann, Jan P
Sadiq, Sabrina
Holmes, Edward C
collection PubMed
description Despite a rapid expansion in the number of documented viruses following the advent of metagenomic sequencing, the identification and annotation of highly divergent RNA viruses remain challenging, particularly from poorly characterized hosts and environmental samples. Protein structures are more conserved than primary sequence data, such that structure-based comparisons provide an opportunity to reveal the viral ‘dusk matter’: viral sequences with low, but detectable, levels of sequence identity to known viruses with available protein structures. Here, we present a new open computational resource—RdRp-scan—that contains a standardized bioinformatic toolkit to identify and annotate divergent RNA viruses in metagenomic sequence data based on the detection of RNA-dependent RNA polymerase (RdRp) sequences. By combining RdRp-specific hidden Markov models (HMMs) and structural comparisons, we show that RdRp-scan can efficiently detect RdRp sequences with identity levels as low as 10 per cent to those from known viruses and not identifiable using standard sequence-to-sequence comparisons. In addition, to facilitate the annotation and placement of newly detected and divergent virus-like sequences into the diversity of RNA viruses, RdRp-scan provides new custom and curated databases of viral RdRp sequences and core motifs, as well as pre-built RdRp multiple sequence alignments. In parallel, our analysis of the sequence diversity detected by the RdRp-scan revealed that while most of the taxonomically unassigned RdRps fell into pre-established clusters, some fell into potentially new orders of RNA viruses related to the Wolframvirales and Tolivirales. Finally, a survey of the conserved A, B, and C RdRp motifs within the RdRp-scan sequence database revealed additional variations of both sequence and position that might provide new insights into the structure, function, and evolution of viral polymerases.
format Online
Article
Text
id pubmed-9752661
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9752661 2022-12-16 RdRp-scan: A bioinformatic resource to identify and annotate divergent RNA viruses in metagenomic sequence data. Charon, Justine; Buchmann, Jan P; Sadiq, Sabrina; Holmes, Edward C. Virus Evol. Research Article. Oxford University Press 2022-09-01 /pmc/articles/PMC9752661/ /pubmed/36533143 http://dx.doi.org/10.1093/ve/veac082 Text en © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title RdRp-scan: A bioinformatic resource to identify and annotate divergent RNA viruses in metagenomic sequence data
topic Research Article