RdRp-scan: A bioinformatic resource to identify and annotate divergent RNA viruses in metagenomic sequence data
Main authors: | Charon, Justine; Buchmann, Jan P; Sadiq, Sabrina; Holmes, Edward C |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9752661/ https://www.ncbi.nlm.nih.gov/pubmed/36533143 http://dx.doi.org/10.1093/ve/veac082 |
_version_ | 1784850782387961856 |
---|---|
author | Charon, Justine Buchmann, Jan P Sadiq, Sabrina Holmes, Edward C |
author_facet | Charon, Justine Buchmann, Jan P Sadiq, Sabrina Holmes, Edward C |
author_sort | Charon, Justine |
collection | PubMed |
description | Despite a rapid expansion in the number of documented viruses following the advent of metagenomic sequencing, the identification and annotation of highly divergent RNA viruses remain challenging, particularly from poorly characterized hosts and environmental samples. Protein structures are more conserved than primary sequence data, such that structure-based comparisons provide an opportunity to reveal the viral ‘dusk matter’: viral sequences with low, but detectable, levels of sequence identity to known viruses with available protein structures. Here, we present a new open computational resource—RdRp-scan—that contains a standardized bioinformatic toolkit to identify and annotate divergent RNA viruses in metagenomic sequence data based on the detection of RNA-dependent RNA polymerase (RdRp) sequences. By combining RdRp-specific hidden Markov models (HMMs) and structural comparisons, we show that RdRp-scan can efficiently detect RdRp sequences with identity levels as low as 10 per cent to those from known viruses and not identifiable using standard sequence-to-sequence comparisons. In addition, to facilitate the annotation and placement of newly detected and divergent virus-like sequences into the diversity of RNA viruses, RdRp-scan provides new custom and curated databases of viral RdRp sequences and core motifs, as well as pre-built RdRp multiple sequence alignments. In parallel, our analysis of the sequence diversity detected by the RdRp-scan revealed that while most of the taxonomically unassigned RdRps fell into pre-established clusters, some fell into potentially new orders of RNA viruses related to the Wolframvirales and Tolivirales. Finally, a survey of the conserved A, B, and C RdRp motifs within the RdRp-scan sequence database revealed additional variations of both sequence and position that might provide new insights into the structure, function, and evolution of viral polymerases. |
format | Online Article Text |
id | pubmed-9752661 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9752661 2022-12-16 RdRp-scan: A bioinformatic resource to identify and annotate divergent RNA viruses in metagenomic sequence data Charon, Justine Buchmann, Jan P Sadiq, Sabrina Holmes, Edward C Virus Evol Research Article Despite a rapid expansion in the number of documented viruses following the advent of metagenomic sequencing, the identification and annotation of highly divergent RNA viruses remain challenging, particularly from poorly characterized hosts and environmental samples. Protein structures are more conserved than primary sequence data, such that structure-based comparisons provide an opportunity to reveal the viral ‘dusk matter’: viral sequences with low, but detectable, levels of sequence identity to known viruses with available protein structures. Here, we present a new open computational resource—RdRp-scan—that contains a standardized bioinformatic toolkit to identify and annotate divergent RNA viruses in metagenomic sequence data based on the detection of RNA-dependent RNA polymerase (RdRp) sequences. By combining RdRp-specific hidden Markov models (HMMs) and structural comparisons, we show that RdRp-scan can efficiently detect RdRp sequences with identity levels as low as 10 per cent to those from known viruses and not identifiable using standard sequence-to-sequence comparisons. In addition, to facilitate the annotation and placement of newly detected and divergent virus-like sequences into the diversity of RNA viruses, RdRp-scan provides new custom and curated databases of viral RdRp sequences and core motifs, as well as pre-built RdRp multiple sequence alignments. In parallel, our analysis of the sequence diversity detected by the RdRp-scan revealed that while most of the taxonomically unassigned RdRps fell into pre-established clusters, some fell into potentially new orders of RNA viruses related to the Wolframvirales and Tolivirales. Finally, a survey of the conserved A, B, and C RdRp motifs within the RdRp-scan sequence database revealed additional variations of both sequence and position that might provide new insights into the structure, function, and evolution of viral polymerases. Oxford University Press 2022-09-01 /pmc/articles/PMC9752661/ /pubmed/36533143 http://dx.doi.org/10.1093/ve/veac082 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Research Article Charon, Justine Buchmann, Jan P Sadiq, Sabrina Holmes, Edward C RdRp-scan: A bioinformatic resource to identify and annotate divergent RNA viruses in metagenomic sequence data |
title | RdRp-scan: A bioinformatic resource to identify and annotate divergent RNA viruses in metagenomic sequence data |
title_full | RdRp-scan: A bioinformatic resource to identify and annotate divergent RNA viruses in metagenomic sequence data |
title_fullStr | RdRp-scan: A bioinformatic resource to identify and annotate divergent RNA viruses in metagenomic sequence data |
title_full_unstemmed | RdRp-scan: A bioinformatic resource to identify and annotate divergent RNA viruses in metagenomic sequence data |
title_short | RdRp-scan: A bioinformatic resource to identify and annotate divergent RNA viruses in metagenomic sequence data |
title_sort | rdrp-scan: a bioinformatic resource to identify and annotate divergent rna viruses in metagenomic sequence data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9752661/ https://www.ncbi.nlm.nih.gov/pubmed/36533143 http://dx.doi.org/10.1093/ve/veac082 |
work_keys_str_mv | AT charonjustine rdrpscanabioinformaticresourcetoidentifyandannotatedivergentrnavirusesinmetagenomicsequencedata AT buchmannjanp rdrpscanabioinformaticresourcetoidentifyandannotatedivergentrnavirusesinmetagenomicsequencedata AT sadiqsabrina rdrpscanabioinformaticresourcetoidentifyandannotatedivergentrnavirusesinmetagenomicsequencedata AT holmesedwardc rdrpscanabioinformaticresourcetoidentifyandannotatedivergentrnavirusesinmetagenomicsequencedata |