NeoRdRp: A Comprehensive Dataset for Identifying RNA-dependent RNA Polymerases of Various RNA Viruses from Metatranscriptomic Data

Bibliographic Details
Main authors: Sakaguchi, Shoichi; Urayama, Syun-ichi; Takaki, Yoshihiro; Hirosuna, Kensuke; Wu, Hong; Suzuki, Youichi; Nunoura, Takuro; Nakano, Takashi; Nakagawa, So
Format: Online Article Text
Language: English
Published: Japanese Society of Microbial Ecology / Japanese Society of Soil Microbiology / Taiwan Society of Microbial Ecology / Japanese Society of Plant Microbe Interactions / Japanese Society for Extremophiles 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9530720/
https://www.ncbi.nlm.nih.gov/pubmed/36002304
http://dx.doi.org/10.1264/jsme2.ME22001
Description
Summary: RNA viruses are distributed throughout various environments, and most have recently been identified by metatranscriptome sequencing. However, due to the high nucleotide diversity of RNA viruses, it is still challenging to identify novel RNA viruses from metatranscriptome data. To overcome this issue, we created a dataset of RNA-dependent RNA polymerase (RdRp) domains that are essential for all RNA viruses belonging to Orthornavirae. Genes with RdRp domains from various RNA viruses were clustered based on amino acid sequence similarities. A multiple sequence alignment was generated for each cluster, and a hidden Markov model (HMM) profile was created when the number of sequences was greater than three. We further refined 426 HMM profiles by detecting RefSeq RNA virus sequences and subsequently combined the hit sequences with the RdRp domains. As a result, 1,182 HMM profiles were generated from 12,502 RdRp domain sequences, and the dataset was named NeoRdRp. The majority of NeoRdRp HMM profiles successfully detected RdRp domains, specifically in the UniProt dataset. Furthermore, we compared the NeoRdRp dataset with two previously reported methods for RNA virus detection using metatranscriptome sequencing data. Our methods successfully identified the majority of RNA viruses in the datasets; however, some RNA viruses were not detected, similar to the other two methods. NeoRdRp may be repeatedly improved by the addition of new RdRp sequences and is applicable as a system for detecting various RNA viruses from diverse metatranscriptome data.
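
The cluster-size rule in the summary (an HMM profile is built only when a cluster has more than three sequences) can be illustrated with a minimal sketch. This is not the authors' code: the function name, the dictionary layout, and the sequence identifiers are hypothetical, and the clustering, alignment, and profile-building steps themselves are left out because the summary does not name the tools used for them.

```python
from typing import Dict, List

# Threshold taken from the summary: a profile is created only when the
# number of sequences in a cluster is greater than three.
MIN_SEQUENCES_EXCLUSIVE = 3


def select_clusters_for_hmm(clusters: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Keep only the clusters large enough to receive an HMM profile.

    `clusters` maps a hypothetical cluster ID to the IDs of the RdRp domain
    sequences assigned to that cluster.
    """
    return {
        cluster_id: seq_ids
        for cluster_id, seq_ids in clusters.items()
        if len(seq_ids) > MIN_SEQUENCES_EXCLUSIVE
    }


# Toy input with made-up identifiers, for illustration only.
example_clusters = {
    "cluster_a": ["seq1", "seq2", "seq3", "seq4", "seq5"],
    "cluster_b": ["seq6", "seq7", "seq8"],  # exactly three: excluded
    "cluster_c": ["seq9", "seq10", "seq11", "seq12"],
}

selected = select_clusters_for_hmm(example_clusters)
print(sorted(selected))  # ['cluster_a', 'cluster_c']
```

In a full pipeline, each selected cluster would then be aligned and converted into a profile by external tools. The comparison is strict, so a cluster with exactly three sequences is skipped.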