
Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses

Transcription regulatory sequences (TRSs), which occur upstream of structural and accessory genes as well as the 5′ end of a coronavirus genome, play a critical role in discontinuous transcription in coronaviruses. We introduce two problems collectively aimed at identifying these re...


Bibliographic Details
Main authors: Zhang, Chuanyi, Sashittal, Palash, Xiang, Michael, Zhang, Yichi, Kazi, Ayesha, El-Kebir, Mohammed
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9214144/
https://www.ncbi.nlm.nih.gov/pubmed/35700225
http://dx.doi.org/10.1093/molbev/msac133
_version_ 1784730949992316928
author Zhang, Chuanyi
Sashittal, Palash
Xiang, Michael
Zhang, Yichi
Kazi, Ayesha
El-Kebir, Mohammed
author_facet Zhang, Chuanyi
Sashittal, Palash
Xiang, Michael
Zhang, Yichi
Kazi, Ayesha
El-Kebir, Mohammed
author_sort Zhang, Chuanyi
collection PubMed
description Transcription regulatory sequences (TRSs), which occur upstream of structural and accessory genes as well as the 5′ end of a coronavirus genome, play a critical role in discontinuous transcription in coronaviruses. We introduce two problems collectively aimed at identifying these regulatory sequences as well as their associated genes. First, we formulate the TRS Identification problem of identifying TRS sites in a coronavirus genome sequence with prescribed gene locations. We introduce CORSID-A, an algorithm that solves this problem to optimality in polynomial time. We demonstrate that CORSID-A outperforms existing motif-based methods in identifying TRS sites in coronaviruses. Second, we demonstrate for the first time how TRS sites can be leveraged to identify gene locations in the coronavirus genome. To that end, we formulate the TRS and Gene Identification problem of simultaneously identifying TRS sites and gene locations in unannotated coronavirus genomes. We introduce CORSID to solve this problem, which includes a web-based visualization tool to explore the space of near-optimal solutions. We show that CORSID outperforms state-of-the-art gene finding methods in coronavirus genomes. Furthermore, we demonstrate that CORSID enables de novo identification of TRS sites and genes in previously unannotated coronavirus genomes. CORSID is the first method to perform accurate and simultaneous identification of TRS sites and genes in coronavirus genomes without the use of any prior information.
format Online
Article
Text
id pubmed-9214144
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9214144 2022-06-22 Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses Zhang, Chuanyi Sashittal, Palash Xiang, Michael Zhang, Yichi Kazi, Ayesha El-Kebir, Mohammed Mol Biol Evol Methods Transcription regulatory sequences (TRSs), which occur upstream of structural and accessory genes as well as the 5′ end of a coronavirus genome, play a critical role in discontinuous transcription in coronaviruses. We introduce two problems collectively aimed at identifying these regulatory sequences as well as their associated genes. First, we formulate the TRS Identification problem of identifying TRS sites in a coronavirus genome sequence with prescribed gene locations. We introduce CORSID-A, an algorithm that solves this problem to optimality in polynomial time. We demonstrate that CORSID-A outperforms existing motif-based methods in identifying TRS sites in coronaviruses. Second, we demonstrate for the first time how TRS sites can be leveraged to identify gene locations in the coronavirus genome. To that end, we formulate the TRS and Gene Identification problem of simultaneously identifying TRS sites and gene locations in unannotated coronavirus genomes. We introduce CORSID to solve this problem, which includes a web-based visualization tool to explore the space of near-optimal solutions. We show that CORSID outperforms state-of-the-art gene finding methods in coronavirus genomes. Furthermore, we demonstrate that CORSID enables de novo identification of TRS sites and genes in previously unannotated coronavirus genomes. CORSID is the first method to perform accurate and simultaneous identification of TRS sites and genes in coronavirus genomes without the use of any prior information. Oxford University Press 2022-06-14 /pmc/articles/PMC9214144/ /pubmed/35700225 http://dx.doi.org/10.1093/molbev/msac133 Text en © The Author(s) 2022.
Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Methods
Zhang, Chuanyi
Sashittal, Palash
Xiang, Michael
Zhang, Yichi
Kazi, Ayesha
El-Kebir, Mohammed
Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses
title Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses
title_full Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses
title_fullStr Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses
title_full_unstemmed Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses
title_short Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses
title_sort accurate identification of transcription regulatory sequences and genes in coronaviruses
topic Methods
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9214144/
https://www.ncbi.nlm.nih.gov/pubmed/35700225
http://dx.doi.org/10.1093/molbev/msac133
work_keys_str_mv AT zhangchuanyi accurateidentificationoftranscriptionregulatorysequencesandgenesincoronaviruses
AT sashittalpalash accurateidentificationoftranscriptionregulatorysequencesandgenesincoronaviruses
AT xiangmichael accurateidentificationoftranscriptionregulatorysequencesandgenesincoronaviruses
AT zhangyichi accurateidentificationoftranscriptionregulatorysequencesandgenesincoronaviruses
AT kaziayesha accurateidentificationoftranscriptionregulatorysequencesandgenesincoronaviruses
AT elkebirmohammed accurateidentificationoftranscriptionregulatorysequencesandgenesincoronaviruses