Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses
Main authors: | Zhang, Chuanyi; Sashittal, Palash; Xiang, Michael; Zhang, Yichi; Kazi, Ayesha; El-Kebir, Mohammed |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9214144/ https://www.ncbi.nlm.nih.gov/pubmed/35700225 http://dx.doi.org/10.1093/molbev/msac133 |
_version_ | 1784730949992316928 |
---|---|
author | Zhang, Chuanyi; Sashittal, Palash; Xiang, Michael; Zhang, Yichi; Kazi, Ayesha; El-Kebir, Mohammed |
collection | PubMed |
description | Transcription regulatory sequences (TRSs), which occur upstream of structural and accessory genes as well as at the 5′ end of a coronavirus genome, play a critical role in discontinuous transcription in coronaviruses. We introduce two problems collectively aimed at identifying these regulatory sequences as well as their associated genes. First, we formulate the TRS Identification problem of identifying TRS sites in a coronavirus genome sequence with prescribed gene locations. We introduce CORSID-A, an algorithm that solves this problem to optimality in polynomial time. We demonstrate that CORSID-A outperforms existing motif-based methods in identifying TRS sites in coronaviruses. Second, we demonstrate for the first time how TRS sites can be leveraged to identify gene locations in the coronavirus genome. To that end, we formulate the TRS and Gene Identification problem of simultaneously identifying TRS sites and gene locations in unannotated coronavirus genomes. We introduce CORSID, a method that solves this problem and includes a web-based visualization tool for exploring the space of near-optimal solutions. We show that CORSID outperforms state-of-the-art gene-finding methods in coronavirus genomes. Furthermore, we demonstrate that CORSID enables de novo identification of TRS sites and genes in previously unannotated coronavirus genomes. CORSID is the first method to perform accurate and simultaneous identification of TRS sites and genes in coronavirus genomes without the use of any prior information. |
format | Online Article Text |
id | pubmed-9214144 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9214144, 2022-06-22; Mol Biol Evol; Methods; Oxford University Press, published 2022-06-14; © The Author(s) 2022. Published by Oxford University Press on behalf of Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Accurate Identification of Transcription Regulatory Sequences and Genes in Coronaviruses |
topic | Methods |
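To make the TRS Identification problem in the abstract more concrete (gene locations are given; find the TRS site upstream of each gene), here is a minimal illustrative sketch. It is not CORSID-A: this record does not describe how CORSID-A reaches an optimal solution in polynomial time. The sketch assumes the TRS core sequence is already known (the toy example uses ACGAAC, the core commonly reported for SARS-CoV-2). It also assumes a fixed search window, and it scores candidate sites by Hamming distance. The function names and the `window` and `max_mismatches` parameters are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_upstream_match(genome: str, gene_start: int, core: str,
                        window: int) -> Optional[Tuple[int, int]]:
    """Best (position, mismatches) for `core` lying fully inside the `window`
    bases immediately upstream of `gene_start` (0-based); None if the window
    is too short to hold the core. Ties keep the leftmost position."""
    lo = max(0, gene_start - window)
    best: Optional[Tuple[int, int]] = None
    for i in range(lo, gene_start - len(core) + 1):
        d = hamming(genome[i:i + len(core)], core)
        if best is None or d < best[1]:
            best = (i, d)
    return best


def identify_trs_sites(genome: str, gene_starts: List[int], core: str,
                       window: int = 100,
                       max_mismatches: int = 1) -> Dict[int, Tuple[int, int]]:
    """Hypothetical helper: map each gene start to its best upstream core
    match, keeping only matches with at most `max_mismatches` mismatches."""
    sites: Dict[int, Tuple[int, int]] = {}
    for g in gene_starts:
        hit = best_upstream_match(genome, g, core, window)
        if hit is not None and hit[1] <= max_mismatches:
            sites[g] = hit
    return sites


if __name__ == "__main__":
    # Toy sequence: an exact core upstream of the first ATG and a
    # one-mismatch copy (ACGATC) upstream of the second.
    toy = "CCCC" + "ACGAAC" + "TT" + "ATGAAA" + "CCCC" + "ACGATC" + "TT" + "ATGCCC"
    print(identify_trs_sites(toy, [12, 30], "ACGAAC", window=12))
```

On the toy sequence, the gene starting at position 12 receives an exact match at position 4, and the gene at position 30 receives a one-mismatch match at position 22. A fixed core and a greedy per-gene scan keep the example short. The problem stated in the abstract is harder, because the TRS sites must be identified jointly across the genome.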