
Genome-Wide Prediction of Transcription Start Sites in Conifers


Bibliographic Details
Main Authors: Bondar, Eugeniya I., Troukhan, Maxim E., Krutovsky, Konstantin V., Tatarinova, Tatiana V.
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8836283/
https://www.ncbi.nlm.nih.gov/pubmed/35163661
http://dx.doi.org/10.3390/ijms23031735
author Bondar, Eugeniya I.
Troukhan, Maxim E.
Krutovsky, Konstantin V.
Tatarinova, Tatiana V.
collection PubMed
description The identification of promoters is an essential step in the genome annotation process, providing a framework for gene regulatory networks and their role in transcription regulation. Despite considerable advances in the high-throughput determination of transcription start sites (TSSs) and transcription factor binding sites (TFBSs), experimental methods are still time-consuming and expensive. As an alternative, several computational approaches have been developed to provide fast and reliable means for predicting the location of TSSs and regulatory motifs on a genome-wide scale. Numerous studies have been carried out on the regulatory elements of mammalian genomes, but plant promoters, especially in gymnosperms, have received little attention and remain poorly investigated. The aim of this study was to enhance and expand the existing genome annotations using computational approaches for genome-wide prediction of TSSs in four conifer species: loblolly pine, white spruce, Norway spruce, and Siberian larch. Our pipeline will be useful for TSS predictions in other genomes, especially for draft assemblies, where reliable TSS predictions are not usually available. We also explored some features of the nucleotide composition of the predicted promoters and compared the GC properties of conifer genes with those of model monocot and dicot plants. Here, we demonstrate that even incomplete genome assemblies and partial annotations can be a reliable starting point for TSS annotation. The results of the TSS prediction in the four conifer species have been deposited in the Persephone genome browser, which allows smooth visualization and is optimized for large data sets. This work provides an initial basis for future experimental validation and for the study of regulatory regions to understand gene regulation in gymnosperms.
format Online
Article
Text
id pubmed-8836283
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8836283 2022-02-12 Int J Mol Sci Article MDPI 2022-02-03 /pmc/articles/PMC8836283/ /pubmed/35163661 http://dx.doi.org/10.3390/ijms23031735 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Genome-Wide Prediction of Transcription Start Sites in Conifers
topic Article