Explainable artificial intelligence as a reliable annotator of archaeal promoter regions

Bibliographic Details
Main Authors: Sganzerla Martinez, Gustavo; Perez-Rueda, Ernesto; Kumar, Aditya; Sarkar, Sharmilee; de Avila e Silva, Scheila
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9889792/
https://www.ncbi.nlm.nih.gov/pubmed/36720898
http://dx.doi.org/10.1038/s41598-023-28571-7
_version_ 1784880808014643200
author Sganzerla Martinez, Gustavo
Perez-Rueda, Ernesto
Kumar, Aditya
Sarkar, Sharmilee
de Avila e Silva, Scheila
collection PubMed
description Archaea are a vast and unexplored cellular domain that thrives in a wide range of environments and plays central roles in processes mediating global carbon and nutrient fluxes. For these organisms to balance their metabolism, appropriate regulation of gene expression is essential. A key moment in regulating the genes responsible for life maintenance in archaea is when transcription factor proteins bind to the promoter element. This DNA segment is conserved, which enables its exploration by machine learning techniques. Here, we trained and tested a support vector machine with 3935 known archaeal promoter sequences. All promoter sequences were encoded as DNA Duplex Stability values. Afterwards, we performed a model interpretation task to map the decision pattern of the classification procedure. We also used a dataset of known promoter sequences for validation. Our results showed that an AT-rich region around position −27 upstream (relative to the transcription start site, TSS) is the most conserved in the analyzed organisms. In addition, we were able to identify the BRE element (−33), the PPE (at −10), and a position at +3, which provides a more understandable picture of how promoters are organized in all the archaeal organisms. Finally, we used the interpreted model to identify potential promoter sequences of 135 unannotated organisms, delivering regulatory region annotation of archaea on a scale never accomplished before (https://pcyt.unam.mx/gene-regulation/). We consider that this approach will be useful for understanding how gene regulation is achieved in other organisms, apart from the already established transcription factor binding sites.
format Online
Article
Text
id pubmed-9889792
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9889792 2023-02-02 Explainable artificial intelligence as a reliable annotator of archaeal promoter regions Sganzerla Martinez, Gustavo Perez-Rueda, Ernesto Kumar, Aditya Sarkar, Sharmilee de Avila e Silva, Scheila Sci Rep Article
Nature Publishing Group UK 2023-01-31 /pmc/articles/PMC9889792/ /pubmed/36720898 http://dx.doi.org/10.1038/s41598-023-28571-7 Text en
© The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Explainable artificial intelligence as a reliable annotator of archaeal promoter regions
topic Article