Explainable artificial intelligence as a reliable annotator of archaeal promoter regions
Main authors: | Sganzerla Martinez, Gustavo; Perez-Rueda, Ernesto; Kumar, Aditya; Sarkar, Sharmilee; de Avila e Silva, Scheila |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9889792/ https://www.ncbi.nlm.nih.gov/pubmed/36720898 http://dx.doi.org/10.1038/s41598-023-28571-7 |
author | Sganzerla Martinez, Gustavo; Perez-Rueda, Ernesto; Kumar, Aditya; Sarkar, Sharmilee; de Avila e Silva, Scheila |
---|---|
collection | PubMed |
description | Archaea are a vast and unexplored cellular domain that thrive in a high diversity of environments, having central roles in processes mediating global carbon and nutrient fluxes. For these organisms to balance their metabolism, the appropriate regulation of their gene expression is essential. A key moment in regulating genes responsible for maintaining archaeal life is when transcription factor proteins bind to the promoter element. This DNA segment is conserved, which enables its exploration by machine learning techniques. Here, we trained and tested a support vector machine with 3935 known archaeal promoter sequences. All promoter sequences were encoded as DNA Duplex Stability values. Afterwards, we performed a model interpretation task to map the decision pattern of the classification procedure. We also used a dataset of known promoter sequences for validation. Our results showed that an AT-rich region around position −27 upstream (relative to the transcription start site, TSS) is the most conserved in the analyzed organisms. In addition, we were able to identify the BRE element (−33), the PPE (at −10) and a position at +3, which together provide a more understandable picture of how promoters are organized in all the archaeal organisms. Finally, we used the interpreted model to identify potential promoter sequences of 135 unannotated organisms, delivering an annotation of archaeal regulatory regions on a scale never accomplished before (https://pcyt.unam.mx/gene-regulation/). We consider that this approach will be useful to understand how gene regulation is achieved in other organisms beyond the already established transcription factor binding sites. |
format | Online Article Text |
id | pubmed-9889792 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9889792 2023-02-02; Sci Rep; Article; Nature Publishing Group UK 2023-01-31; /pmc/articles/PMC9889792/ /pubmed/36720898 http://dx.doi.org/10.1038/s41598-023-28571-7; Text en; © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | Explainable artificial intelligence as a reliable annotator of archaeal promoter regions |
topic | Article |
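
The description states that every promoter sequence was encoded as DNA Duplex Stability values before training the support vector machine. The sketch below illustrates one common way to build such an encoding: each overlapping dinucleotide step is mapped to a nearest-neighbour free energy. It assumes the SantaLucia (1998) unified ΔG°37 parameters; this record does not state which table, window size or normalisation the authors used, and the function names `duplex_stability_profile` and `windowed_mean` are hypothetical.

```python
# Hedged sketch: encode a DNA sequence as a DNA duplex stability profile.
# ASSUMPTION: nearest-neighbour free energies (kcal/mol, 37 °C) from the
# SantaLucia (1998) unified parameter set. The paper's exact table, window
# size and normalisation are not given in this record and may differ.
from typing import Dict, List

# Free energy of the 10 unique dinucleotide steps (top strand read 5'->3').
_UNIQUE_STEPS: Dict[str, float] = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def _step_energy(step: str) -> float:
    """Look up a step directly, or via its reverse complement (same duplex)."""
    if step in _UNIQUE_STEPS:
        return _UNIQUE_STEPS[step]
    return _UNIQUE_STEPS[step.translate(_COMPLEMENT)[::-1]]


def duplex_stability_profile(seq: str) -> List[float]:
    """Return one free-energy value per dinucleotide step (len(seq) - 1 values)."""
    seq = seq.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    return [_step_energy(seq[i:i + 2]) for i in range(len(seq) - 1)]


def windowed_mean(profile: List[float], window: int) -> List[float]:
    """Smooth a profile with a sliding-window mean (hypothetical window size)."""
    if not 1 <= window <= len(profile):
        raise ValueError("window must be between 1 and len(profile)")
    return [sum(profile[i:i + window]) / window
            for i in range(len(profile) - window + 1)]


if __name__ == "__main__":
    # AT-rich steps are less negative (less stable) than GC-rich steps.
    print(duplex_stability_profile("TATAAAGCGC"))
    print(windowed_mean(duplex_stability_profile("TATAAAGCGC"), 3))
```

In such a profile, less negative values mark less stable, AT-rich steps, so an AT-rich box like the one reported near −27 would tend to appear as a local maximum. A fixed-length vector of these values (or of their windowed means) per sequence is the kind of numeric input a support vector machine can consume.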