Predicting transcriptional responses to heat and drought stress from genomic features using a machine learning approach in rice
Plants have evolved various mechanisms to adapt to adverse environmental stresses, such as the modulation of gene expression. Expression of stress-responsive genes is controlled by specific regulators, including transcription factors (TFs), that bind to sequence-specific binding sites, representing...
Main Authors: | Smet, Dajo; Opdebeeck, Helder; Vandepoele, Klaas |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Plant Science |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390317/ https://www.ncbi.nlm.nih.gov/pubmed/37528982 http://dx.doi.org/10.3389/fpls.2023.1212073 |
_version_ | 1785082454234628096 |
---|---|
author | Smet, Dajo Opdebeeck, Helder Vandepoele, Klaas |
author_facet | Smet, Dajo Opdebeeck, Helder Vandepoele, Klaas |
author_sort | Smet, Dajo |
collection | PubMed |
description | Plants have evolved various mechanisms to adapt to adverse environmental stresses, such as the modulation of gene expression. Expression of stress-responsive genes is controlled by specific regulators, including transcription factors (TFs), that bind to sequence-specific binding sites, representing key components of cis-regulatory elements and regulatory networks. Our understanding of the underlying regulatory code remains, however, incomplete. Recent studies have shown that, by training machine learning (ML) algorithms on genomic sequence features, it is possible to predict which genes will transcriptionally respond to a specific stress. By identifying the most important features for gene expression prediction, these trained ML models allow, in theory, to further elucidate the regulatory code underlying the transcriptional response to abiotic stress. Here, we trained random forest ML models to predict gene expression in rice (Oryza sativa) in response to heat or drought stress. Apart from thoroughly assessing model performance and robustness across various input training data, the importance of promoter and gene body sequence features to train ML models was evaluated. The use of enriched promoter oligomers, complementing known TF binding sites, allowed us to gain novel insights in DNA motifs contributing to the stress regulatory code. By comparing genomic feature importance scores for drought and heat stress over time, general and stress-specific genomic features contributing to the performance of the learned models and their temporal variation were identified. This study provides a solid foundation to build and interpret ML models accurately predicting transcriptional responses and enables novel insights in biological sequence features that are important for abiotic stress responses. |
format | Online Article Text |
id | pubmed-10390317 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10390317 2023-08-01 Predicting transcriptional responses to heat and drought stress from genomic features using a machine learning approach in rice Smet, Dajo Opdebeeck, Helder Vandepoele, Klaas Front Plant Sci Plant Science Plants have evolved various mechanisms to adapt to adverse environmental stresses, such as the modulation of gene expression. Expression of stress-responsive genes is controlled by specific regulators, including transcription factors (TFs), that bind to sequence-specific binding sites, representing key components of cis-regulatory elements and regulatory networks. Our understanding of the underlying regulatory code remains, however, incomplete. Recent studies have shown that, by training machine learning (ML) algorithms on genomic sequence features, it is possible to predict which genes will transcriptionally respond to a specific stress. By identifying the most important features for gene expression prediction, these trained ML models allow, in theory, to further elucidate the regulatory code underlying the transcriptional response to abiotic stress. Here, we trained random forest ML models to predict gene expression in rice (Oryza sativa) in response to heat or drought stress. Apart from thoroughly assessing model performance and robustness across various input training data, the importance of promoter and gene body sequence features to train ML models was evaluated. The use of enriched promoter oligomers, complementing known TF binding sites, allowed us to gain novel insights in DNA motifs contributing to the stress regulatory code. By comparing genomic feature importance scores for drought and heat stress over time, general and stress-specific genomic features contributing to the performance of the learned models and their temporal variation were identified. This study provides a solid foundation to build and interpret ML models accurately predicting transcriptional responses and enables novel insights in biological sequence features that are important for abiotic stress responses. Frontiers Media S.A. 2023-07-17 /pmc/articles/PMC10390317/ /pubmed/37528982 http://dx.doi.org/10.3389/fpls.2023.1212073 Text en Copyright © 2023 Smet, Opdebeeck and Vandepoele https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Plant Science Smet, Dajo Opdebeeck, Helder Vandepoele, Klaas Predicting transcriptional responses to heat and drought stress from genomic features using a machine learning approach in rice |
title | Predicting transcriptional responses to heat and drought stress from genomic features using a machine learning approach in rice |
title_full | Predicting transcriptional responses to heat and drought stress from genomic features using a machine learning approach in rice |
title_fullStr | Predicting transcriptional responses to heat and drought stress from genomic features using a machine learning approach in rice |
title_full_unstemmed | Predicting transcriptional responses to heat and drought stress from genomic features using a machine learning approach in rice |
title_short | Predicting transcriptional responses to heat and drought stress from genomic features using a machine learning approach in rice |
title_sort | predicting transcriptional responses to heat and drought stress from genomic features using a machine learning approach in rice |
topic | Plant Science |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390317/ https://www.ncbi.nlm.nih.gov/pubmed/37528982 http://dx.doi.org/10.3389/fpls.2023.1212073 |
work_keys_str_mv | AT smetdajo predictingtranscriptionalresponsestoheatanddroughtstressfromgenomicfeaturesusingamachinelearningapproachinrice AT opdebeeckhelder predictingtranscriptionalresponsestoheatanddroughtstressfromgenomicfeaturesusingamachinelearningapproachinrice AT vandepoeleklaas predictingtranscriptionalresponsestoheatanddroughtstressfromgenomicfeaturesusingamachinelearningapproachinrice |