Predicting transcriptional responses to heat and drought stress from genomic features using a machine learning approach in rice

Plants have evolved various mechanisms to adapt to adverse environmental stresses, such as the modulation of gene expression. Expression of stress-responsive genes is controlled by specific regulators, including transcription factors (TFs), that bind to sequence-specific binding sites, representing...

Bibliographic Details
Main authors: Smet, Dajo; Opdebeeck, Helder; Vandepoele, Klaas
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Plant Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390317/
https://www.ncbi.nlm.nih.gov/pubmed/37528982
http://dx.doi.org/10.3389/fpls.2023.1212073
author Smet, Dajo
Opdebeeck, Helder
Vandepoele, Klaas
collection PubMed
description Plants have evolved various mechanisms to adapt to adverse environmental stresses, such as the modulation of gene expression. Expression of stress-responsive genes is controlled by specific regulators, including transcription factors (TFs), that bind to sequence-specific binding sites, representing key components of cis-regulatory elements and regulatory networks. Our understanding of the underlying regulatory code remains, however, incomplete. Recent studies have shown that, by training machine learning (ML) algorithms on genomic sequence features, it is possible to predict which genes will transcriptionally respond to a specific stress. By identifying the most important features for gene expression prediction, these trained ML models can, in theory, help to further elucidate the regulatory code underlying the transcriptional response to abiotic stress. Here, we trained random forest ML models to predict gene expression in rice (Oryza sativa) in response to heat or drought stress. Apart from thoroughly assessing model performance and robustness across various input training data, we evaluated the importance of promoter and gene body sequence features for training the ML models. The use of enriched promoter oligomers, complementing known TF binding sites, allowed us to gain novel insights into DNA motifs contributing to the stress regulatory code. By comparing genomic feature importance scores for drought and heat stress over time, we identified general and stress-specific genomic features contributing to the performance of the learned models, as well as their temporal variation. This study provides a solid foundation for building and interpreting ML models that accurately predict transcriptional responses, and it enables novel insights into biological sequence features that are important for abiotic stress responses.
format Online
Article
Text
id pubmed-10390317
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10390317
2023-08-01
Predicting transcriptional responses to heat and drought stress from genomic features using a machine learning approach in rice
Smet, Dajo
Opdebeeck, Helder
Vandepoele, Klaas
Front Plant Sci
Plant Science
Frontiers Media S.A.
2023-07-17
/pmc/articles/PMC10390317/
/pubmed/37528982
http://dx.doi.org/10.3389/fpls.2023.1212073
Text
en
Copyright © 2023 Smet, Opdebeeck and Vandepoele
https://creativecommons.org/licenses/by/4.0/
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Predicting transcriptional responses to heat and drought stress from genomic features using a machine learning approach in rice
topic Plant Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10390317/
https://www.ncbi.nlm.nih.gov/pubmed/37528982
http://dx.doi.org/10.3389/fpls.2023.1212073