Predicting Tissue-Specific mRNA and Protein Abundance in Maize: A Machine Learning Approach
Machine learning and modeling approaches have been used to classify protein sequences for a broad set of tasks including predicting protein function, structure, expression, and localization. Some recent studies have successfully predicted whether a given gene is expressed as mRNA or even translated...
Main authors: | Cho, Kyoung Tak; Sen, Taner Z.; Andorf, Carson M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Artificial Intelligence |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9204276/ https://www.ncbi.nlm.nih.gov/pubmed/35719692 http://dx.doi.org/10.3389/frai.2022.830170 |
_version_ | 1784728888284282880 |
---|---|
author | Cho, Kyoung Tak; Sen, Taner Z.; Andorf, Carson M. |
author_sort | Cho, Kyoung Tak |
collection | PubMed |
description | Machine learning and modeling approaches have been used to classify protein sequences for a broad set of tasks including predicting protein function, structure, expression, and localization. Some recent studies have successfully predicted whether a given gene is expressed as mRNA or even translated to proteins potentially, but given that not all genes are expressed in every condition and tissue, the challenge remains to predict condition-specific expression. To address this gap, we developed a machine learning approach to predict tissue-specific gene expression across 23 different tissues in maize, solely based on DNA promoter and protein sequences. For class labels, we defined high and low expression levels for mRNA and protein abundance and optimized classifiers by systematically exploring various methods and combinations of k-mer sequences in a two-phase approach. In the first phase, we developed Markov model classifiers for each tissue and built a feature vector based on the predictions. In the second phase, the feature vector was used as an input to a Bayesian network for final classification. Our results show that these methods can achieve high classification accuracy of up to 95% for predicting gene expression for individual tissues. By relying on sequence alone, our method works in settings where costly experimental data are unavailable and reveals useful insights into the functional, evolutionary, and regulatory characteristics of genes. |
format | Online Article Text |
id | pubmed-9204276 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9204276 2022-06-18 Predicting Tissue-Specific mRNA and Protein Abundance in Maize: A Machine Learning Approach. Cho, Kyoung Tak; Sen, Taner Z.; Andorf, Carson M. Front Artif Intell. Artificial Intelligence. Frontiers Media S.A. 2022-05-26 /pmc/articles/PMC9204276/ /pubmed/35719692 http://dx.doi.org/10.3389/frai.2022.830170 Text en Copyright © 2022 Cho, Sen and Andorf. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Predicting Tissue-Specific mRNA and Protein Abundance in Maize: A Machine Learning Approach |
title_sort | predicting tissue-specific mrna and protein abundance in maize: a machine learning approach |
topic | Artificial Intelligence |
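
The first phase described in the abstract (one Markov model classifier per tissue, with the predictions collected into a feature vector) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the order-2 DNA model, the add-one smoothing, the log-likelihood-ratio score, the function names, the tissue names, and the toy sequences. The second phase (the Bayesian network) is not implemented.

```python
import math
from itertools import product

ALPHABET = "ACGT"  # assumption: DNA promoter sequences only


def train_markov(seqs, k=2, alpha=1.0):
    """Estimate log P(next base | previous k bases) with add-alpha smoothing."""
    counts = {"".join(p): {b: alpha for b in ALPHABET}
              for p in product(ALPHABET, repeat=k)}
    for s in seqs:
        for i in range(k, len(s)):
            ctx, nxt = s[i - k:i], s[i]
            # Positions containing symbols outside ALPHABET (e.g. N) are skipped.
            if ctx in counts and nxt in counts[ctx]:
                counts[ctx][nxt] += 1
    return {ctx: {b: math.log(c / sum(row.values())) for b, c in row.items()}
            for ctx, row in counts.items()}


def log_likelihood(model, seq, k=2):
    """Sum the log transition probabilities along seq, skipping unknown symbols."""
    total = 0.0
    for i in range(k, len(seq)):
        row = model.get(seq[i - k:i])
        if row is not None and seq[i] in row:
            total += row[seq[i]]
    return total


def feature_vector(models, seq, k=2):
    """One score per tissue: log-likelihood under 'high' minus under 'low'."""
    return [log_likelihood(high, seq, k) - log_likelihood(low, seq, k)
            for high, low in models.values()]


# Hypothetical toy data: two tissues with made-up training sequences.
at_rich = ["ATATATATAT"]
gc_rich = ["GCGCGCGCGC"]
models = {
    "leaf": (train_markov(at_rich), train_markov(gc_rich)),  # (high, low)
    "root": (train_markov(gc_rich), train_markov(at_rich)),
}
vec = feature_vector(models, "ATATAT")
print(vec)  # positive entry: sequence resembles that tissue's 'high' class
```

In this sketch a positive entry means the sequence looks more like the high-expression training set for that tissue. A second-stage classifier would take the whole vector as input.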