Predicting Tissue-Specific mRNA and Protein Abundance in Maize: A Machine Learning Approach

Bibliographic Details
Main Authors: Cho, Kyoung Tak, Sen, Taner Z., Andorf, Carson M.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9204276/
https://www.ncbi.nlm.nih.gov/pubmed/35719692
http://dx.doi.org/10.3389/frai.2022.830170
_version_ 1784728888284282880
author Cho, Kyoung Tak
Sen, Taner Z.
Andorf, Carson M.
author_facet Cho, Kyoung Tak
Sen, Taner Z.
Andorf, Carson M.
author_sort Cho, Kyoung Tak
collection PubMed
description Machine learning and modeling approaches have been used to classify protein sequences for a broad set of tasks, including predicting protein function, structure, expression, and localization. Some recent studies have successfully predicted whether a given gene is expressed as mRNA or potentially even translated into protein, but because not all genes are expressed in every condition and tissue, predicting condition-specific expression remains a challenge. To address this gap, we developed a machine learning approach to predict tissue-specific gene expression across 23 different tissues in maize, based solely on DNA promoter and protein sequences. For class labels, we defined high and low expression levels for mRNA and protein abundance, and we optimized classifiers by systematically exploring various methods and combinations of k-mer sequences in a two-phase approach. In the first phase, we developed Markov model classifiers for each tissue and built a feature vector based on their predictions. In the second phase, the feature vector was used as input to a Bayesian network for final classification. Our results show that these methods can achieve classification accuracy of up to 95% when predicting gene expression for individual tissues. By relying on sequence alone, our method works in settings where costly experimental data are unavailable and reveals useful insights into the functional, evolutionary, and regulatory characteristics of genes.
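The first phase described above can be illustrated with a minimal sketch. It is not the authors' implementation: the Markov order, the add-one smoothing, the DNA-only alphabet, the toy sequences, and every name in it (`MarkovModel`, `log_odds`, `feature_vector`, `toy_tissue`) are assumptions made for illustration. It trains one "high" and one "low" Markov model per tissue and collects the per-tissue log-odds scores into a feature vector; the second-phase Bayesian network is not shown.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: DNA promoter sequences only


class MarkovModel:
    """Order-k Markov chain over a fixed alphabet with add-one smoothing."""

    def __init__(self, order=2, alphabet=ALPHABET):
        self.order = order
        self.alphabet = alphabet
        # counts[context][next_symbol] = number of times seen in training
        self.counts = defaultdict(lambda: defaultdict(int))

    def fit(self, sequences):
        for seq in sequences:
            for i in range(self.order, len(seq)):
                context = seq[i - self.order:i]
                self.counts[context][seq[i]] += 1
        return self

    def log_likelihood(self, seq):
        """Sum of smoothed log transition probabilities along the sequence."""
        total = 0.0
        for i in range(self.order, len(seq)):
            context = seq[i - self.order:i]
            ctx_counts = self.counts.get(context, {})
            numerator = ctx_counts.get(seq[i], 0) + 1
            denominator = sum(ctx_counts.values()) + len(self.alphabet)
            total += math.log(numerator / denominator)
        return total


def log_odds(high_model, low_model, seq):
    """Positive when the sequence looks more like the 'high' training set."""
    return high_model.log_likelihood(seq) - low_model.log_likelihood(seq)


def feature_vector(models_by_tissue, seq):
    """One log-odds score per tissue, in the dictionary's insertion order."""
    return [log_odds(high, low, seq) for high, low in models_by_tissue.values()]


# Toy data, invented purely for illustration (not from the study).
high_seqs = ["TATATATATA", "ATATATATAT"]
low_seqs = ["GCGCGCGCGC", "CGCGCGCGCG"]

high = MarkovModel(order=2).fit(high_seqs)
low = MarkovModel(order=2).fit(low_seqs)
models_by_tissue = {"toy_tissue": (high, low)}

vec = feature_vector(models_by_tissue, "TATATATA")
```

With one `(high, low)` model pair per tissue, `feature_vector` would return one score per tissue for each gene, which is the kind of input a second-stage classifier could consume.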
format Online
Article
Text
id pubmed-9204276
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9204276 2022-06-18 Predicting Tissue-Specific mRNA and Protein Abundance in Maize: A Machine Learning Approach Cho, Kyoung Tak Sen, Taner Z. Andorf, Carson M. Front Artif Intell Artificial Intelligence Frontiers Media S.A. 2022-05-26 /pmc/articles/PMC9204276/ /pubmed/35719692 http://dx.doi.org/10.3389/frai.2022.830170 Text en Copyright © 2022 Cho, Sen and Andorf.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Artificial Intelligence
Cho, Kyoung Tak
Sen, Taner Z.
Andorf, Carson M.
Predicting Tissue-Specific mRNA and Protein Abundance in Maize: A Machine Learning Approach
title Predicting Tissue-Specific mRNA and Protein Abundance in Maize: A Machine Learning Approach
title_full Predicting Tissue-Specific mRNA and Protein Abundance in Maize: A Machine Learning Approach
title_fullStr Predicting Tissue-Specific mRNA and Protein Abundance in Maize: A Machine Learning Approach
title_full_unstemmed Predicting Tissue-Specific mRNA and Protein Abundance in Maize: A Machine Learning Approach
title_short Predicting Tissue-Specific mRNA and Protein Abundance in Maize: A Machine Learning Approach
title_sort predicting tissue-specific mrna and protein abundance in maize: a machine learning approach
topic Artificial Intelligence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9204276/
https://www.ncbi.nlm.nih.gov/pubmed/35719692
http://dx.doi.org/10.3389/frai.2022.830170
work_keys_str_mv AT chokyoungtak predictingtissuespecificmrnaandproteinabundanceinmaizeamachinelearningapproach
AT sentanerz predictingtissuespecificmrnaandproteinabundanceinmaizeamachinelearningapproach
AT andorfcarsonm predictingtissuespecificmrnaandproteinabundanceinmaizeamachinelearningapproach