
Prediction of RNA Methylation Status From Gene Expression Data Using Classification and Regression Methods



Bibliographic Details
Main authors: Xue, Hao; Wei, Zhen; Chen, Kunqi; Tang, Yujiao; Wu, Xiangyu; Su, Jionglong; Meng, Jia
Format: Online Article Text
Language: English
Published: SAGE Publications 2020
Subjects: Machine Learning Models for Multi-omics Data Integration
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7372605/
https://www.ncbi.nlm.nih.gov/pubmed/32733123
http://dx.doi.org/10.1177/1176934320915707
author Xue, Hao
Wei, Zhen
Chen, Kunqi
Tang, Yujiao
Wu, Xiangyu
Su, Jionglong
Meng, Jia
author_sort Xue, Hao
collection PubMed
description RNA N(6)-methyladenosine (m(6)A) has emerged as an important epigenetic modification for its role in regulating the stability, structure, processing, and translation of RNA. Instability of m(6)A homeostasis may result in flaws in stem cell regulation, decreased fertility, and an increased risk of cancer. To this day, experimental detection and quantification of RNA m(6)A modification remain time-consuming and labor-intensive. Only a limited number of epitranscriptome samples exist in current databases, and a matched RNA methylation profile is often unavailable for a biological problem of interest. As gene expression data are readily available for most biological problems, it would be appealing to estimate RNA methylation status from gene expression data using in silico methods. In this study, we explored the possibility of computational prediction of RNA methylation status from gene expression data using classification and regression methods based on mouse RNA methylation data collected from 73 experimental conditions. Elastic Net-regularized Logistic Regression (ENLR), Support Vector Machine (SVM), and Random Forests (RF) were constructed for classification. Both SVM and RF achieved the best performance, with a mean area under the curve (AUC) of 0.84 across samples; SVM had a narrower AUC spread. Gene Site Enrichment Analysis was conducted on the sites selected by ENLR as predictors to assess the biological significance of the model. Three functional annotation terms were found to be statistically significant: phosphoprotein, SRC Homology 3 (SH3) domain, and endoplasmic reticulum. All 3 terms were found to be closely related to the m(6)A pathway. For regression analysis, Elastic Net was implemented, which yielded a mean Pearson correlation coefficient of 0.68 and a mean Spearman correlation coefficient of 0.64. Our exploratory study suggested that gene expression data could be used to construct predictors for m(6)A methylation status with adequate accuracy. Our work showed for the first time that RNA methylation status may be predicted from the matched gene expression data. This finding may facilitate RNA modification research in various biological contexts when a matched RNA methylation profile is not available, especially in the very early stage of a study.
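The description reports three evaluation metrics: AUC for the classifiers, and Pearson and Spearman correlation for the Elastic Net regression. As a rough illustration of what those numbers measure, here is a minimal standard-library sketch. It is not the authors' code; the function names (average_ranks, pearson, spearman, auc) and the toy labels, scores, and methylation levels are hypothetical and chosen only for illustration.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def auc(labels, scores):
    """AUC from the Mann-Whitney U statistic: the fraction of
    (positive, negative) pairs in which the positive scores higher."""
    ranks = average_ranks(scores)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, lab in zip(ranks, labels) if lab == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Hypothetical toy data: 1 = methylated site, 0 = unmethylated site.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]

# Hypothetical observed vs. predicted methylation levels.
toy_observed = [0.10, 0.40, 0.35, 0.80]
toy_predicted = [0.20, 0.30, 0.50, 0.70]

if __name__ == "__main__":
    print("AUC:", auc(toy_labels, toy_scores))
    print("Pearson:", pearson(toy_observed, toy_predicted))
    print("Spearman:", spearman(toy_observed, toy_predicted))
```

In the toy classification data, 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9. The reported study values are means of such per-sample metrics; in practice an established library would normally be used rather than hand-written functions.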
format Online
Article
Text
id pubmed-7372605
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-7372605 2020-07-29 Prediction of RNA Methylation Status From Gene Expression Data Using Classification and Regression Methods Xue, Hao Wei, Zhen Chen, Kunqi Tang, Yujiao Wu, Xiangyu Su, Jionglong Meng, Jia Evol Bioinform Online Machine Learning Models for Multi-omics Data Integration SAGE Publications 2020-07-20 /pmc/articles/PMC7372605/ /pubmed/32733123 http://dx.doi.org/10.1177/1176934320915707 Text en © The Author(s) 2020 https://creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (https://creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title Prediction of RNA Methylation Status From Gene Expression Data Using Classification and Regression Methods
topic Machine Learning Models for Multi-omics Data Integration
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7372605/
https://www.ncbi.nlm.nih.gov/pubmed/32733123
http://dx.doi.org/10.1177/1176934320915707