Integration of Random Forest Classifiers and Deep Convolutional Neural Networks for Classification and Biomolecular Modeling of Cancer Driver Mutations

Development of machine learning solutions for prediction of functional and clinical significance of cancer driver genes and mutations is paramount in modern biomedical research and has gained significant momentum in the recent decade. In this work, we integrate different machine learning approache...


Bibliographic Details
Main Authors: Agajanian, Steve, Oluyemi, Odeyemi, Verkhivker, Gennady M.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2019
Subjects: Molecular Biosciences
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6579812/
https://www.ncbi.nlm.nih.gov/pubmed/31245384
http://dx.doi.org/10.3389/fmolb.2019.00044
author Agajanian, Steve
Oluyemi, Odeyemi
Verkhivker, Gennady M.
collection PubMed
description Development of machine learning solutions for prediction of functional and clinical significance of cancer driver genes and mutations is paramount in modern biomedical research and has gained significant momentum in the recent decade. In this work, we integrate different machine learning approaches, including tree-based methods, random forest and gradient boosted tree (GBT) classifiers, along with deep convolutional neural networks (CNN) for prediction of cancer driver mutations in genomic datasets. The feasibility of using raw nucleotide sequences in CNN for classification of cancer driver mutations was initially explored by employing label encoding, one-hot encoding, and embedding to preprocess the DNA information. These classifiers were benchmarked against their tree-based alternatives in order to evaluate the performance on a relative scale. We then integrated DNA-based scores generated by CNN with various categories of conservational, evolutionary and functional features into a generalized random forest classifier. The results of this study have demonstrated that CNN can learn high-level features from genomic information that are complementary to the ensemble-based predictors often employed for classification of cancer mutations. By combining the deep learning-generated score with only two main ensemble-based functional features, we can achieve superior performance of various machine learning classifiers. Our findings have also suggested that the synergy of nucleotide-based deep learning scores and integrated metrics derived from protein sequence conservation scores can allow for robust classification of cancer driver mutations with a limited number of highly informative features.
Machine learning predictions are leveraged in molecular simulations, protein stability, and network-based analysis of cancer mutations in the protein kinase genes to obtain insights about molecular signatures of driver mutations and enhance the interpretability of cancer-specific classification models.
format Online
Article
Text
id pubmed-6579812
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6579812 2019-06-26 Front Mol Biosci Molecular Biosciences Frontiers Media S.A. 2019-06-11 /pmc/articles/PMC6579812/ /pubmed/31245384 http://dx.doi.org/10.3389/fmolb.2019.00044 Text en Copyright © 2019 Agajanian, Oluyemi and Verkhivker. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Integration of Random Forest Classifiers and Deep Convolutional Neural Networks for Classification and Biomolecular Modeling of Cancer Driver Mutations
topic Molecular Biosciences
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6579812/
https://www.ncbi.nlm.nih.gov/pubmed/31245384
http://dx.doi.org/10.3389/fmolb.2019.00044
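The description above mentions label encoding and one-hot encoding as ways to preprocess raw nucleotide sequences before they are passed to a CNN. The following is a minimal sketch of those two encodings in plain Python, not the authors' preprocessing code. The function names, the four-letter alphabet `ACGT`, and the handling of unknown bases (label -1, all-zero vector) are assumptions made here for illustration only.

```python
# Illustrative sketch only; names and conventions below are assumptions,
# not taken from the article.
NUCLEOTIDES = "ACGT"  # assumed alphabet; ambiguous bases such as N are outside it


def label_encode(seq):
    """Map each base to an integer index (A=0, C=1, G=2, T=3); unknown bases become -1."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    return [index.get(base, -1) for base in seq.upper()]


def one_hot_encode(seq):
    """Map each base to a 4-element indicator vector; unknown bases become all zeros."""
    rows = []
    for label in label_encode(seq):
        row = [0] * len(NUCLEOTIDES)
        if label >= 0:
            row[label] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    window = "ACGTN"  # hypothetical sequence window around a mutation
    print(label_encode(window))
    print(one_hot_encode(window))
```

Label encoding yields one integer per base and implies an ordering among bases, while one-hot encoding yields one row of four indicators per base with no such ordering, which is the form usually fed to a one-dimensional convolution. A learned embedding, the third option named in the description, would take the integer labels as input.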