Identification of miRNA Biomarkers for Diverse Cancer Types Using Statistical Learning Methods at the Whole-Genome Scale
Genome-wide analysis of miRNA molecules can reveal important information for understanding the biology of cancer. Typically, miRNAs are used as features in statistical learning methods in order to train learning models to predict cancer. This motivates us to propose a method that integrates clusteri...
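Main authors: | Sarkar, Jnanendra Prasad; Saha, Indrajit; Lancucki, Adrian; Ghosh, Nimisha; Wlasnowolski, Michal; Bokota, Grzegorz; Dey, Ashmita; Lipinski, Piotr; Plewczynski, Dariusz |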
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2020 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7691578/ https://www.ncbi.nlm.nih.gov/pubmed/33281862 http://dx.doi.org/10.3389/fgene.2020.00982 |
_version_ | 1783614323084492800 |
---|---|
author | Sarkar, Jnanendra Prasad Saha, Indrajit Lancucki, Adrian Ghosh, Nimisha Wlasnowolski, Michal Bokota, Grzegorz Dey, Ashmita Lipinski, Piotr Plewczynski, Dariusz |
author_sort | Sarkar, Jnanendra Prasad |
collection | PubMed |
description | Genome-wide analysis of miRNA molecules can reveal important information for understanding the biology of cancer. Typically, miRNAs are used as features in statistical learning methods in order to train learning models to predict cancer. This motivates us to propose a method that integrates clustering and classification techniques for diverse cancer types with survival analysis via regression to identify miRNAs that can potentially play a crucial role in the prediction of different types of tumors. Our method has two parts. The first part is a feature selection procedure, called the stochastic covariance evolutionary strategy with forward selection (SCES-FS), which is developed by integrating stochastic neighbor embedding (SNE), the covariance matrix adaptation evolutionary strategy (CMA-ES), and classifiers, with the primary objective of selecting biomarkers. SNE is used to reorder the features by performing an implicit clustering with highly correlated neighboring features. A subset of features is selected heuristically to perform multi-class classification for diverse cancer types. In the second part of our method, the most important features identified in the first part are used to perform survival analysis via Cox regression, primarily to examine the effectiveness of the selected features. For this purpose, we have analyzed next-generation sequencing data from The Cancer Genome Atlas in the form of miRNA expression of 1,707 samples of 10 different cancer types and 333 normal samples. The SCES-FS method is compared with well-known feature selection methods and is found to perform better in multi-class classification for the 17 selected miRNAs, achieving an accuracy of 96%. Moreover, the biological significance of the selected miRNAs is demonstrated with the help of network analysis, expression analysis using hierarchical clustering, KEGG pathway analysis, GO enrichment analysis, and protein-protein interaction analysis. Overall, the results indicate that the 17 selected miRNAs are associated with many key cancer regulators, such as MYC, VEGFA, AKT1, CDKN1A, RHOA, and PTEN, through their targets. Therefore, the selected miRNAs can be regarded as putative biomarkers for 10 types of cancer. |
format | Online Article Text |
id | pubmed-7691578 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7691578 2020-12-04 Identification of miRNA Biomarkers for Diverse Cancer Types Using Statistical Learning Methods at the Whole-Genome Scale Sarkar, Jnanendra Prasad Saha, Indrajit Lancucki, Adrian Ghosh, Nimisha Wlasnowolski, Michal Bokota, Grzegorz Dey, Ashmita Lipinski, Piotr Plewczynski, Dariusz Front Genet Genetics Frontiers Media S.A. 2020-11-13 /pmc/articles/PMC7691578/ /pubmed/33281862 http://dx.doi.org/10.3389/fgene.2020.00982 Text en Copyright © 2020 Sarkar, Saha, Lancucki, Ghosh, Wlasnowolski, Bokota, Dey, Lipinski and Plewczynski. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Identification of miRNA Biomarkers for Diverse Cancer Types Using Statistical Learning Methods at the Whole-Genome Scale |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7691578/ https://www.ncbi.nlm.nih.gov/pubmed/33281862 http://dx.doi.org/10.3389/fgene.2020.00982 |