
Identification of miRNA Biomarkers for Diverse Cancer Types Using Statistical Learning Methods at the Whole-Genome Scale

Genome-wide analysis of miRNA molecules can reveal important information for understanding the biology of cancer. Typically, miRNAs are used as features in statistical learning methods in order to train learning models to predict cancer. This motivates us to propose a method that integrates clusteri...


Bibliographic Details
Main authors: Sarkar, Jnanendra Prasad, Saha, Indrajit, Lancucki, Adrian, Ghosh, Nimisha, Wlasnowolski, Michal, Bokota, Grzegorz, Dey, Ashmita, Lipinski, Piotr, Plewczynski, Dariusz
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7691578/
https://www.ncbi.nlm.nih.gov/pubmed/33281862
http://dx.doi.org/10.3389/fgene.2020.00982
_version_ 1783614323084492800
author Sarkar, Jnanendra Prasad
Saha, Indrajit
Lancucki, Adrian
Ghosh, Nimisha
Wlasnowolski, Michal
Bokota, Grzegorz
Dey, Ashmita
Lipinski, Piotr
Plewczynski, Dariusz
author_sort Sarkar, Jnanendra Prasad
collection PubMed
description Genome-wide analysis of miRNA molecules can reveal important information for understanding the biology of cancer. Typically, miRNAs are used as features in statistical learning methods in order to train learning models to predict cancer. This motivates us to propose a method that integrates clustering and classification techniques for diverse cancer types with survival analysis via regression to identify miRNAs that can potentially play a crucial role in the prediction of different types of tumors. Our method has two parts. The first part is a feature selection procedure, called the stochastic covariance evolutionary strategy with forward selection (SCES-FS), which is developed by integrating stochastic neighbor embedding (SNE), the covariance matrix adaptation evolutionary strategy (CMA-ES), and classifiers, with the primary objective of selecting biomarkers. SNE is used to reorder the features by performing an implicit clustering with highly correlated neighboring features. A subset of features is selected heuristically to perform multi-class classification for diverse cancer types. In the second part of our method, the most important features identified in the first part are used to perform survival analysis via Cox regression, primarily to examine the effectiveness of the selected features. For this purpose, we have analyzed next-generation sequencing data from The Cancer Genome Atlas in the form of miRNA expression for 1,707 samples of 10 different cancer types and 333 normal samples. The SCES-FS method is compared with well-known feature selection methods and is found to perform better in multi-class classification for the 17 selected miRNAs, achieving an accuracy of 96%. Moreover, the biological significance of the selected miRNAs is demonstrated with the help of network analysis, expression analysis using hierarchical clustering, KEGG pathway analysis, GO enrichment analysis, and protein-protein interaction analysis. Overall, the results indicate that the 17 selected miRNAs are associated with many key cancer regulators, such as MYC, VEGFA, AKT1, CDKN1A, RHOA, and PTEN, through their targets. Therefore, the selected miRNAs can be regarded as putative biomarkers for 10 types of cancer.
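The forward-selection idea mentioned in the description can be illustrated with a minimal sketch. This is not the authors' SCES-FS procedure: the SNE reordering and the CMA-ES search are omitted, a simple nearest-centroid classifier is swapped in for the classifiers used in the paper, and the data are synthetic rather than TCGA miRNA expression. All function names and parameters below are hypothetical and chosen only for illustration.

```python
import random


def nearest_centroid_accuracy(train, valid, features):
    """Accuracy on `valid` of a nearest-centroid classifier restricted to `features`."""
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(features))
        for j, f in enumerate(features):
            acc[j] += x[f]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    def predict(x):
        # Assign the class whose centroid is closest in squared Euclidean distance.
        return min(
            centroids,
            key=lambda y: sum((x[f] - c) ** 2 for f, c in zip(features, centroids[y])),
        )

    return sum(predict(x) == y for x, y in valid) / len(valid)


def forward_select(train, valid, n_features, k):
    """Greedily add, k times, the feature that gives the best validation accuracy."""
    selected, best_score = [], 0.0
    for _ in range(k):
        scores = {
            f: nearest_centroid_accuracy(train, valid, selected + [f])
            for f in range(n_features)
            if f not in selected
        }
        best = max(scores, key=scores.get)
        selected.append(best)
        best_score = scores[best]
    return selected, best_score


def make_toy_data(n_per_class=60, n_features=6, seed=0):
    """Synthetic 3-class data (an assumption): only features 0 and 3 are informative."""
    rng = random.Random(seed)
    shifts = {0: {0: 6.0}, 1: {}, 2: {3: 6.0}}  # per-class mean shift by feature index
    data = []
    for label in range(3):
        for _ in range(n_per_class):
            x = [rng.gauss(shifts[label].get(f, 0.0), 1.0) for f in range(n_features)]
            data.append((x, label))
    rng.shuffle(data)
    return data


data = make_toy_data()
train, valid = data[::2], data[1::2]  # simple alternating hold-out split
selected, accuracy = forward_select(train, valid, n_features=6, k=2)
print(sorted(selected), round(accuracy, 2))
```

In this toy setting neither informative feature separates all three classes on its own, so the greedy loop has to combine them. Because the validation split also drives the selection, a real analysis would need a separate test set to report an unbiased accuracy.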
format Online
Article
Text
id pubmed-7691578
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7691578 2020-12-04 Identification of miRNA Biomarkers for Diverse Cancer Types Using Statistical Learning Methods at the Whole-Genome Scale Sarkar, Jnanendra Prasad Saha, Indrajit Lancucki, Adrian Ghosh, Nimisha Wlasnowolski, Michal Bokota, Grzegorz Dey, Ashmita Lipinski, Piotr Plewczynski, Dariusz Front Genet Genetics Genome-wide analysis of miRNA molecules can reveal important information for understanding the biology of cancer. Typically, miRNAs are used as features in statistical learning methods in order to train learning models to predict cancer. This motivates us to propose a method that integrates clustering and classification techniques for diverse cancer types with survival analysis via regression to identify miRNAs that can potentially play a crucial role in the prediction of different types of tumors. Our method has two parts. The first part is a feature selection procedure, called the stochastic covariance evolutionary strategy with forward selection (SCES-FS), which is developed by integrating stochastic neighbor embedding (SNE), the covariance matrix adaptation evolutionary strategy (CMA-ES), and classifiers, with the primary objective of selecting biomarkers. SNE is used to reorder the features by performing an implicit clustering with highly correlated neighboring features. A subset of features is selected heuristically to perform multi-class classification for diverse cancer types. In the second part of our method, the most important features identified in the first part are used to perform survival analysis via Cox regression, primarily to examine the effectiveness of the selected features. For this purpose, we have analyzed next-generation sequencing data from The Cancer Genome Atlas in the form of miRNA expression for 1,707 samples of 10 different cancer types and 333 normal samples. The SCES-FS method is compared with well-known feature selection methods and is found to perform better in multi-class classification for the 17 selected miRNAs, achieving an accuracy of 96%. Moreover, the biological significance of the selected miRNAs is demonstrated with the help of network analysis, expression analysis using hierarchical clustering, KEGG pathway analysis, GO enrichment analysis, and protein-protein interaction analysis. Overall, the results indicate that the 17 selected miRNAs are associated with many key cancer regulators, such as MYC, VEGFA, AKT1, CDKN1A, RHOA, and PTEN, through their targets. Therefore, the selected miRNAs can be regarded as putative biomarkers for 10 types of cancer. Frontiers Media S.A. 2020-11-13 /pmc/articles/PMC7691578/ /pubmed/33281862 http://dx.doi.org/10.3389/fgene.2020.00982 Text en Copyright © 2020 Sarkar, Saha, Lancucki, Ghosh, Wlasnowolski, Bokota, Dey, Lipinski and Plewczynski. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Identification of miRNA Biomarkers for Diverse Cancer Types Using Statistical Learning Methods at the Whole-Genome Scale
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7691578/
https://www.ncbi.nlm.nih.gov/pubmed/33281862
http://dx.doi.org/10.3389/fgene.2020.00982