Automatic discovery of 100-miRNA signature for cancer classification using ensemble feature selection
BACKGROUND: MicroRNAs (miRNAs) are noncoding RNA molecules heavily involved in human tumors, few of which circulate in the human body. Finding a tumor-associated miRNA signature, that is, the minimum set of miRNAs that must be measured to discriminate among different types of cancer and normal...
Main authors: | Lopez-Rincon, Alejandro; Martinez-Archundia, Marlet; Martinez-Ruiz, Gustavo U.; Schoenhuth, Alexander; Tonda, Alberto |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
BioMed Central
2019
|
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6751684/ https://www.ncbi.nlm.nih.gov/pubmed/31533612 http://dx.doi.org/10.1186/s12859-019-3050-8 |
_version_ | 1783452660387545088 |
---|---|
author | Lopez-Rincon, Alejandro Martinez-Archundia, Marlet Martinez-Ruiz, Gustavo U. Schoenhuth, Alexander Tonda, Alberto |
author_sort | Lopez-Rincon, Alejandro |
collection | PubMed |
description | BACKGROUND: MicroRNAs (miRNAs) are noncoding RNA molecules heavily involved in human tumors, few of which circulate in the human body. Finding a tumor-associated miRNA signature, that is, the minimum set of miRNAs that must be measured to discriminate among different types of cancer and normal tissues, is of utmost importance. Feature selection techniques from machine learning can help; however, they often provide naive or biased results. RESULTS: An ensemble feature selection strategy for miRNA signatures is proposed. miRNAs are chosen based on consensus on feature relevance among high-accuracy classifiers of different typologies. This methodology aims to identify signatures that are considerably more robust and reliable when used in clinically relevant prediction tasks. Using the proposed method, a 100-miRNA signature is identified in a dataset of 8023 samples extracted from TCGA. When eight state-of-the-art classifiers are run with the 100-miRNA signature and with the original 1046 features, global accuracy differs by only 1.4%. Importantly, this 100-miRNA signature is sufficient to distinguish between tumor and normal tissues. The approach is then compared against other feature selection methods, such as UFS, RFE, EN, LASSO, Genetic Algorithms, and EFS-CLA. The proposed approach provides better accuracy when tested in a 10-fold cross-validation with different classifiers. It is also applied to several GEO datasets across different platforms, with some classifiers showing more than 90% classification accuracy, which supports its cross-platform applicability. CONCLUSIONS: The 100-miRNA signature is sufficiently stable to provide almost the same classification accuracy as the complete TCGA dataset, and it is further validated on several GEO datasets, across different types of cancer and platforms. Furthermore, a bibliographic analysis confirms that 77 out of the 100 miRNAs in the signature appear in lists of circulating miRNAs used in cancer studies, in stem-loop or mature-sequence form. The remaining 23 miRNAs offer potentially promising avenues for future research. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-3050-8) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6751684 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6751684 2019-09-23 Automatic discovery of 100-miRNA signature for cancer classification using ensemble feature selection Lopez-Rincon, Alejandro Martinez-Archundia, Marlet Martinez-Ruiz, Gustavo U. Schoenhuth, Alexander Tonda, Alberto BMC Bioinformatics Research Article BACKGROUND: MicroRNAs (miRNAs) are noncoding RNA molecules heavily involved in human tumors, few of which circulate in the human body. Finding a tumor-associated miRNA signature, that is, the minimum set of miRNAs that must be measured to discriminate among different types of cancer and normal tissues, is of utmost importance. Feature selection techniques from machine learning can help; however, they often provide naive or biased results. RESULTS: An ensemble feature selection strategy for miRNA signatures is proposed. miRNAs are chosen based on consensus on feature relevance among high-accuracy classifiers of different typologies. This methodology aims to identify signatures that are considerably more robust and reliable when used in clinically relevant prediction tasks. Using the proposed method, a 100-miRNA signature is identified in a dataset of 8023 samples extracted from TCGA. When eight state-of-the-art classifiers are run with the 100-miRNA signature and with the original 1046 features, global accuracy differs by only 1.4%. Importantly, this 100-miRNA signature is sufficient to distinguish between tumor and normal tissues. The approach is then compared against other feature selection methods, such as UFS, RFE, EN, LASSO, Genetic Algorithms, and EFS-CLA. The proposed approach provides better accuracy when tested in a 10-fold cross-validation with different classifiers. It is also applied to several GEO datasets across different platforms, with some classifiers showing more than 90% classification accuracy, which supports its cross-platform applicability. CONCLUSIONS: The 100-miRNA signature is sufficiently stable to provide almost the same classification accuracy as the complete TCGA dataset, and it is further validated on several GEO datasets, across different types of cancer and platforms. Furthermore, a bibliographic analysis confirms that 77 out of the 100 miRNAs in the signature appear in lists of circulating miRNAs used in cancer studies, in stem-loop or mature-sequence form. The remaining 23 miRNAs offer potentially promising avenues for future research. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-3050-8) contains supplementary material, which is available to authorized users. BioMed Central 2019-09-18 /pmc/articles/PMC6751684/ /pubmed/31533612 http://dx.doi.org/10.1186/s12859-019-3050-8 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Lopez-Rincon, Alejandro Martinez-Archundia, Marlet Martinez-Ruiz, Gustavo U. Schoenhuth, Alexander Tonda, Alberto Automatic discovery of 100-miRNA signature for cancer classification using ensemble feature selection |
title | Automatic discovery of 100-miRNA signature for cancer classification using ensemble feature selection |
title_sort | automatic discovery of 100-mirna signature for cancer classification using ensemble feature selection |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6751684/ https://www.ncbi.nlm.nih.gov/pubmed/31533612 http://dx.doi.org/10.1186/s12859-019-3050-8 |
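
The abstract in this record describes choosing miRNAs by consensus on feature relevance across several classifiers. The sketch below illustrates only that voting idea and is not the authors' implementation: the synthetic data, the three simple relevance scorers (stand-ins for trained classifiers), and the names `make_toy_data`, `ensemble_select`, `top_k`, and `min_votes` are all assumptions made for illustration.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def make_toy_data(n_samples=200, n_features=30, informative=(0, 1, 2), seed=0):
    """Synthetic stand-in for an expression matrix (rows: samples, columns: features)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2  # alternate class 0 ("normal") and class 1 ("tumor")
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        for j in informative:
            row[j] += 2.0 * label  # only these columns carry class signal
        X.append(row)
        y.append(label)
    return X, y


def split_by_class(X, y, j):
    """Values of feature j, separated by class label."""
    a = [row[j] for row, lab in zip(X, y) if lab == 0]
    b = [row[j] for row, lab in zip(X, y) if lab == 1]
    return a, b


def mean_gap(X, y, j):
    """Relevance as the absolute gap between class means."""
    a, b = split_by_class(X, y, j)
    return abs(mean(a) - mean(b))


def scaled_gap(X, y, j):
    """Relevance as the class-mean gap divided by the summed class spreads."""
    a, b = split_by_class(X, y, j)
    spread = pstdev(a) + pstdev(b)
    return abs(mean(a) - mean(b)) / spread if spread > 0 else 0.0


def stump_accuracy(X, y, j):
    """Relevance as the accuracy of a one-threshold rule on feature j."""
    a, b = split_by_class(X, y, j)
    threshold = (mean(a) + mean(b)) / 2
    hits = sum((row[j] > threshold) == (lab == 1) for row, lab in zip(X, y))
    accuracy = hits / len(y)
    return max(accuracy, 1 - accuracy)  # allow either direction of the rule


def ensemble_select(X, y, scorers, top_k, min_votes):
    """Keep features that at least min_votes scorers place in their own top_k."""
    n_features = len(X[0])
    votes = Counter()
    for scorer in scorers:
        ranked = sorted(range(n_features), key=lambda j: scorer(X, y, j), reverse=True)
        votes.update(ranked[:top_k])  # one vote per scorer per top-ranked feature
    return sorted(j for j, v in votes.items() if v >= min_votes)


if __name__ == "__main__":
    X, y = make_toy_data()
    scorers = [mean_gap, scaled_gap, stump_accuracy]
    print(ensemble_select(X, y, scorers, top_k=3, min_votes=2))
```

Requiring agreement between several scorers, rather than trusting a single ranking, is what makes the selection less sensitive to the bias of any one method; in the paper's setting the scorers would be replaced by relevance rankings taken from trained classifiers.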