
Peculiar Genes Selection: A new features selection method to improve classification performances in imbalanced data sets


Bibliographic Details
Main Authors:	Martina, Federica; Beccuti, Marco; Balbo, Gianfranco; Cordero, Francesca
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2017
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5555681/ https://www.ncbi.nlm.nih.gov/pubmed/28806759 http://dx.doi.org/10.1371/journal.pone.0177475

Journal:	PLoS One
Published online:	2017-08-14
Record ID:	pubmed-5555681
Collection:	PubMed
Institution:	National Center for Biotechnology Information
Record format:	MEDLINE/PubMed
Copyright:	© 2017 Martina et al.
License:	This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Description:	High-throughput technologies provide genomic and transcriptomic data that are suitable for detecting biomarkers for classification purposes. However, the high dimensionality of the output of such technologies and the characteristics of the analysed data sets pose a problem for the classification task. Here we present a new three-step feature selection method for detecting class-specific biomarkers in high-dimensional data sets. The first step detects the genes that are differentially expressed across the experimental conditions tested in the experimental design; the second step filters out features with low discriminative power; and the third step detects the class-specific features and defines the final biomarker as their union. The proposed procedure is tested on two microarray data sets: one with a strong imbalance between class sizes and one with perfectly balanced classes. We show that, with the proposed feature selection procedure, the classification performance of a Support Vector Machine on the imbalanced data set reaches 82%, whereas with other methods it does not exceed 73%. On the perfectly balanced data set, classification performance is comparable to that obtained with other methods. Finally, Gene Ontology enrichment analyses of the signatures selected with the proposed pipeline confirm the biological relevance of our methodology. An R package implementing Peculiar Genes Selection, ‘PGS’, is available for download at http://github.com/mbeccuti/PGS.
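The three steps summarized in the description can be illustrated with a minimal Python sketch. This is not the PGS implementation (the authors' package is written in R), and the statistics are assumptions chosen for illustration: a one-vs-rest Welch t statistic stands in for the differential-expression step, a one-vs-rest ROC AUC stands in for the discriminative-power filter, and a gene is treated as specific to the class it separates best, keeping the top_k genes per class. All names and thresholds (select_features, welch_t, auc, one_vs_rest, t_min, auc_min, top_k) are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic comparing samples a and b (each needs >= 2 values)."""
    denom = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if denom == 0.0:
        # Both groups are constant: any mean difference is perfectly separated.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / denom


def auc(pos, neg):
    """ROC AUC via the Mann-Whitney count: P(pos > neg), ties counted as 1/2."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def one_vs_rest(X, y, gene, cls):
    """Split the values of one gene into (samples of cls, all other samples)."""
    inside = [row[gene] for row, lab in zip(X, y) if lab == cls]
    outside = [row[gene] for row, lab in zip(X, y) if lab != cls]
    return inside, outside


def select_features(X, y, t_min=2.0, auc_min=0.8, top_k=5):
    """Hypothetical three-step, class-specific selection (not the PGS code).

    X: list of samples, each a list of gene expression values.
    y: list of class labels, one per sample (>= 2 samples per class assumed).
    Returns (signature, class_specific): the union of the per-class gene
    indices, and the per-class lists themselves.
    """
    classes = sorted(set(y))
    n_genes = len(X[0])

    # Step 1 (stand-in for differential expression): keep genes whose
    # one-vs-rest Welch t statistic reaches t_min for at least one class.
    de_genes = [
        g for g in range(n_genes)
        if any(abs(welch_t(*one_vs_rest(X, y, g, c))) >= t_min for c in classes)
    ]

    # Step 2 (stand-in for the discriminative-power filter): drop genes whose
    # best one-vs-rest AUC (direction-agnostic) is below auc_min.
    strength = {}
    for g in de_genes:
        per_class = {}
        for c in classes:
            a = auc(*one_vs_rest(X, y, g, c))
            per_class[c] = max(a, 1.0 - a)
        if max(per_class.values()) >= auc_min:
            strength[g] = per_class

    # Step 3: call a gene specific to the class it separates best, keep the
    # top_k genes per class, and take the union as the final signature.
    candidates = {c: [] for c in classes}
    for g, per_class in strength.items():
        best = max(classes, key=lambda c: per_class[c])
        candidates[best].append((per_class[best], g))
    class_specific = {
        c: [g for _, g in sorted(candidates[c], reverse=True)[:top_k]]
        for c in classes
    }
    signature = sorted(set().union(*class_specific.values()))
    return signature, class_specific
```

On a toy matrix with three classes of three samples each, in which genes 0, 1 and 3 are each raised in a single class and gene 2 is noise, select_features returns the signature [0, 1, 3], one gene per class. With only two classes the one-vs-rest AUCs of the two classes are mirror images (a and 1 - a), so after taking max(a, 1 - a) they tie and this simple assignment rule gives every retained gene to the first class label; the class-specific step of this sketch is more informative with three or more classes.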
