
Scoring relevancy of features based on combinatorial analysis of Lasso with application to lymphoma diagnosis

One challenge in applying bioinformatic tools to clinical or biological data is the high number of features that might be provided to the learning algorithm without any prior knowledge of which ones should be used. In such applications, the number of features can drastically exceed the number of trainin...


Bibliographic Details
Main authors:	Zare, Habil; Haffari, Gholamreza; Gupta, Arvind; Brinkman, Ryan R
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3549810/ https://www.ncbi.nlm.nih.gov/pubmed/23369194 http://dx.doi.org/10.1186/1471-2164-14-S1-S14

author	Zare, Habil; Haffari, Gholamreza; Gupta, Arvind; Brinkman, Ryan R
author_sort	Zare, Habil
collection	PubMed
description	One challenge in applying bioinformatic tools to clinical or biological data is the high number of features that might be provided to the learning algorithm without any prior knowledge of which ones should be used. In such applications, the number of features can drastically exceed the number of training instances, which is often limited by the number of samples available for the study. The Lasso is one of many regularization methods that have been developed to prevent overfitting and improve prediction performance in high-dimensional settings. In this paper, we propose a novel algorithm for feature selection based on the Lasso. Our hypothesis is that defining a scoring scheme that measures the "quality" of each feature can provide a more robust feature selection method. Our approach is to generate several samples from the training data by bootstrapping, determine the best relevance-ordering of the features for each sample, and finally combine these relevance-orderings to select highly relevant features. In addition to the theoretical analysis of our feature scoring scheme, we provide empirical evaluations on six real datasets from different fields to confirm the superiority of our method in exploratory data analysis and prediction performance. For example, we applied FeaLect, our feature scoring algorithm, to a lymphoma dataset, and according to a human expert, our method led to selecting more meaningful features than those commonly used in clinics. This case study built a basis for discovering interesting new criteria for lymphoma diagnosis. Furthermore, to facilitate the use of our algorithm in other applications, the source code that implements our algorithm was released as FeaLect, a documented R package on CRAN.
format	Online Article Text
id	pubmed-3549810
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3549810 2013-01-23 Scoring relevancy of features based on combinatorial analysis of Lasso with application to lymphoma diagnosis. Zare, Habil; Haffari, Gholamreza; Gupta, Arvind; Brinkman, Ryan R. BMC Genomics. Proceedings. BioMed Central 2013-01-21 /pmc/articles/PMC3549810/ /pubmed/23369194 http://dx.doi.org/10.1186/1471-2164-14-S1-S14 Text en Copyright ©2013 Zare et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Scoring relevancy of features based on combinatorial analysis of Lasso with application to lymphoma diagnosis
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3549810/ https://www.ncbi.nlm.nih.gov/pubmed/23369194 http://dx.doi.org/10.1186/1471-2164-14-S1-S14
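
The three steps named in the description (bootstrap resampling, a per-sample relevance ordering from the Lasso, and a combination of those orderings into feature scores) can be illustrated with a small sketch. This is not the FeaLect implementation: the function names, the coordinate-descent solver, the fixed penalty grid, the no-intercept model, and the reciprocal-rank scoring rule are all assumptions chosen to keep the example self-contained in plain Python.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used in Lasso coordinate descent."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, beta, n_iter=50, tol=1e-8):
    """Approximately minimise (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta = list(beta)  # warm start from the previous solution
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    for _ in range(n_iter):
        biggest_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                biggest_change = max(biggest_change, abs(delta))
        if biggest_change < tol:
            break
    return beta


def relevance_order(X, y, n_lambdas=10):
    """List features in the order they become nonzero as the penalty shrinks."""
    n, p = len(X), len(X[0])
    lam_max = max(abs(sum(X[i][j] * y[i] for i in range(n))) / n for j in range(p))
    order, beta = [], [0.0] * p
    for k in range(1, n_lambdas + 1):
        lam = lam_max * 0.01 ** (k / n_lambdas)  # geometric grid down to 1% of lam_max
        beta = lasso_cd(X, y, lam, beta)
        entered = [j for j in range(p) if beta[j] != 0.0 and j not in order]
        entered.sort(key=lambda j: -abs(beta[j]))  # break ties by coefficient size
        order.extend(entered)
    return order


def bootstrap_scores(X, y, n_boot=20, seed=0):
    """Sum reciprocal ranks of each feature over bootstrap resamples."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    scores = [0.0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        for rank, j in enumerate(relevance_order(Xb, yb)):
            scores[j] += 1.0 / (rank + 1)  # illustrative rule, not FeaLect's own
    return scores
```

Calling `bootstrap_scores(X, y)` with `X` as a list of rows and `y` as a list of responses returns one score per feature; a higher score means the feature tended to enter the Lasso solution earlier across resamples. For real analyses, the FeaLect R package mentioned in the description is the authors' released implementation.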
