
Feature selection for splice site prediction: A new method using EDA-based feature ranking



Bibliographic Details
Main authors: Saeys, Yvan; Degroeve, Sven; Aeyels, Dirk; Rouzé, Pierre; Van de Peer, Yves
Format: Text
Language: English
Published: BioMed Central 2004
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC421631/
https://www.ncbi.nlm.nih.gov/pubmed/15154966
http://dx.doi.org/10.1186/1471-2105-5-64
collection PubMed
description BACKGROUND: The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data.
RESULTS: In this paper we present a novel method for feature subset selection applied to splice site prediction, based on estimation of distribution algorithms (EDAs), a generalization of genetic algorithms. A feature ranking is derived from the distribution estimated by the algorithm, and this ranking is then used to discard features iteratively. We apply this technique to the problem of splice site prediction and show how it can be used to gain insight into the underlying biological process of splicing.
CONCLUSION: We show that this technique is more robust than the traditional use of estimation of distribution algorithms for feature selection: instead of returning a single best subset of features (as they normally do), this method provides a dynamic view of the feature selection process, like the traditional sequential wrapper methods. However, the method is faster than the traditional techniques and scales better to datasets described by a large number of features.
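The abstract outlines two stages: an EDA learns a probability distribution over feature subsets, a ranking is read off that distribution, and the ranking then drives iterative elimination. The abstract does not say which EDA, classifier, or elimination schedule the authors use, so the following is only a minimal sketch under stated assumptions: it uses the univariate marginal distribution algorithm (UMDA), ranks features by their final marginal inclusion probabilities, and removes one feature at a time. The names `umda_feature_ranking`, `eliminate_by_ranking`, `toy_score`, and `toy_fitness` are hypothetical, and the toy score stands in for what would, in a splice site setting, more plausibly be cross-validated classifier accuracy.

```python
import random
from typing import Callable, List, Sequence, Tuple

Mask = List[int]


def umda_feature_ranking(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 100,
    n_select: int = 50,
    generations: int = 30,
    margin: float = 0.05,
    seed: int = 0,
) -> List[int]:
    """Rank features by the marginal inclusion probabilities a UMDA learns (assumed EDA)."""
    rng = random.Random(seed)
    p = [0.5] * n_features  # start from the uniform distribution over subsets
    for _ in range(generations):
        # Sample candidate subsets (binary masks) from independent Bernoullis.
        pop = [[1 if rng.random() < pj else 0 for pj in p] for _ in range(pop_size)]
        # Truncation selection: keep the n_select fittest masks.
        best = sorted(pop, key=fitness, reverse=True)[:n_select]
        # Re-estimate each marginal as the feature's frequency among the
        # selected masks, clamped so no feature is frozen in or out for good.
        p = [
            min(1.0 - margin, max(margin, sum(m[j] for m in best) / n_select))
            for j in range(n_features)
        ]
    # Features that appear most often in good subsets rank first.
    return sorted(range(n_features), key=lambda j: p[j], reverse=True)


def eliminate_by_ranking(
    ranking: Sequence[int], evaluate: Callable[[List[int]], float]
) -> List[Tuple[Tuple[int, ...], float]]:
    """Discard the lowest-ranked feature one at a time, scoring every subset."""
    current = list(ranking)
    history = []
    while current:
        history.append((tuple(current), evaluate(current)))
        current.pop()  # drop the least relevant remaining feature
    return history


# Hypothetical toy problem: features 0-2 carry signal, 3-9 are noise, and every
# selected feature costs PENALTY (a stand-in for a real classifier's score).
WEIGHTS = [3.5, 2.5, 2.0] + [0.0] * 7
PENALTY = 1.0


def toy_score(features: List[int]) -> float:
    return sum(WEIGHTS[j] - PENALTY for j in features)


def toy_fitness(mask: Mask) -> float:
    return toy_score([j for j, bit in enumerate(mask) if bit])


ranking = umda_feature_ranking(len(WEIGHTS), toy_fitness)
history = eliminate_by_ranking(ranking, toy_score)
best_subset, best_score = max(history, key=lambda item: item[1])
print("ranking:", ranking)
print("best subset:", sorted(best_subset), "score:", best_score)
```

In this sketch the ranking is computed once, so the elimination loop needs a single evaluation per subset size, whereas a sequential backward wrapper re-evaluates every candidate removal at each step. That is one plausible reading of the speed advantage the abstract claims, not a reproduction of the paper's procedure or results.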
id pubmed-421631
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal BMC Bioinformatics, Methodology Article, published 2004-05-21 (record pubmed-421631 dated 2004-06-11)
Copyright © 2004 Saeys et al; licensee BioMed Central Ltd.
This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
topic Methodology Article