Discriminative and informative features for biomolecular text mining with ensemble feature selection

Bibliographic details
Main authors: Van Landeghem, Sofie; Abeel, Thomas; Saeys, Yvan; Van de Peer, Yves
Format: Text
Language: English
Published: Oxford University Press, 2010
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2935429/
https://www.ncbi.nlm.nih.gov/pubmed/20823321
http://dx.doi.org/10.1093/bioinformatics/btq381

Abstract
Motivation: In the field of biomolecular text mining, the black-box behavior of machine learning systems currently limits understanding of the true nature of their predictions. However, feature selection (FS) can identify the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black-box behavior and the end-user who has to interpret the results.
Results: We show that our FS methodology successfully discards a large fraction of machine-generated features, improving the classification performance of state-of-the-art text mining algorithms. Furthermore, we illustrate how FS can be applied to gain understanding of the predictions of a framework for biomolecular event extraction from text. We include numerous examples of highly discriminative features that model either biological reality or common linguistic constructs. Finally, we discuss a number of insights from our FS analyses that will provide the opportunity to considerably improve upon current text mining tools.
Availability: The FS algorithms and classifiers are available in Java-ML (http://java-ml.sf.net). The datasets are publicly available from the BioNLP'09 Shared Task web site (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/).
Contact: yves.vandepeer@psb.ugent.be

Journal: Bioinformatics (ECCB 2010 Conference Proceedings, September 26 to September 29, 2010, Ghent, Belgium)
Dates: 2010-09-15; 2010-09-04
Copyright: © The Author(s) 2010. Published by Oxford University Press.
License: http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.