Discriminative and informative features for biomolecular text mining with ensemble feature selection
Motivation: In the field of biomolecular text mining, black box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providin...
Main Authors: | Van Landeghem, Sofie; Abeel, Thomas; Saeys, Yvan; Van de Peer, Yves |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2010 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2935429/ https://www.ncbi.nlm.nih.gov/pubmed/20823321 http://dx.doi.org/10.1093/bioinformatics/btq381 |
_version_ | 1782186400985120768 |
---|---|
author | Van Landeghem, Sofie Abeel, Thomas Saeys, Yvan Van de Peer, Yves |
author_facet | Van Landeghem, Sofie Abeel, Thomas Saeys, Yvan Van de Peer, Yves |
author_sort | Van Landeghem, Sofie |
collection | PubMed |
description | Motivation: In the field of biomolecular text mining, black box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black box behavior and the end-user who has to interpret the results. Results: We show that our FS methodology successfully discards a large fraction of machine-generated features, improving classification performance of state-of-the-art text mining algorithms. Furthermore, we illustrate how FS can be applied to gain understanding in the predictions of a framework for biomolecular event extraction from text. We include numerous examples of highly discriminative features that model either biological reality or common linguistic constructs. Finally, we discuss a number of insights from our FS analyses that will provide the opportunity to considerably improve upon current text mining tools. Availability: The FS algorithms and classifiers are available in Java-ML (http://java-ml.sf.net). The datasets are publicly available from the BioNLP'09 Shared Task web site (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/). Contact: yves.vandepeer@psb.ugent.be |
format | Text |
id | pubmed-2935429 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-29354292010-09-08 Discriminative and informative features for biomolecular text mining with ensemble feature selection Van Landeghem, Sofie Abeel, Thomas Saeys, Yvan Van de Peer, Yves Bioinformatics Eccb 2010 Conference Proceedings September 26 to September 29, 2010, Ghent, Belgium Motivation: In the field of biomolecular text mining, black box behavior of machine learning systems currently limits understanding of the true nature of the predictions. However, feature selection (FS) is capable of identifying the most relevant features in any supervised learning setting, providing insight into the specific properties of the classification algorithm. This allows us to build more accurate classifiers while at the same time bridging the gap between the black box behavior and the end-user who has to interpret the results. Results: We show that our FS methodology successfully discards a large fraction of machine-generated features, improving classification performance of state-of-the-art text mining algorithms. Furthermore, we illustrate how FS can be applied to gain understanding in the predictions of a framework for biomolecular event extraction from text. We include numerous examples of highly discriminative features that model either biological reality or common linguistic constructs. Finally, we discuss a number of insights from our FS analyses that will provide the opportunity to considerably improve upon current text mining tools. Availability: The FS algorithms and classifiers are available in Java-ML (http://java-ml.sf.net). The datasets are publicly available from the BioNLP'09 Shared Task web site (http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/). Contact: yves.vandepeer@psb.ugent.be Oxford University Press 2010-09-15 2010-09-04 /pmc/articles/PMC2935429/ /pubmed/20823321 http://dx.doi.org/10.1093/bioinformatics/btq381 Text en © The Author(s) 2010. Published by Oxford University Press. 
http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Eccb 2010 Conference Proceedings September 26 to September 29, 2010, Ghent, Belgium Van Landeghem, Sofie Abeel, Thomas Saeys, Yvan Van de Peer, Yves Discriminative and informative features for biomolecular text mining with ensemble feature selection |
title | Discriminative and informative features for biomolecular text mining with ensemble feature selection |
title_full | Discriminative and informative features for biomolecular text mining with ensemble feature selection |
title_fullStr | Discriminative and informative features for biomolecular text mining with ensemble feature selection |
title_full_unstemmed | Discriminative and informative features for biomolecular text mining with ensemble feature selection |
title_short | Discriminative and informative features for biomolecular text mining with ensemble feature selection |
title_sort | discriminative and informative features for biomolecular text mining with ensemble feature selection |
topic | ECCB 2010 Conference Proceedings, September 26 to September 29, 2010, Ghent, Belgium |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2935429/ https://www.ncbi.nlm.nih.gov/pubmed/20823321 http://dx.doi.org/10.1093/bioinformatics/btq381 |
work_keys_str_mv | AT vanlandeghemsofie discriminativeandinformativefeaturesforbiomoleculartextminingwithensemblefeatureselection AT abeelthomas discriminativeandinformativefeaturesforbiomoleculartextminingwithensemblefeatureselection AT saeysyvan discriminativeandinformativefeaturesforbiomoleculartextminingwithensemblefeatureselection AT vandepeeryves discriminativeandinformativefeaturesforbiomoleculartextminingwithensemblefeatureselection |