
Exploiting and integrating rich features for biological literature classification

BACKGROUND: Efficient features play an important role in automated text classification, which definitely facilitates the access of large-scale data. In the bioscience field, biological structures and terminologies are described by a large number of features; domain dependent features would significa...

Bibliographic Details
Main Authors:	Wang, Hongning; Huang, Minlie; Ding, Shilin; Zhu, Xiaoyan
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2349297/ https://www.ncbi.nlm.nih.gov/pubmed/18426549 http://dx.doi.org/10.1186/1471-2105-9-S3-S4

_version_	1782152850740084736
author	Wang, Hongning Huang, Minlie Ding, Shilin Zhu, Xiaoyan
author_facet	Wang, Hongning Huang, Minlie Ding, Shilin Zhu, Xiaoyan
author_sort	Wang, Hongning
collection	PubMed
description	BACKGROUND: Efficient features play an important role in automated text classification, which definitely facilitates the access of large-scale data. In the bioscience field, biological structures and terminologies are described by a large number of features; domain dependent features would significantly improve the classification performance. How to effectively select and integrate different types of features to improve the biological literature classification performance is the major issue studied in this paper. RESULTS: To efficiently classify the biological literatures, we propose a novel feature value schema TF*ML, features covering from lower level domain independent “string feature” to higher level domain dependent “semantic template feature”, and proper integrations among the features. Compared to our previous approaches, the performance is improved in terms of AUC and F-Score by 11.5% and 8.8% respectively, and outperforms the best performance achieved in BioCreAtIvE 2006. CONCLUSIONS: Different types of features possess different discriminative capabilities in literature classification; proper integration of domain independent and dependent features would significantly improve the performance and overcome the over-fitting on data distribution.
format	Text
id	pubmed-2349297
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-23492972008-04-29 Exploiting and integrating rich features for biological literature classification Wang, Hongning Huang, Minlie Ding, Shilin Zhu, Xiaoyan BMC Bioinformatics Proceedings BACKGROUND: Efficient features play an important role in automated text classification, which definitely facilitates the access of large-scale data. In the bioscience field, biological structures and terminologies are described by a large number of features; domain dependent features would significantly improve the classification performance. How to effectively select and integrate different types of features to improve the biological literature classification performance is the major issue studied in this paper. RESULTS: To efficiently classify the biological literatures, we propose a novel feature value schema TF*ML, features covering from lower level domain independent “string feature” to higher level domain dependent “semantic template feature”, and proper integrations among the features. Compared to our previous approaches, the performance is improved in terms of AUC and F-Score by 11.5% and 8.8% respectively, and outperforms the best performance achieved in BioCreAtIvE 2006. CONCLUSIONS: Different types of features possess different discriminative capabilities in literature classification; proper integration of domain independent and dependent features would significantly improve the performance and overcome the over-fitting on data distribution. BioMed Central 2008-04-11 /pmc/articles/PMC2349297/ /pubmed/18426549 http://dx.doi.org/10.1186/1471-2105-9-S3-S4 Text en Copyright © 2008 Wang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Proceedings Wang, Hongning Huang, Minlie Ding, Shilin Zhu, Xiaoyan Exploiting and integrating rich features for biological literature classification
title	Exploiting and integrating rich features for biological literature classification
title_full	Exploiting and integrating rich features for biological literature classification
title_fullStr	Exploiting and integrating rich features for biological literature classification
title_full_unstemmed	Exploiting and integrating rich features for biological literature classification
title_short	Exploiting and integrating rich features for biological literature classification
title_sort	exploiting and integrating rich features for biological literature classification
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2349297/ https://www.ncbi.nlm.nih.gov/pubmed/18426549 http://dx.doi.org/10.1186/1471-2105-9-S3-S4
work_keys_str_mv	AT wanghongning exploitingandintegratingrichfeaturesforbiologicalliteratureclassification AT huangminlie exploitingandintegratingrichfeaturesforbiologicalliteratureclassification AT dingshilin exploitingandintegratingrichfeaturesforbiologicalliteratureclassification AT zhuxiaoyan exploitingandintegratingrichfeaturesforbiologicalliteratureclassification

