Feature Engineering for Drug Name Recognition in Biomedical Texts: Feature Conjunction and Feature Selection
Drug name recognition (DNR) is a critical step for drug information extraction. Machine learning-based methods have been widely used for DNR with various types of features such as part-of-speech, word shape, and dictionary feature. Features used in current machine learning-based methods are usually...
Main authors: | Liu, Shengyu; Tang, Buzhou; Chen, Qingcai; Wang, Xiaolong; Fan, Xiaoming |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi Publishing Corporation 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4377447/ https://www.ncbi.nlm.nih.gov/pubmed/25861377 http://dx.doi.org/10.1155/2015/913489 |
_version_ | 1782363911619608576 |
---|---|
author | Liu, Shengyu; Tang, Buzhou; Chen, Qingcai; Wang, Xiaolong; Fan, Xiaoming |
author_facet | Liu, Shengyu; Tang, Buzhou; Chen, Qingcai; Wang, Xiaolong; Fan, Xiaoming |
author_sort | Liu, Shengyu |
collection | PubMed |
description | Drug name recognition (DNR) is a critical step for drug information extraction. Machine learning-based methods have been widely used for DNR with various types of features such as part-of-speech, word shape, and dictionary features. Current machine learning-based methods usually rely on singleton features, possibly because combining singleton features into conjunction features causes a feature explosion and introduces a large number of noisy features. However, a singleton feature captures only one linguistic characteristic of a word, which is not sufficient for DNR when multiple characteristics should be considered together. In this study, we explore feature conjunction and feature selection for DNR, which have not previously been reported for this task. We intuitively select 8 types of singleton features and combine them into conjunction features in two ways. Then, Chi-square, mutual information, and information gain are used to mine effective features. Experimental results show that feature conjunction and feature selection can improve the performance of the DNR system with a moderate number of features and that our DNR system significantly outperforms the best system in the DDIExtraction 2013 challenge. |
format | Online Article Text |
id | pubmed-4377447 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Hindawi Publishing Corporation |
record_format | MEDLINE/PubMed |
spelling | pubmed-43774472015-04-08 Feature Engineering for Drug Name Recognition in Biomedical Texts: Feature Conjunction and Feature Selection Liu, Shengyu Tang, Buzhou Chen, Qingcai Wang, Xiaolong Fan, Xiaoming Comput Math Methods Med Research Article Drug name recognition (DNR) is a critical step for drug information extraction. Machine learning-based methods have been widely used for DNR with various types of features such as part-of-speech, word shape, and dictionary feature. Features used in current machine learning-based methods are usually singleton features which may be due to explosive features and a large number of noisy features when singleton features are combined into conjunction features. However, singleton features that can only capture one linguistic characteristic of a word are not sufficient to describe the information for DNR when multiple characteristics should be considered. In this study, we explore feature conjunction and feature selection for DNR, which have never been reported. We intuitively select 8 types of singleton features and combine them into conjunction features in two ways. Then, Chi-square, mutual information, and information gain are used to mine effective features. Experimental results show that feature conjunction and feature selection can improve the performance of the DNR system with a moderate number of features and our DNR system significantly outperforms the best system in the DDIExtraction 2013 challenge. Hindawi Publishing Corporation 2015 2015-03-12 /pmc/articles/PMC4377447/ /pubmed/25861377 http://dx.doi.org/10.1155/2015/913489 Text en Copyright © 2015 Shengyu Liu et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Feature Engineering for Drug Name Recognition in Biomedical Texts: Feature Conjunction and Feature Selection |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4377447/ https://www.ncbi.nlm.nih.gov/pubmed/25861377 http://dx.doi.org/10.1155/2015/913489 |
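
The description in this record mentions combining singleton features into conjunction features and then ranking features with Chi-square. The record does not say which 8 singleton feature types were used or what the "two ways" of combining them are, so the following is only a minimal sketch under stated assumptions: the three singleton features, the pairwise conjunction scheme, the toy tokens, and all function names are hypothetical illustrations, not the authors' implementation.

```python
from collections import Counter
from itertools import combinations


def singleton_features(token):
    """Return a few illustrative (assumed) singleton features for one token."""
    shape = "".join(
        "A" if c.isupper() else "a" if c.islower() else "0" if c.isdigit() else "-"
        for c in token
    )
    return {"lower": token.lower(), "shape": shape, "suffix3": token[-3:].lower()}


def conjunction_features(singletons):
    """Keep each singleton and add one conjunction per pair of singletons.

    Pairwise combination is an assumption; the record does not specify
    how the conjunctions were formed.
    """
    feats = {f"{k}={v}" for k, v in singletons.items()}
    for (k1, v1), (k2, v2) in combinations(sorted(singletons.items()), 2):
        feats.add(f"{k1}={v1}&{k2}={v2}")
    return feats


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 feature/label contingency table.

    n11: feature present, label positive   n10: present, negative
    n01: feature absent, label positive    n00: absent, negative
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def select_features(samples, k):
    """Score every feature against a binary label and keep the top k."""
    total_pos = sum(1 for _, y in samples if y)
    total_neg = len(samples) - total_pos
    pos_counts, neg_counts = Counter(), Counter()
    for feats, y in samples:
        (pos_counts if y else neg_counts).update(feats)
    scores = {}
    for f in set(pos_counts) | set(neg_counts):
        n11, n10 = pos_counts[f], neg_counts[f]
        scores[f] = chi_square(n11, n10, total_pos - n11, total_neg - n10)
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return ranked[:k], scores


# Toy data (hypothetical): True marks a token labeled as a drug name.
tokens = [("Aspirin", True), ("ibuprofen", True), ("warfarin", True),
          ("patient", False), ("with", False), ("treated", False)]
samples = [(conjunction_features(singleton_features(t)), y) for t, y in tokens]
top, scores = select_features(samples, 5)
```

Mutual information or information gain could be substituted for `chi_square` in `select_features` without changing the rest of the sketch, since each only needs the same four contingency counts.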