Feature Engineering for Drug Name Recognition in Biomedical Texts: Feature Conjunction and Feature Selection

Bibliographic Details
Main authors: Liu, Shengyu; Tang, Buzhou; Chen, Qingcai; Wang, Xiaolong; Fan, Xiaoming
Format: Online, Article, Text
Language: English
Published: Hindawi Publishing Corporation, 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4377447/
https://www.ncbi.nlm.nih.gov/pubmed/25861377
http://dx.doi.org/10.1155/2015/913489
Description: Drug name recognition (DNR) is a critical step in drug information extraction. Machine learning-based methods have been widely used for DNR with various types of features, such as part-of-speech, word shape, and dictionary features. Current machine learning-based methods usually rely on singleton features, possibly because combining singleton features into conjunction features produces an explosive number of features, many of them noisy. However, a singleton feature captures only one linguistic characteristic of a word, which is insufficient for DNR when multiple characteristics must be considered together. In this study, we explore feature conjunction and feature selection for DNR, neither of which has been reported before. We intuitively select 8 types of singleton features and combine them into conjunction features in two ways. Then, Chi-square, mutual information, and information gain are used to mine effective features. Experimental results show that feature conjunction and feature selection can improve the performance of the DNR system with a moderate number of features, and that our DNR system significantly outperforms the best system in the DDIExtraction 2013 challenge.
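The description names the ingredients (singleton features, conjunction features, and Chi-square, mutual information, and information gain scoring) but not the exact feature templates or the two conjunction schemes. The following is therefore only a minimal sketch under stated assumptions. The four singleton templates, the pairwise conjunction scheme, the toy dictionary `DRUG_DICT`, the hand-labelled tokens, the binary drug/non-drug token labels, and every function name are hypothetical. Chi-square and information gain use their standard 2×2 contingency-table definitions. "Mutual information" is taken here as pointwise mutual information, a common choice in text feature selection that may differ from the paper's definition.

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical toy drug dictionary (assumption); a real system would use a curated lexicon.
DRUG_DICT = {"aspirin", "warfarin", "ibuprofen"}


def word_shape(word):
    """Map uppercase to 'A', lowercase to 'a', digits to '0'; keep other characters."""
    out = []
    for ch in word:
        if ch.isupper():
            out.append("A")
        elif ch.islower():
            out.append("a")
        elif ch.isdigit():
            out.append("0")
        else:
            out.append(ch)
    return "".join(out)


def singleton_features(word):
    """Illustrative singleton features (assumed templates), one characteristic each."""
    return {
        "w": word.lower(),
        "shape": word_shape(word),
        "suf3": word.lower()[-3:],
        "dict": str(word.lower() in DRUG_DICT),
    }


def with_conjunctions(feats):
    """Return the singleton features plus every pairwise conjunction (one assumed scheme)."""
    out = {f"{k}={v}" for k, v in feats.items()}
    for (k1, v1), (k2, v2) in combinations(sorted(feats.items()), 2):
        out.add(f"{k1}={v1}&{k2}={v2}")
    return out


def entropy(*counts):
    """Entropy in bits of a distribution given as raw counts (zero counts ignored)."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


def score_features(samples):
    """samples: list of (feature_set, is_drug). Returns {feature: (chi2, pmi, ig)}."""
    n = len(samples)
    n_pos = sum(1 for _, y in samples if y)
    df, df_pos = Counter(), Counter()  # tokens containing the feature: all / drug only
    for feats, y in samples:
        for f in feats:
            df[f] += 1
            if y:
                df_pos[f] += 1
    h_class = entropy(n_pos, n - n_pos)
    scores = {}
    for f in df:
        a = df_pos[f]          # feature present, drug token
        b = df[f] - a          # feature present, non-drug token
        c = n_pos - a          # feature absent, drug token
        d = n - n_pos - b      # feature absent, non-drug token
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = n * (a * d - b * c) ** 2 / denom if denom else 0.0
        # Pointwise MI between feature presence and the drug class (-inf if they never co-occur).
        pmi = math.log2(a * n / ((a + b) * (a + c))) if a else float("-inf")
        # Information gain: class entropy minus expected entropy after observing the feature.
        h_cond = (a + b) / n * entropy(a, b) + (c + d) / n * entropy(c, d)
        scores[f] = (chi2, pmi, h_class - h_cond)
    return scores


def select_top_k(scores, k, metric=0):
    """Keep the k highest-scoring features (metric 0 = chi2, 1 = PMI, 2 = IG)."""
    return sorted(scores, key=lambda f: scores[f][metric], reverse=True)[:k]


# Usage on a tiny hand-labelled token list (illustrative data, not from the paper).
tokens = [("Aspirin", True), ("inhibits", False), ("warfarin", True),
          ("metabolism", False), ("in", False), ("Ibuprofen", True), ("rats", False)]
samples = [(with_conjunctions(singleton_features(w)), y) for w, y in tokens]
scores = score_features(samples)
print(select_top_k(scores, 2))
```

With four singleton templates, pairwise conjunction already yields ten features per token. This hints at the feature explosion the description gives as the usual reason conjunctions are avoided. Ranking by one of the three scores and keeping the top k is one simple way to hold a model to a moderate number of features.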
Journal: Comput Math Methods Med
Article type: Research Article
Date: 2015-03-12
Copyright © 2015 Shengyu Liu et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.