
A sequence labeling approach to link medications and their attributes in clinical notes and clinical trial announcements for information extraction

OBJECTIVE: The goal of this work was to evaluate machine learning methods, binary classification and sequence labeling, for medication–attribute linkage detection in two clinical corpora. DATA AND METHODS: We double annotated 3000 clinical trial announcements (CTA) and 1655 clinical notes (CN) for m...


Bibliographic Details
Main Authors: Li, Qi, Zhai, Haijun, Deleger, Louise, Lingren, Todd, Kaiser, Megan, Stoutenborough, Laura, Solti, Imre
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2013
Subjects: Research and Applications
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3756265/
https://www.ncbi.nlm.nih.gov/pubmed/23268488
http://dx.doi.org/10.1136/amiajnl-2012-001487
collection PubMed
description OBJECTIVE: The goal of this work was to evaluate machine learning methods, binary classification and sequence labeling, for medication–attribute linkage detection in two clinical corpora.
DATA AND METHODS: We double-annotated 3000 clinical trial announcements (CTA) and 1655 clinical notes (CN) for medication named entities and their attributes. A binary support vector machine (SVM) classification method with parsimonious feature sets, and a conditional random fields (CRF)-based multi-layered sequence labeling (MLSL) model were proposed to identify the linkages between the entities and their corresponding attributes. We evaluated the system's performance against the human-generated gold standard.
RESULTS: The experiments showed that the two machine learning approaches performed statistically significantly better than the baseline rule-based approach. The binary SVM classification achieved 0.94 F-measure with individual tokens as features. The SVM model trained on a parsimonious feature set achieved 0.81 F-measure for CN and 0.87 for CTA. The CRF MLSL method achieved 0.80 F-measure on both corpora.
DISCUSSION AND CONCLUSIONS: We compared the novel MLSL method with a binary classification and a rule-based method. The MLSL method performed statistically significantly better than the rule-based method. However, the SVM-based binary classification method was statistically significantly better than the MLSL method for both the CTA and CN corpora. Using parsimonious feature sets, both the SVM-based binary classification and CRF-based MLSL methods achieved high performance in detecting medication name and attribute linkages in CTA and CN.
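The abstract reports every result as an F-measure against a human-generated gold standard. The record does not say how the score was computed, so the following is only a minimal sketch under stated assumptions: the balanced F1 variant, exact-match scoring of (medication, attribute) pairs, and hypothetical example pairs and function name that do not come from the paper.

```python
from typing import Set, Tuple

# A linkage is a (medication, attribute) pair. The pairs below are hypothetical
# illustrations, not data from the annotated corpora.
Link = Tuple[str, str]


def f_measure(gold: Set[Link], predicted: Set[Link]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for predicted linkages against a gold set.

    Assumes exact-match scoring and the balanced F1 variant.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {
    ("aspirin", "81 mg"),
    ("aspirin", "daily"),
    ("metformin", "500 mg"),
    ("metformin", "twice daily"),
}
predicted = {
    ("aspirin", "81 mg"),
    ("aspirin", "daily"),
    ("metformin", "500 mg"),
    ("aspirin", "twice daily"),  # attribute linked to the wrong medication
}

precision, recall, f1 = f_measure(gold, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy case three of the four predicted links are correct and three of the four gold links are found, so precision, recall, and F1 all equal 0.75.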
id pubmed-3756265
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3756265 2013-12-11 J Am Med Inform Assoc Research and Applications
BMJ Publishing Group 2013-09 2012-12-25 /pmc/articles/PMC3756265/ /pubmed/23268488 http://dx.doi.org/10.1136/amiajnl-2012-001487 Text en Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 3.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/3.0/
topic Research and Applications