Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010

OBJECTIVE: As clinical text mining continues to mature, its potential as an enabling technology for innovations in patient care and clinical research is becoming a reality. A critical part of that process is rigid benchmark testing of natural language processing methods on realistic clinical narrati...

Bibliographic Details
Main Authors: de Bruijn, Berry; Cherry, Colin; Kiritchenko, Svetlana; Martin, Joel; Zhu, Xiaodan
Format: Online Article Text
Language: English
Published: BMJ Group 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3168309/
https://www.ncbi.nlm.nih.gov/pubmed/21565856
http://dx.doi.org/10.1136/amiajnl-2011-000150
collection PubMed
description OBJECTIVE: As clinical text mining continues to mature, its potential as an enabling technology for innovations in patient care and clinical research is becoming a reality. A critical part of that process is rigid benchmark testing of natural language processing methods on realistic clinical narrative. In this paper, the authors describe the design and performance of three state-of-the-art text-mining applications from the National Research Council of Canada on evaluations within the 2010 i2b2 challenge.
DESIGN: The three systems perform three key steps in clinical information extraction: (1) extraction of medical problems, tests, and treatments, from discharge summaries and progress notes; (2) classification of assertions made on the medical problems; (3) classification of relations between medical concepts. Machine learning systems performed these tasks using large-dimensional bags of features, as derived from both the text itself and from external sources: UMLS, cTAKES, and Medline.
MEASUREMENTS: Performance was measured per subtask, using micro-averaged F-scores, as calculated by comparing system annotations with ground-truth annotations on a test set.
RESULTS: The systems ranked high among all submitted systems in the competition, with the following F-scores: concept extraction 0.8523 (ranked first); assertion detection 0.9362 (ranked first); relationship detection 0.7313 (ranked second).
CONCLUSION: For all tasks, we found that the introduction of a wide range of features was crucial to success. Importantly, our choice of machine learning algorithms allowed us to be versatile in our feature design, and to introduce a large number of features without overfitting and without encountering computing-resource bottlenecks.
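As an illustration of the micro-averaged F-score named under MEASUREMENTS, the following is a minimal sketch, not the challenge's official scorer. It assumes annotations are compared by exact match, and the function name `micro_f1`, the tuple layout `(line, start, end)`, and the sample data are all hypothetical. Micro-averaging pools true positives, false positives, and false negatives across all classes before computing precision and recall, so frequent classes weigh more than rare ones.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F-score over per-class annotation sets.

    gold, predicted: dict mapping a class label to a set of annotations.
    Exact-match comparison is an assumption of this sketch.
    """
    tp = fp = fn = 0
    for label in set(gold) | set(predicted):
        g = gold.get(label, set())
        p = predicted.get(label, set())
        tp += len(g & p)   # annotations found in both sets
        fp += len(p - g)   # system annotations absent from the ground truth
        fn += len(g - p)   # ground-truth annotations the system missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations: (line, start token, end token) per concept class.
gold = {
    "problem": {(1, 0, 2), (1, 5, 6)},
    "test": {(2, 3, 4)},
    "treatment": {(3, 0, 1)},
}
predicted = {
    "problem": {(1, 0, 2)},
    "test": {(2, 3, 4), (2, 7, 8)},
    "treatment": set(),
}

score = micro_f1(gold, predicted)
print(round(score, 4))
```

In this toy data the pooled counts are 2 true positives, 1 false positive, and 2 false negatives, giving precision 2/3, recall 1/2, and an F-score of 4/7.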
id pubmed-3168309
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3168309 2011-09-09 J Am Med Inform Assoc Research and Applications BMJ Group 2011-05-12 Text en
© 2011, Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.
topic Research and Applications