Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010
Main authors: | de Bruijn, Berry; Cherry, Colin; Kiritchenko, Svetlana; Martin, Joel; Zhu, Xiaodan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BMJ Group, 2011 |
Subjects: | Research and Applications |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3168309/ https://www.ncbi.nlm.nih.gov/pubmed/21565856 http://dx.doi.org/10.1136/amiajnl-2011-000150 |
author | de Bruijn, Berry; Cherry, Colin; Kiritchenko, Svetlana; Martin, Joel; Zhu, Xiaodan |
---|---|
collection | PubMed |
description | OBJECTIVE: As clinical text mining continues to mature, its potential as an enabling technology for innovations in patient care and clinical research is becoming a reality. A critical part of that process is rigid benchmark testing of natural language processing methods on realistic clinical narrative. In this paper, the authors describe the design and performance of three state-of-the-art text-mining applications from the National Research Council of Canada on evaluations within the 2010 i2b2 challenge. DESIGN: The three systems perform three key steps in clinical information extraction: (1) extraction of medical problems, tests, and treatments, from discharge summaries and progress notes; (2) classification of assertions made on the medical problems; (3) classification of relations between medical concepts. Machine learning systems performed these tasks using large-dimensional bags of features, as derived from both the text itself and from external sources: UMLS, cTAKES, and Medline. MEASUREMENTS: Performance was measured per subtask, using micro-averaged F-scores, as calculated by comparing system annotations with ground-truth annotations on a test set. RESULTS: The systems ranked high among all submitted systems in the competition, with the following F-scores: concept extraction 0.8523 (ranked first); assertion detection 0.9362 (ranked first); relationship detection 0.7313 (ranked second). CONCLUSION: For all tasks, we found that the introduction of a wide range of features was crucial to success. Importantly, our choice of machine learning algorithms allowed us to be versatile in our feature design, and to introduce a large number of features without overfitting and without encountering computing-resource bottlenecks. |
format | Online Article Text |
id | pubmed-3168309 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BMJ Group |
record_format | MEDLINE/PubMed |
spelling | pubmed-3168309, 2011-09-09; J Am Med Inform Assoc; Research and Applications; BMJ Group, 2011-05-12; /pmc/articles/PMC3168309/; /pubmed/21565856; http://dx.doi.org/10.1136/amiajnl-2011-000150; Text; en; © 2011, Published by the BMJ Publishing Group Limited. For permission to use (where not already granted under a licence) please go to http://group.bmj.com/group/rights-licensing/permissions. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non-commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode. |
title | Machine-learned solutions for three stages of clinical information extraction: the state of the art at i2b2 2010 |
topic | Research and Applications |
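The abstract reports performance as micro-averaged F-scores, computed by comparing system annotations with ground-truth annotations. The sketch below is a minimal illustration of that metric and is not the official i2b2 2010 scoring script. It assumes that annotations are compared as exact `(document_id, start, end, label)` tuples. The tuple layout, the `micro_f1` function, and the sample data are all hypothetical. Micro-averaging pools true positives, false positives, and false negatives across every label before it computes precision and recall.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical annotation format (an assumption, not the i2b2 file format):
# (document_id, start_offset, end_offset, label)
Annotation = Tuple[str, int, int, str]


def micro_f1(gold: Iterable[Annotation], predicted: Iterable[Annotation]) -> float:
    """Micro-averaged F1 under exact matching: counts are pooled over all labels."""
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection keeps min counts, i.e. the exact matches.
    tp = sum((gold_counts & pred_counts).values())
    fp = sum(pred_counts.values()) - tp  # predicted but not in gold
    fn = sum(gold_counts.values()) - tp  # in gold but missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative (made-up) data using the three concept types named in the abstract.
gold = [
    ("d1", 0, 2, "problem"),
    ("d1", 5, 6, "test"),
    ("d1", 9, 12, "treatment"),
    ("d2", 3, 4, "problem"),
]
pred = [
    ("d1", 0, 2, "problem"),
    ("d1", 5, 6, "test"),
    ("d1", 9, 11, "treatment"),  # wrong span end, so it counts as one FP and one FN
]

print(round(micro_f1(gold, pred), 4))  # 0.5714
```

On this toy data the sketch finds 2 true positives, 1 false positive, and 2 false negatives. That gives a precision of 2/3, a recall of 1/2, and a micro-averaged F-score of 4/7. Because the counts are pooled, frequent labels carry more weight than rare ones. Macro-averaging would instead average the per-label scores.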