
Natural language processing of radiology reports for the detection of thromboembolic diseases and clinically relevant incidental findings

BACKGROUND: Natural Language Processing (NLP) has been shown to be effective for analyzing the content of radiology reports and identifying diagnoses or patient characteristics. We evaluate the combination of NLP and machine learning to detect thromboembolic disease diagnoses and incidental clinically relevant...


Bibliographic Details
Main authors: Pham, Anne-Dominique, Névéol, Aurélie, Lavergne, Thomas, Yasunaga, Daisuke, Clément, Olivier, Meyer, Guy, Morello, Rémy, Burgun, Anita
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4133634/
https://www.ncbi.nlm.nih.gov/pubmed/25099227
http://dx.doi.org/10.1186/1471-2105-15-266
author Pham, Anne-Dominique
Névéol, Aurélie
Lavergne, Thomas
Yasunaga, Daisuke
Clément, Olivier
Meyer, Guy
Morello, Rémy
Burgun, Anita
author_sort Pham, Anne-Dominique
collection PubMed
description BACKGROUND: Natural Language Processing (NLP) has been shown to be effective for analyzing the content of radiology reports and identifying diagnoses or patient characteristics. We evaluate the combination of NLP and machine learning to detect thromboembolic disease diagnoses and incidental clinically relevant findings from angiography and venography reports written in French. We model thromboembolic diagnosis and incidental findings as a set of concepts, modalities and relations between concepts that can be used as features by a supervised machine learning algorithm. A corpus of 573 radiology reports was de-identified and manually annotated by a physician, with the support of NLP tools, for relevant concepts, modalities and relations. A machine learning classifier was trained on the dataset interpreted by a physician for diagnosis of deep vein thrombosis, pulmonary embolism and clinically relevant incidental findings. Decision models accounted for the imbalanced nature of the data and exploited the structure of the reports. RESULTS: The best model achieved an F-measure of 0.98 for pulmonary embolism identification, 1.00 for deep vein thrombosis, and 0.80 for incidental clinically relevant findings. The use of concepts, modalities and relations improved performance in all cases. CONCLUSIONS: This study demonstrates the benefits of developing an automated method to identify medical concepts, modalities and relations from radiology reports in French. An end-to-end automatic system for annotation and classification that could be applied to other radiology report databases would be valuable for epidemiological surveillance, performance monitoring, and accreditation in French hospitals.
format Online
Article
Text
id pubmed-4133634
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4133634 2014-08-16 Natural language processing of radiology reports for the detection of thromboembolic diseases and clinically relevant incidental findings Pham, Anne-Dominique Névéol, Aurélie Lavergne, Thomas Yasunaga, Daisuke Clément, Olivier Meyer, Guy Morello, Rémy Burgun, Anita BMC Bioinformatics Research Article BACKGROUND: Natural Language Processing (NLP) has been shown to be effective for analyzing the content of radiology reports and identifying diagnoses or patient characteristics. We evaluate the combination of NLP and machine learning to detect thromboembolic disease diagnoses and incidental clinically relevant findings from angiography and venography reports written in French. We model thromboembolic diagnosis and incidental findings as a set of concepts, modalities and relations between concepts that can be used as features by a supervised machine learning algorithm. A corpus of 573 radiology reports was de-identified and manually annotated by a physician, with the support of NLP tools, for relevant concepts, modalities and relations. A machine learning classifier was trained on the dataset interpreted by a physician for diagnosis of deep vein thrombosis, pulmonary embolism and clinically relevant incidental findings. Decision models accounted for the imbalanced nature of the data and exploited the structure of the reports. RESULTS: The best model achieved an F-measure of 0.98 for pulmonary embolism identification, 1.00 for deep vein thrombosis, and 0.80 for incidental clinically relevant findings. The use of concepts, modalities and relations improved performance in all cases. CONCLUSIONS: This study demonstrates the benefits of developing an automated method to identify medical concepts, modalities and relations from radiology reports in French. An end-to-end automatic system for annotation and classification that could be applied to other radiology report databases would be valuable for epidemiological surveillance, performance monitoring, and accreditation in French hospitals. BioMed Central 2014-08-07 /pmc/articles/PMC4133634/ /pubmed/25099227 http://dx.doi.org/10.1186/1471-2105-15-266 Text en © Pham et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Natural language processing of radiology reports for the detection of thromboembolic diseases and clinically relevant incidental findings
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4133634/
https://www.ncbi.nlm.nih.gov/pubmed/25099227
http://dx.doi.org/10.1186/1471-2105-15-266