
Temporal bone radiology report classification using open source machine learning and natural langue processing libraries

BACKGROUND: Radiology reports are a rich resource for biomedical research. Prior to utilization, trained experts must manually review reports to identify discrete outcomes. The Audiological and Genetic Database (AudGenDB) is a public, de-identified research database that contains over 16,000 radiolo...


Bibliographic Details
Main Authors:	Masino, Aaron J., Grundmeier, Robert W., Pennington, Jeffrey W., Germiller, John A., Crenshaw, E. Bryan
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4896018/ https://www.ncbi.nlm.nih.gov/pubmed/27267768 http://dx.doi.org/10.1186/s12911-016-0306-3

_version_	1782435970509963264
author	Masino, Aaron J. Grundmeier, Robert W. Pennington, Jeffrey W. Germiller, John A. Crenshaw, E. Bryan
author_facet	Masino, Aaron J. Grundmeier, Robert W. Pennington, Jeffrey W. Germiller, John A. Crenshaw, E. Bryan
author_sort	Masino, Aaron J.
collection	PubMed
description	BACKGROUND: Radiology reports are a rich resource for biomedical research. Prior to utilization, trained experts must manually review reports to identify discrete outcomes. The Audiological and Genetic Database (AudGenDB) is a public, de-identified research database that contains over 16,000 radiology reports. Because the reports are unlabeled, it is difficult to select those with specific abnormalities. We implemented a classification pipeline using a human-in-the-loop machine learning approach and open source libraries to label the reports with one or more of four abnormality region labels: inner, middle, outer, and mastoid, indicating the presence of an abnormality in the specified ear region. METHODS: Trained abstractors labeled radiology reports taken from AudGenDB to form a gold standard. These were split into training (80%) and test (20%) sets. We applied open source libraries to normalize and convert every report to an n-gram feature vector. We trained logistic regression, support vector machine (linear and Gaussian), decision tree, random forest, and naïve Bayes models for each ear region. The models were evaluated on the hold-out test set. RESULTS: Our gold-standard data set contained 726 reports. The best classifiers were linear support vector machine for inner and outer ear, logistic regression for middle ear, and decision tree for mastoid. Classifier test set accuracy was 90%, 90%, 93%, and 82% for the inner, middle, outer, and mastoid regions, respectively. The logistic regression method was very consistent, achieving accuracy scores within 2.75% of the best classifier across regions and a receiver operating characteristic area under the curve of 0.92 or greater across all regions. CONCLUSIONS: Our results indicate that the applied methods achieve accuracy scores sufficient to support our objective of extracting discrete features from radiology reports to enhance cohort identification in AudGenDB. The models described here are available in several free, open source libraries that make them more accessible and simplify their utilization as demonstrated in this work. We additionally implemented the models as a web service that accepts radiology report text in an HTTP request and provides the predicted region labels. This service has been used to label the reports in AudGenDB and is freely available. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12911-016-0306-3) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4896018
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4896018 2016-06-08 Temporal bone radiology report classification using open source machine learning and natural langue processing libraries Masino, Aaron J. Grundmeier, Robert W. Pennington, Jeffrey W. Germiller, John A. Crenshaw, E. Bryan BMC Med Inform Decis Mak Research Article BioMed Central 2016-06-06 /pmc/articles/PMC4896018/ /pubmed/27267768 http://dx.doi.org/10.1186/s12911-016-0306-3 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Article Masino, Aaron J. Grundmeier, Robert W. Pennington, Jeffrey W. Germiller, John A. Crenshaw, E. Bryan Temporal bone radiology report classification using open source machine learning and natural langue processing libraries
title	Temporal bone radiology report classification using open source machine learning and natural langue processing libraries
title_full	Temporal bone radiology report classification using open source machine learning and natural langue processing libraries
title_fullStr	Temporal bone radiology report classification using open source machine learning and natural langue processing libraries
title_full_unstemmed	Temporal bone radiology report classification using open source machine learning and natural langue processing libraries
title_short	Temporal bone radiology report classification using open source machine learning and natural langue processing libraries
title_sort	temporal bone radiology report classification using open source machine learning and natural langue processing libraries
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4896018/ https://www.ncbi.nlm.nih.gov/pubmed/27267768 http://dx.doi.org/10.1186/s12911-016-0306-3
work_keys_str_mv	AT masinoaaronj temporalboneradiologyreportclassificationusingopensourcemachinelearningandnaturallangueprocessinglibraries AT grundmeierrobertw temporalboneradiologyreportclassificationusingopensourcemachinelearningandnaturallangueprocessinglibraries AT penningtonjeffreyw temporalboneradiologyreportclassificationusingopensourcemachinelearningandnaturallangueprocessinglibraries AT germillerjohna temporalboneradiologyreportclassificationusingopensourcemachinelearningandnaturallangueprocessinglibraries AT crenshawebryan temporalboneradiologyreportclassificationusingopensourcemachinelearningandnaturallangueprocessinglibraries
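
The pipeline summarized in the abstract (convert each report to an n-gram feature vector, split the labeled reports 80/20, and train one binary classifier per ear region) can be illustrated with a minimal sketch. This is not the authors' implementation: it is a pure-Python stand-in for the open source libraries they mention, it covers only logistic regression among the model families listed, and it handles a single region. The toy reports, their labels, and all function names (`ngrams`, `build_vocab`, `vectorize`, `train_logreg`, `predict`) are hypothetical and exist only for illustration.

```python
import math
import random
from collections import Counter

def ngrams(text, n_max=2):
    """Lowercase, split on whitespace, and return all word n-grams for n = 1..n_max."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams

def build_vocab(texts):
    """Assign a column index to every n-gram seen in the given texts."""
    vocab = {}
    for text in texts:
        for gram in ngrams(text):
            vocab.setdefault(gram, len(vocab))
    return vocab

def vectorize(text, vocab):
    """Sparse count vector {column index: count}; n-grams outside the vocabulary are dropped."""
    counts = Counter(g for g in ngrams(text) if g in vocab)
    return {vocab[g]: c for g, c in counts.items()}

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp in a safe range
    return 1.0 / (1.0 + math.exp(-z))

def train_logreg(X, y, n_features, epochs=200, lr=0.1):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            err = sigmoid(b + sum(w[j] * v for j, v in x.items())) - label
            b -= lr * err
            for j, v in x.items():
                w[j] -= lr * err * v
    return w, b

def predict(x, w, b):
    """Return 1 (abnormality present) when the predicted probability is at least 0.5."""
    return 1 if sigmoid(b + sum(w[j] * v for j, v in x.items())) >= 0.5 else 0

# Hypothetical toy data for ONE region label (1 = inner-ear abnormality reported).
abnormal = ["enlarged vestibular aqueduct", "cochlear malformation noted",
            "dysplastic semicircular canal"]
normal = ["cochlea appears unremarkable", "labyrinth is normal",
          "vestibule within normal limits"]
reports = [(f"ct temporal bone {abnormal[i % 3]}", 1) for i in range(10)]
reports += [(f"ct temporal bone {normal[i % 3]}", 0) for i in range(10)]

# 80/20 split of the labeled reports into training and hold-out test sets.
random.Random(0).shuffle(reports)
split = len(reports) * 8 // 10
train, test = reports[:split], reports[split:]

# The vocabulary is built from the training reports only.
vocab = build_vocab(text for text, _ in train)
X_train = [vectorize(text, vocab) for text, _ in train]
y_train = [label for _, label in train]
w, b = train_logreg(X_train, y_train, len(vocab))

# Evaluate on the hold-out set.
correct = sum(predict(vectorize(text, vocab), w, b) == label for text, label in test)
accuracy = correct / len(test)
print(f"hold-out accuracy: {accuracy:.2f}")
```

To cover all four region labels in the same spirit, one such binary model would be trained per region (inner, middle, outer, mastoid), so a single report can receive more than one label.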


Similar Items