
FasTag: Automatic text classification of unstructured medical narratives

Unstructured clinical narratives are continuously being recorded as part of delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are...


Bibliographic Details
Main authors:	Venkataraman, Guhan Ram; Pineda, Arturo Lopez; Bear Don’t Walk IV, Oliver J.; Zehnder, Ashley M.; Ayyar, Sandeep; Page, Rodney L.; Bustamante, Carlos D.; Rivas, Manuel A.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2020
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7307763/ https://www.ncbi.nlm.nih.gov/pubmed/32569327 http://dx.doi.org/10.1371/journal.pone.0234647

_version_	1783548865973059584
author	Venkataraman, Guhan Ram; Pineda, Arturo Lopez; Bear Don’t Walk IV, Oliver J.; Zehnder, Ashley M.; Ayyar, Sandeep; Page, Rodney L.; Bustamante, Carlos D.; Rivas, Manuel A.
author_sort	Venkataraman, Guhan Ram
collection	PubMed
description	Unstructured clinical narratives are continuously being recorded as part of delivery of care in electronic health records, and dedicated tagging staff spend considerable effort manually assigning clinical codes for billing purposes. Despite these efforts, however, label availability and accuracy are both suboptimal. In this retrospective study, we aimed to automate the assignment of top-level International Classification of Diseases version 9 (ICD-9) codes to clinical records from human and veterinary data stores using minimal manual labor and feature curation. Automating top-level annotations could in turn enable rapid cohort identification, especially in a veterinary setting. To this end, we trained long short-term memory (LSTM) recurrent neural networks (RNNs) on 52,722 human and 89,591 veterinary records. We investigated the accuracy of both separate-domain and combined-domain models and probed model portability. We established relevant baseline classification performances by training Decision Trees (DT) and Random Forests (RF). We also investigated whether transforming the data using MetaMap Lite, a clinical natural language processing tool, affected classification performance. We showed that the LSTM-RNNs accurately classify veterinary and human text narratives into top-level categories with an average weighted macro F1 score of 0.74 and 0.68 respectively. In the “neoplasia” category, the model trained on veterinary data had a high validation accuracy in veterinary data and moderate accuracy in human data, with F1 scores of 0.91 and 0.70 respectively. Our LSTM method scored slightly higher than that of the DT and RF models. The use of LSTM-RNN models represents a scalable structure that could prove useful in cohort identification for comparative oncology studies. Digitization of human and veterinary health information will continue to be a reality, particularly in the form of unstructured narratives. Our approach is a step forward for these two domains to learn from and inform one another.
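The description reports per-category F1 scores and an averaged F1 across top-level categories. As a rough illustration of how such averages are formed, the sketch below computes per-class F1 and then both a plain (macro) mean and a support-weighted mean. It is not the authors' evaluation code: the function names, the single-label setup, and the tiny label lists are assumptions made up for this example, and the record does not say which averaging variant produced the reported figures.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: (f1, support)} for every label in y_true or y_pred.

    Assumes one label per record (an illustrative simplification).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label was wrong
            fn[truth] += 1  # true label was missed
    support = Counter(y_true)
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        f1 = 2 * tp[label] / denom if denom else 0.0
        scores[label] = (f1, support[label])
    return scores


def macro_f1(scores):
    """Unweighted mean of per-class F1."""
    return sum(f1 for f1, _ in scores.values()) / len(scores)


def weighted_f1(scores):
    """Mean of per-class F1, weighted by each class's number of true records."""
    total = sum(n for _, n in scores.values())
    return sum(f1 * n for f1, n in scores.values()) / total


# Hypothetical toy data, not taken from the study.
y_true = ["neoplasia", "neoplasia", "neoplasia", "circulatory", "circulatory", "injury"]
y_pred = ["neoplasia", "neoplasia", "circulatory", "circulatory", "circulatory", "neoplasia"]

scores = per_class_f1(y_true, y_pred)
print(round(macro_f1(scores), 3))     # 0.489
print(round(weighted_f1(scores), 3))  # 0.6
```

The gap between the two numbers comes from the rare "injury" class: it scores 0 and pulls the macro mean down, while the weighted mean discounts it because it has only one true record.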
format	Online Article Text
id	pubmed-7307763
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-7307763 2020-06-25 FasTag: Automatic text classification of unstructured medical narratives Venkataraman, Guhan Ram Pineda, Arturo Lopez Bear Don’t Walk IV, Oliver J. Zehnder, Ashley M. Ayyar, Sandeep Page, Rodney L. Bustamante, Carlos D. Rivas, Manuel A. PLoS One Research Article Public Library of Science 2020-06-22 /pmc/articles/PMC7307763/ /pubmed/32569327 http://dx.doi.org/10.1371/journal.pone.0234647 Text en © 2020 Venkataraman et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	FasTag: Automatic text classification of unstructured medical narratives
topic	Research Article
