Structuring clinical text with AI: Old versus new natural language processing techniques evaluated on eight common cardiovascular diseases

Free-text clinical notes in electronic health records are more difficult for data mining while the structured diagnostic codes can be missing or erroneous. To improve the quality of diagnostic codes, this work extracts diagnostic codes from free-text notes: five old and new word vectorization method...

Bibliographic Details
Main authors:	Zhan, Xianghao; Humbert-Droz, Marie; Mukherjee, Pritam; Gevaert, Olivier
Format:	Online Article Text
Language:	English
Published:	Elsevier 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8276012/ https://www.ncbi.nlm.nih.gov/pubmed/34286303 http://dx.doi.org/10.1016/j.patter.2021.100289
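
The abstract describes the core pipeline. Each progress note is turned into a word vector (TF-IDF worked best of the five methods compared). A logistic regression classifier then predicts each ICD-10 code, and performance is reported as AUROC and AUPRC. What follows is a minimal, dependency-free Python sketch of that general technique, not the authors' code. Several pieces are illustrative assumptions: the toy notes, the single binary label (a hypothetical atrial fibrillation code), the tokenizer, the smoothed IDF formula, and the hyperparameters. Average precision stands in as the AUPRC estimate.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep alphanumeric runs; real clinical text needs far more care.
    return re.findall(r"[a-z0-9]+", text.lower())


def fit_idf(docs):
    # Smoothed IDF (an assumed variant): ln((1 + n) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(doc, idf):
    # Raw term counts times IDF, L2-normalised; out-of-vocabulary words are dropped.
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    # Probability that a note carries the code, given its sparse feature dict x.
    return sigmoid(b + sum(w.get(t, 0.0) * v for t, v in x.items()))


def train_logreg(X, y, lr=0.5, epochs=500, l2=0.01):
    # Full-batch gradient descent on L2-regularised log-loss (bias not regularised).
    w, b, n = {}, 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = Counter(), 0.0
        for x, label in zip(X, y):
            err = predict(w, b, x) - label
            for t, v in x.items():
                grad_w[t] += err * v
            grad_b += err
        for t in set(w) | set(grad_w):
            w[t] = w.get(t, 0.0) - lr * (grad_w[t] / n + l2 * w.get(t, 0.0))
        b -= lr * grad_b / n
    return w, b


def auroc(labels, scores):
    # Chance that a random positive outscores a random negative (ties count half).
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    # Mean precision@k over the ranks of the positives (one common AUPRC estimate).
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k
    return total / max(hits, 1)


# Hypothetical toy notes; label 1 = the note carries an atrial fibrillation code.
train_notes = [
    "irregularly irregular rhythm, started apixaban for atrial fibrillation",
    "afib with rapid ventricular response, rate control with metoprolol",
    "atrial fibrillation on warfarin, inr therapeutic",
    "routine follow up, blood pressure well controlled",
    "knee pain after fall, no cardiac complaints",
    "diabetes follow up, a1c improved",
]
train_labels = [1, 1, 1, 0, 0, 0]
test_notes = [
    "new onset atrial fibrillation, start anticoagulation",
    "annual physical, no complaints",
    "palpitations, ecg shows afib",
    "ankle sprain, ice and rest",
]
test_labels = [1, 0, 1, 0]

idf = fit_idf(train_notes)
weights, bias = train_logreg([tfidf(d, idf) for d in train_notes], train_labels)
scores = [predict(weights, bias, tfidf(d, idf)) for d in test_notes]
print("AUROC:", auroc(test_labels, scores))
print("Average precision:", average_precision(test_labels, scores))
# The largest positive weights serve as the "important words" for this code.
top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:3]
print("Top words:", [t for t, _ in top])
```

Listing the largest positive weights mirrors, in miniature, the interpretability check the abstract mentions (important words matching each disease). Under the assumption that each code gets its own binary classifier, predicting eight codes would mean repeating the fit once per code; a real study would also more likely rely on a tested library implementation and far more data.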

collection	PubMed
description	Free-text clinical notes in electronic health records are more difficult for data mining while the structured diagnostic codes can be missing or erroneous. To improve the quality of diagnostic codes, this work extracts diagnostic codes from free-text notes: five old and new word vectorization methods were used to vectorize Stanford progress notes and predict eight ICD-10 codes of common cardiovascular diseases with logistic regression. The models showed good performance, with TF-IDF as the best vectorization model showing the highest AUROC (0.9499–0.9915) and AUPRC (0.2956–0.8072). The models also showed transferability when tested on MIMIC-III data with AUROC from 0.7952 to 0.9790 and AUPRC from 0.2353 to 0.8084. Model interpretability was shown by the important words with clinical meanings matching each disease. This study shows the feasibility of accurately extracting structured diagnostic codes, imputing missing codes, and correcting erroneous codes from free-text clinical notes for information retrieval and downstream machine-learning applications.
id	pubmed-8276012
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Patterns (N Y)
published	Elsevier 2021-06-17
license	© 2021 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).