Natural language processing for the assessment of cardiovascular disease comorbidities: The cardio‐Canary comorbidity project
Objective: Accurate ascertainment of comorbidities is paramount in clinical research. While manual adjudication is labor‐intensive and expensive, the adoption of electronic health records enables computational analysis of free‐text documentation using natural language processing (NLP) tools. Hypothe...
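Main Authors: | Berman, Adam N., Biery, David W., Ginder, Curtis, Hulme, Olivia L., Marcusa, Daniel, Leiva, Orly, Wu, Wanda Y., Cardin, Nicholas, Hainer, Jon, Bhatt, Deepak L., Di Carli, Marcelo F., Turchin, Alexander, Blankstein, Ron |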
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Wiley Periodicals, Inc.
2021
|
Subjects: | Clinical Investigations |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8428009/ https://www.ncbi.nlm.nih.gov/pubmed/34347314 http://dx.doi.org/10.1002/clc.23687 |
_version_ | 1783750291556925440 |
---|---|
author | Berman, Adam N. Biery, David W. Ginder, Curtis Hulme, Olivia L. Marcusa, Daniel Leiva, Orly Wu, Wanda Y. Cardin, Nicholas Hainer, Jon Bhatt, Deepak L. Di Carli, Marcelo F. Turchin, Alexander Blankstein, Ron |
author_facet | Berman, Adam N. Biery, David W. Ginder, Curtis Hulme, Olivia L. Marcusa, Daniel Leiva, Orly Wu, Wanda Y. Cardin, Nicholas Hainer, Jon Bhatt, Deepak L. Di Carli, Marcelo F. Turchin, Alexander Blankstein, Ron |
author_sort | Berman, Adam N. |
collection | PubMed |
description | Objective: Accurate ascertainment of comorbidities is paramount in clinical research. While manual adjudication is labor‐intensive and expensive, the adoption of electronic health records enables computational analysis of free‐text documentation using natural language processing (NLP) tools. Hypothesis: We sought to develop highly accurate NLP modules to assess for the presence of five key cardiovascular comorbidities in a large electronic health record system. Methods: One‐thousand clinical notes were randomly selected from a cardiovascular registry at Mass General Brigham. Trained physicians manually adjudicated these notes for the following five diagnostic comorbidities: hypertension, dyslipidemia, diabetes, coronary artery disease, and stroke/transient ischemic attack. Using the open‐source Canary NLP system, five separate NLP modules were designed based on 800 “training‐set” notes and validated on 200 “test‐set” notes. Results: Across the five NLP modules, the sentence‐level and note‐level sensitivity, specificity, and positive predictive value was always greater than 85% and was most often greater than 90%. Accuracy tended to be highest for conditions with greater diagnostic clarity (e.g. diabetes and hypertension) and slightly lower for conditions whose greater diagnostic challenges (e.g. myocardial infarction and embolic stroke) may lead to less definitive documentation. Conclusion: We designed five open‐source and highly accurate NLP modules that can be used to assess for the presence of important cardiovascular comorbidities in free‐text health records. These modules have been placed in the public domain and can be used for clinical research, trial recruitment and population management at any institution as well as serve as the basis for further development of cardiovascular NLP tools. |
format | Online Article Text |
id | pubmed-8428009 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Wiley Periodicals, Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-84280092021-09-13 Natural language processing for the assessment of cardiovascular disease comorbidities: The cardio‐Canary comorbidity project Berman, Adam N. Biery, David W. Ginder, Curtis Hulme, Olivia L. Marcusa, Daniel Leiva, Orly Wu, Wanda Y. Cardin, Nicholas Hainer, Jon Bhatt, Deepak L. Di Carli, Marcelo F. Turchin, Alexander Blankstein, Ron Clin Cardiol Clinical Investigations Objective: Accurate ascertainment of comorbidities is paramount in clinical research. While manual adjudication is labor‐intensive and expensive, the adoption of electronic health records enables computational analysis of free‐text documentation using natural language processing (NLP) tools. Hypothesis: We sought to develop highly accurate NLP modules to assess for the presence of five key cardiovascular comorbidities in a large electronic health record system. Methods: One‐thousand clinical notes were randomly selected from a cardiovascular registry at Mass General Brigham. Trained physicians manually adjudicated these notes for the following five diagnostic comorbidities: hypertension, dyslipidemia, diabetes, coronary artery disease, and stroke/transient ischemic attack. Using the open‐source Canary NLP system, five separate NLP modules were designed based on 800 “training‐set” notes and validated on 200 “test‐set” notes. Results: Across the five NLP modules, the sentence‐level and note‐level sensitivity, specificity, and positive predictive value was always greater than 85% and was most often greater than 90%. Accuracy tended to be highest for conditions with greater diagnostic clarity (e.g. diabetes and hypertension) and slightly lower for conditions whose greater diagnostic challenges (e.g. myocardial infarction and embolic stroke) may lead to less definitive documentation. Conclusion: We designed five open‐source and highly accurate NLP modules that can be used to assess for the presence of important cardiovascular comorbidities in free‐text health records. 
These modules have been placed in the public domain and can be used for clinical research, trial recruitment and population management at any institution as well as serve as the basis for further development of cardiovascular NLP tools. Wiley Periodicals, Inc. 2021-08-04 /pmc/articles/PMC8428009/ /pubmed/34347314 http://dx.doi.org/10.1002/clc.23687 Text en © 2021 The Authors. Clinical Cardiology published by Wiley Periodicals LLC. https://creativecommons.org/licenses/by/4.0/ This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Clinical Investigations Berman, Adam N. Biery, David W. Ginder, Curtis Hulme, Olivia L. Marcusa, Daniel Leiva, Orly Wu, Wanda Y. Cardin, Nicholas Hainer, Jon Bhatt, Deepak L. Di Carli, Marcelo F. Turchin, Alexander Blankstein, Ron Natural language processing for the assessment of cardiovascular disease comorbidities: The cardio‐Canary comorbidity project |
title | Natural language processing for the assessment of cardiovascular disease comorbidities: The cardio‐Canary comorbidity project |
title_full | Natural language processing for the assessment of cardiovascular disease comorbidities: The cardio‐Canary comorbidity project |
title_fullStr | Natural language processing for the assessment of cardiovascular disease comorbidities: The cardio‐Canary comorbidity project |
title_full_unstemmed | Natural language processing for the assessment of cardiovascular disease comorbidities: The cardio‐Canary comorbidity project |
title_short | Natural language processing for the assessment of cardiovascular disease comorbidities: The cardio‐Canary comorbidity project |
title_sort | natural language processing for the assessment of cardiovascular disease comorbidities: the cardio‐canary comorbidity project |
topic | Clinical Investigations |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8428009/ https://www.ncbi.nlm.nih.gov/pubmed/34347314 http://dx.doi.org/10.1002/clc.23687 |
work_keys_str_mv | AT bermanadamn naturallanguageprocessingfortheassessmentofcardiovasculardiseasecomorbiditiesthecardiocanarycomorbidityproject AT bierydavidw naturallanguageprocessingfortheassessmentofcardiovasculardiseasecomorbiditiesthecardiocanarycomorbidityproject AT gindercurtis naturallanguageprocessingfortheassessmentofcardiovasculardiseasecomorbiditiesthecardiocanarycomorbidityproject AT hulmeolivial naturallanguageprocessingfortheassessmentofcardiovasculardiseasecomorbiditiesthecardiocanarycomorbidityproject AT marcusadaniel naturallanguageprocessingfortheassessmentofcardiovasculardiseasecomorbiditiesthecardiocanarycomorbidityproject AT leivaorly naturallanguageprocessingfortheassessmentofcardiovasculardiseasecomorbiditiesthecardiocanarycomorbidityproject AT wuwanday naturallanguageprocessingfortheassessmentofcardiovasculardiseasecomorbiditiesthecardiocanarycomorbidityproject AT cardinnicholas naturallanguageprocessingfortheassessmentofcardiovasculardiseasecomorbiditiesthecardiocanarycomorbidityproject AT hainerjon naturallanguageprocessingfortheassessmentofcardiovasculardiseasecomorbiditiesthecardiocanarycomorbidityproject AT bhattdeepakl naturallanguageprocessingfortheassessmentofcardiovasculardiseasecomorbiditiesthecardiocanarycomorbidityproject AT dicarlimarcelof naturallanguageprocessingfortheassessmentofcardiovasculardiseasecomorbiditiesthecardiocanarycomorbidityproject AT turchinalexander naturallanguageprocessingfortheassessmentofcardiovasculardiseasecomorbiditiesthecardiocanarycomorbidityproject AT blanksteinron naturallanguageprocessingfortheassessmentofcardiovasculardiseasecomorbiditiesthecardiocanarycomorbidityproject |
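
The description above reports sensitivity, specificity, and positive predictive value for each NLP module against physician adjudication. As a rough illustration of how such note-level metrics are conventionally computed, here is a minimal Python sketch. It is not the authors' evaluation code and does not use the Canary system; the names `ConfusionCounts`, `tally`, `sensitivity`, `specificity`, and `ppv` are hypothetical, and the labels in the usage note are invented toy values, not study data.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of agreement between a reference standard and a classifier."""
    true_pos: int
    false_pos: int
    true_neg: int
    false_neg: int


def tally(reference: Sequence[bool], predicted: Sequence[bool]) -> ConfusionCounts:
    """Compare per-note reference labels (e.g. manual adjudication)
    with per-note predicted labels (e.g. an NLP module's output)."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    tp = fp = tn = fn = 0
    for ref, pred in zip(reference, predicted):
        if ref and pred:
            tp += 1
        elif pred:
            fp += 1
        elif ref:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a metric is undefined.
    return numerator / denominator if denominator else math.nan


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of truly positive notes that were flagged positive."""
    return _ratio(c.true_pos, c.true_pos + c.false_neg)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of truly negative notes that were flagged negative."""
    return _ratio(c.true_neg, c.true_neg + c.false_pos)


def ppv(c: ConfusionCounts) -> float:
    """Fraction of notes flagged positive that were truly positive."""
    return _ratio(c.true_pos, c.true_pos + c.false_pos)
```

For example, `tally([True, True, False, False, True], [True, False, False, True, True])` yields two true positives and one each of the other three counts, which can then be passed to `sensitivity`, `specificity`, and `ppv`.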