Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning
Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that only make use of structured EHR data do not capture the full richness of a patient’s medical reco...
Main authors: | Pfaff, Emily R; Crosskey, Miles; Morton, Kenneth; Krishnamurthy, Ashok |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2020 |
Subjects: | Viewpoint |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7007592/ https://www.ncbi.nlm.nih.gov/pubmed/32012059 http://dx.doi.org/10.2196/16042 |
_version_ | 1783495336145190912 |
---|---|
author | Pfaff, Emily R Crosskey, Miles Morton, Kenneth Krishnamurthy, Ashok |
author_sort | Pfaff, Emily R |
collection | PubMed |
description | Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that only make use of structured EHR data do not capture the full richness of a patient’s medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop an open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that would enable clinical and translational researchers to use machine learning–based NLP for computable phenotyping without requiring deep informatics expertise. CLARK enables nonexpert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the user-friendly CLARK interface allows the user to choose from a variety of standard machine learning algorithms (linear support vector machine, Gaussian Naïve Bayes, decision tree, and random forest), cross-validation methods, and the number of folds (cross-validation splits) to be used in evaluation of the classifier. Example phenotypes where CLARK has been applied include pediatric diabetes (sensitivity=0.91; specificity=0.98), symptomatic uterine fibroids (positive predictive value=0.81; negative predictive value=0.54), nonalcoholic fatty liver disease (sensitivity=0.90; specificity=0.94), and primary ciliary dyskinesia (sensitivity=0.88; specificity=1.0). In each of these use cases, CLARK allowed investigators to incorporate variables into their phenotype algorithm that would not be available as structured data. Moreover, the fact that nonexpert users can get started with machine learning–based NLP with limited informatics involvement is a significant improvement over the status quo. We hope to disseminate CLARK to other organizations that may not have NLP or machine learning specialists available, enabling wider use of these methods. |
format | Online Article Text |
id | pubmed-7007592 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-7007592 2020-03-05 Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning Pfaff, Emily R Crosskey, Miles Morton, Kenneth Krishnamurthy, Ashok JMIR Med Inform Viewpoint JMIR Publications 2020-01-24 /pmc/articles/PMC7007592/ /pubmed/32012059 http://dx.doi.org/10.2196/16042 Text en ©Emily R Pfaff, Miles Crosskey, Kenneth Morton, Ashok Krishnamurthy. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 24.01.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included. |
title | Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning |
topic | Viewpoint |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7007592/ https://www.ncbi.nlm.nih.gov/pubmed/32012059 http://dx.doi.org/10.2196/16042 |
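
The description above outlines a workflow: the user defines text features to match in clinical notes, picks a classifier and a number of cross-validation folds, and reads off metrics such as sensitivity, specificity, and positive/negative predictive value. The following is a minimal sketch of the surrounding plumbing only, assuming regular-expression features and binary labels. It is not CLARK's code; the feature patterns and the function names `extract_features`, `k_fold_indices`, and `binary_metrics` are hypothetical, and no classifier is trained here.

```python
import re
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature definitions (name -> regular expression).
# Illustrative only; these patterns are not taken from CLARK.
FEATURES: Dict[str, str] = {
    "insulin": r"\binsulin\b",
    "a1c": r"\b(?:hb)?a1c\b",
    "metformin": r"\bmetformin\b",
}


def extract_features(note: str, features: Dict[str, str] = FEATURES) -> List[int]:
    """Count case-insensitive matches of each feature pattern in one note."""
    return [
        len(re.findall(pattern, note, flags=re.IGNORECASE))
        for pattern in features.values()
    ]


def k_fold_indices(n_samples: int, n_folds: int) -> List[Tuple[List[int], List[int]]]:
    """Return n_folds (train, test) index pairs covering range(n_samples).

    Samples are dealt round-robin into folds; no shuffling or stratification.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    folds = [list(range(i, n_samples, n_folds)) for i in range(n_folds)]
    splits = []
    for i, test in enumerate(folds):
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, test))
    return splits


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity, PPV, and NPV for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
        "ppv": ratio(tp, tp + fp),          # positive predictive value
        "npv": ratio(tn, tn + fn),          # negative predictive value
    }


if __name__ == "__main__":
    note = "Started insulin; HbA1c 9.1. Insulin dose adjusted."
    print(extract_features(note))
    print(k_fold_indices(5, 2))
    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

In a fuller pipeline, the feature vectors from `extract_features` would be fed to one of the classifiers the description names (for example a linear support vector machine or a random forest from a library of choice), trained on each fold's training indices and scored with `binary_metrics` on its test indices.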