Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning

Bibliographic Details
Main Authors: Pfaff, Emily R; Crosskey, Miles; Morton, Kenneth; Krishnamurthy, Ashok
Format: Online article, text
Language: English
Published: JMIR Publications, 2020
Subjects: Viewpoint
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7007592/
https://www.ncbi.nlm.nih.gov/pubmed/32012059
http://dx.doi.org/10.2196/16042
Collection: PubMed

Abstract
Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that only make use of structured EHR data do not capture the full richness of a patient’s medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that would enable clinical and translational researchers to use machine learning–based NLP for computable phenotyping without requiring deep informatics expertise. CLARK enables nonexpert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the user-friendly CLARK interface allows the user to choose from a variety of standard machine learning algorithms (linear support vector machine, Gaussian Naïve Bayes, decision tree, and random forest), cross-validation methods, and the number of folds (cross-validation splits) to be used in evaluation of the classifier. Example phenotypes where CLARK has been applied include pediatric diabetes (sensitivity=0.91; specificity=0.98), symptomatic uterine fibroids (positive predictive value=0.81; negative predictive value=0.54), nonalcoholic fatty liver disease (sensitivity=0.90; specificity=0.94), and primary ciliary dyskinesia (sensitivity=0.88; specificity=1.0). In each of these use cases, CLARK allowed investigators to incorporate variables into their phenotype algorithm that would not be available as structured data. Moreover, allowing nonexpert users to get started with machine learning–based NLP with limited informatics involvement is a significant improvement over the status quo. We hope to disseminate CLARK to other organizations that may not have NLP or machine learning specialists available, enabling wider use of these methods.
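This record does not describe how CLARK is implemented, so the following is only a rough, stdlib-only Python sketch of the workflow the abstract outlines: user-defined features matched in note text, a Gaussian Naïve Bayes classifier (one of the four algorithms listed), k-fold cross-validation, and the sensitivity, specificity, PPV, and NPV figures reported for the example phenotypes. Everything here is an illustrative assumption rather than CLARK's interface or code: the regex feature patterns, the toy notes and labels, the names `featurize`, `GaussianNB`, `k_fold`, `metrics`, and `cross_validate`, the round-robin fold assignment, and the fixed variance floor.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature definitions (illustrative, not CLARK's configuration format):
# each feature is a regular expression searched for in the text of a note.
FEATURES: Dict[str, str] = {
    "insulin": r"\binsulin\b",
    "a1c": r"\b(?:hba1c|a1c)\b",
    "polyuria": r"\bpolyuria\b",
}

# Hypothetical toy corpus: 1 = phenotype present, 0 = absent.
NOTES = [
    "Started insulin; HbA1c 9.1%. Reports polyuria.",
    "Routine well-child visit, no complaints.",
    "Insulin dose adjusted, A1c improving.",
    "Seen for otitis media.",
    "New polyuria, A1c elevated, begin insulin.",
    "Sprained ankle, no other issues.",
]
LABELS = [1, 0, 1, 0, 1, 0]


def featurize(note: str) -> List[float]:
    """Count case-insensitive matches of each feature pattern in one note."""
    return [float(len(re.findall(p, note, re.IGNORECASE))) for p in FEATURES.values()]


class GaussianNB:
    """Minimal Gaussian Naive Bayes: an independent normal per feature per class."""

    def __init__(self, var_floor: float = 1e-3) -> None:
        self.var_floor = var_floor  # assumed constant that keeps variances above zero
        self.stats: Dict[int, Tuple[float, List[float], List[float]]] = {}

    def fit(self, X: Sequence[List[float]], y: Sequence[int]) -> "GaussianNB":
        self.stats = {}
        for c in sorted(set(y)):
            rows = [x for x, label in zip(X, y) if label == c]
            cols = list(zip(*rows))
            means = [sum(col) / len(rows) for col in cols]
            variances = [sum((v - m) ** 2 for v in col) / len(rows) + self.var_floor
                         for col, m in zip(cols, means)]
            self.stats[c] = (math.log(len(rows) / len(y)), means, variances)
        return self

    def _log_joint(self, c: int, x: List[float]) -> float:
        # log prior + sum of per-feature Gaussian log densities
        log_prior, means, variances = self.stats[c]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))

    def predict(self, x: List[float]) -> int:
        return max(self.stats, key=lambda c: self._log_joint(c, x))


def k_fold(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Assign item i to fold i % k; return one (train, test) index pair per fold."""
    return [([i for i in range(n) if i % k != f], [i for i in range(n) if i % k == f])
            for f in range(k)]


def metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV, with label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(a: int, b: int) -> float:
        return a / b if b else float("nan")  # undefined when the denominator is 0

    return {"sensitivity": ratio(tp, tp + fn), "specificity": ratio(tn, tn + fp),
            "ppv": ratio(tp, tp + fp), "npv": ratio(tn, tn + fn)}


def cross_validate(notes: Sequence[str], labels: Sequence[int], k: int) -> Dict[str, float]:
    """Fit on k-1 folds, predict the held-out fold, then score the pooled predictions."""
    X = [featurize(note) for note in notes]
    y_pred = [0] * len(notes)
    for train, test in k_fold(len(notes), k):
        model = GaussianNB().fit([X[i] for i in train], [labels[i] for i in train])
        for i in test:
            y_pred[i] = model.predict(X[i])
    return metrics(labels, y_pred)


if __name__ == "__main__":
    # prints {'sensitivity': 1.0, 'specificity': 1.0, 'ppv': 1.0, 'npv': 1.0}
    print(cross_validate(NOTES, LABELS, k=3))
```

Pooling the held-out predictions from every fold before computing the metrics is one common design choice; averaging per-fold metrics is another. With six toy notes the perfect scores only show that the pipeline runs and say nothing about performance on real clinical text.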

Record ID: pubmed-7007592
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: JMIR Med Inform (JMIR Medical Informatics), 2020-01-24; article type: Viewpoint
Copyright: ©Emily R Pfaff, Miles Crosskey, Kenneth Morton, Ashok Krishnamurthy. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 24.01.2020.
License: This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.