
Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning

Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that only make use of structured EHR data do not capture the full richness of a patient’s medical reco...


Bibliographic Details
Main authors:	Pfaff, Emily R; Crosskey, Miles; Morton, Kenneth; Krishnamurthy, Ashok
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2020
Subjects:	Viewpoint
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7007592/ https://www.ncbi.nlm.nih.gov/pubmed/32012059 http://dx.doi.org/10.2196/16042

author	Pfaff, Emily R; Crosskey, Miles; Morton, Kenneth; Krishnamurthy, Ashok
collection	PubMed
description	Computable phenotypes are algorithms that translate clinical features into code that can be run against electronic health record (EHR) data to define patient cohorts. However, computable phenotypes that only make use of structured EHR data do not capture the full richness of a patient’s medical record. While natural language processing (NLP) methods have shown success in extracting clinical features from text, the use of such tools has generally been limited to research groups with substantial NLP expertise. Our goal was to develop an open-source phenotyping software, Clinical Annotation Research Kit (CLARK), that would enable clinical and translational researchers to use machine learning–based NLP for computable phenotyping without requiring deep informatics expertise. CLARK enables nonexpert users to mine text using machine learning classifiers by specifying features for the software to match in clinical notes. Once the features are defined, the user-friendly CLARK interface allows the user to choose from a variety of standard machine learning algorithms (linear support vector machine, Gaussian Naïve Bayes, decision tree, and random forest), cross-validation methods, and the number of folds (cross-validation splits) to be used in evaluation of the classifier. Example phenotypes where CLARK has been applied include pediatric diabetes (sensitivity=0.91; specificity=0.98), symptomatic uterine fibroids (positive predictive value=0.81; negative predictive value=0.54), nonalcoholic fatty liver disease (sensitivity=0.90; specificity=0.94), and primary ciliary dyskinesia (sensitivity=0.88; specificity=1.0). In each of these use cases, CLARK allowed investigators to incorporate variables into their phenotype algorithm that would not be available as structured data. Moreover, the fact that nonexpert users can get started with machine learning–based NLP with limited informatics involvement is a significant improvement over the status quo. 
We hope to disseminate CLARK to other organizations that may not have NLP or machine learning specialists available, enabling wider use of these methods.
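As a rough illustration of the workflow the abstract describes (user-specified features matched in clinical notes, k-fold cross-validation, and reporting of sensitivity, specificity, and predictive values), the following is a minimal standard-library Python sketch. It is not CLARK's code: the feature names and regular expressions, and the helper functions `featurize`, `k_fold_indices`, and `binary_metrics`, are all hypothetical, and the classifier itself (for example a linear support vector machine or a random forest) is left out.

```python
import re
from typing import Dict, List, Sequence, Tuple

# Hypothetical feature definitions (name -> regular expression).
# These are illustrative only and are not taken from the article.
FEATURES: Dict[str, str] = {
    "insulin": r"\binsulin\b",
    "metformin": r"\bmetformin\b",
    "a1c": r"\b(?:hb)?a1c\b",
}


def featurize(note: str, features: Dict[str, str] = FEATURES) -> List[int]:
    """Count case-insensitive matches of each feature pattern in one note."""
    return [
        len(re.findall(pattern, note, flags=re.IGNORECASE))
        for pattern in features.values()
    ]


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k contiguous folds; return (train, test) index pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # The first n % k folds get one extra item so every index is used once.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        folds.append((train, test))
        start = stop
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, and NPV from 0/1 labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

In use, each note would be passed through `featurize`, a classifier of the user's choice would be trained on the `train` indices of each fold, and `binary_metrics` would be applied to its predictions on the matching `test` indices.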
format	Online Article Text
id	pubmed-7007592
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-7007592 2020-03-05 Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning. Pfaff, Emily R; Crosskey, Miles; Morton, Kenneth; Krishnamurthy, Ashok. JMIR Med Inform. Viewpoint. JMIR Publications 2020-01-24 /pmc/articles/PMC7007592/ /pubmed/32012059 http://dx.doi.org/10.2196/16042 Text en ©Emily R Pfaff, Miles Crosskey, Kenneth Morton, Ashok Krishnamurthy. Originally published in JMIR Medical Informatics (http://medinform.jmir.org), 24.01.2020. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on http://medinform.jmir.org/, as well as this copyright and license information must be included.
title	Clinical Annotation Research Kit (CLARK): Computable Phenotyping Using Machine Learning
topic	Viewpoint
