Automated Extraction of Diagnostic Criteria From Electronic Health Records for Autism Spectrum Disorders: Development, Evaluation, and Application


Bibliographic Details
Main Authors:	Leroy, Gondy; Gu, Yang; Pettygrove, Sydney; Galindo, Maureen K; Arora, Ananyaa; Kurzius-Spencer, Margaret
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2018
Subjects:	Original Paper
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6249505/ https://www.ncbi.nlm.nih.gov/pubmed/30404767 http://dx.doi.org/10.2196/10497

collection	PubMed
description	BACKGROUND: Electronic health records (EHRs) bring many opportunities for information utilization. One such use is the surveillance conducted by the Centers for Disease Control and Prevention to track cases of autism spectrum disorder (ASD). This process currently comprises manual collection and review of EHRs of 4- and 8-year-old children in 11 US states for the presence of ASD criteria. The work is time-consuming and expensive.
OBJECTIVE: Our objective was to automatically extract from EHRs the descriptions of behaviors noted by clinicians as evidence of the diagnostic criteria in the Diagnostic and Statistical Manual of Mental Disorders (DSM). Previously, we reported on the classification of entire EHRs as ASD or not. In this work, we focus on the extraction of individual expressions of the different ASD criteria in the text. We intend to facilitate large-scale surveillance efforts for ASD, support analysis of changes over time, and enable integration with other relevant data.
METHODS: We developed a natural language processing (NLP) parser to extract expressions of 12 DSM criteria using 104 patterns and 92 lexicons (1787 terms). The parser is rule-based to enable precise extraction of the entities from the text. The entities appear in the EHRs as very diverse expressions of the diagnostic criteria, written by different people at different times (clinicians, speech pathologists, among others). Due to the sparsity of the data, a rule-based approach is best suited until larger datasets can be generated for machine learning algorithms.
RESULTS: We evaluated our rule-based parser and compared it with a machine learning baseline (decision tree). Using a test set of 6636 sentences (50 EHRs), we found that our parser achieved 76% precision, 43% recall (ie, sensitivity), and >99% specificity for criterion extraction. The performance was better for the rule-based approach than for the machine learning baseline (60% precision and 30% recall). For some individual criteria, precision was as high as 97% and recall 57%. Since precision was very high, we were assured that criteria were rarely assigned incorrectly, and our numbers presented a lower bound of their presence in EHRs. We then conducted a case study and parsed 4480 new EHRs covering 10 years of surveillance records from the Arizona Developmental Disabilities Surveillance Program. The social criteria (A1 criteria) showed the biggest change over the years. The communication criteria (A2 criteria) did not distinguish the ASD from the non-ASD records. Among the behaviors and interests criteria (A3 criteria), one (A3b) was present with much greater frequency in the ASD than in the non-ASD EHRs.
CONCLUSIONS: Our results demonstrate that NLP can support large-scale analysis useful for ASD surveillance and research. In the future, we intend to facilitate detailed analysis and integration of national datasets.
id	pubmed-6249505
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-6249505, 2018-12-13. J Med Internet Res. Original Paper. JMIR Publications, 2018-11-07. http://dx.doi.org/10.2196/10497
copyright	©Gondy Leroy, Yang Gu, Sydney Pettygrove, Maureen K Galindo, Ananyaa Arora, Margaret Kurzius-Spencer. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 07.11.2018.
license	This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on http://www.jmir.org/, as well as this copyright and license information must be included.
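
The abstract in this record reports precision, recall (sensitivity), and specificity for sentence-level criterion extraction. As a reading aid, the sketch below shows how these three measures are conventionally derived from confusion-matrix counts. The class name `ConfusionCounts`, the function names, and the counts themselves are hypothetical illustrations; they are not taken from the paper and do not reproduce its evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Sentence-level outcome counts for one extraction run (hypothetical)."""
    tp: int  # sentences correctly labeled with a criterion
    fp: int  # sentences labeled with a criterion they do not express
    fn: int  # sentences expressing a criterion that the parser missed
    tn: int  # sentences correctly left unlabeled


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def precision(c: ConfusionCounts) -> float:
    """Share of assigned labels that are correct: TP / (TP + FP)."""
    return _ratio(c.tp, c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Share of true criterion sentences that were found: TP / (TP + FN)."""
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of non-criterion sentences left unlabeled: TN / (TN + FP)."""
    return _ratio(c.tn, c.tn + c.fp)


# Made-up counts chosen only to keep the arithmetic easy to follow.
counts = ConfusionCounts(tp=30, fp=10, fn=20, tn=940)

if __name__ == "__main__":
    print(f"precision   = {precision(counts):.3f}")
    print(f"recall      = {recall(counts):.3f}")
    print(f"specificity = {specificity(counts):.3f}")
```

With counts like these, where sentences without any criterion far outnumber the rest, specificity stays close to 1 even with a fair number of false positives. Precision and recall are therefore usually the more informative pair when comparing extraction approaches.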

Similar Items