Identifying Patients Who Meet Criteria for Genetic Testing of Hereditary Cancers Based on Structured and Unstructured Family Health History Data in the Electronic Health Record: Natural Language Processing Approach

Bibliographic Details
Main authors: Shi, Jianlin; Morgan, Keaton L; Bradshaw, Richard L; Jung, Se-Hee; Kohlmann, Wendy; Kaphingst, Kimberly A; Kawamoto, Kensaku; Fiol, Guilherme Del
Format: Online article (text)
Language: English
Published: JMIR Publications 2022
Subjects: Original Paper
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9412758/
https://www.ncbi.nlm.nih.gov/pubmed/35969459
http://dx.doi.org/10.2196/37842
Collection: PubMed

Abstract
BACKGROUND: Family health history has been recognized as an essential factor for cancer risk assessment and is an integral part of many cancer screening guidelines, including genetic testing for personalized clinical management strategies. However, manually identifying eligible candidates for genetic testing is labor intensive.
OBJECTIVE: The aim of this study was to develop a natural language processing (NLP) pipeline and assess its contribution to identifying patients who meet genetic testing criteria for hereditary cancers based on family health history data in the electronic health record (EHR). We compared an algorithm that uses structured data alone with one that uses structured data augmented by NLP.
METHODS: Algorithms were developed based on the National Comprehensive Cancer Network (NCCN) guidelines for genetic testing for hereditary breast, ovarian, pancreatic, and colorectal cancers. The NLP-augmented algorithm uses both structured family health history data and the associated unstructured free-text comments. The algorithms were evaluated against a reference standard of 100 patients with family health history recorded in the EHR.
RESULTS: In identifying the reference standard patients who met the NCCN criteria, the NLP-augmented algorithm yielded a significantly higher recall than the structured data algorithm, 0.95 (95% CI 0.90-0.99) versus 0.29 (95% CI 0.19-0.40), and a precision of 0.99 (95% CI 0.96-1.00) versus 0.81 (95% CI 0.65-0.95). On the whole data set, the NLP-augmented algorithm extracted 33.6% more entities than the structured data algorithm, which resulted in 53.8% more patients identified as meeting the NCCN criteria.
CONCLUSIONS: Compared with the structured data algorithm, the NLP-augmented algorithm, which uses both structured and unstructured family health history data in the EHR, increased the number of patients identified as meeting the NCCN criteria for genetic testing for hereditary breast or ovarian and colorectal cancers.

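The abstract reports patient-level precision and recall, each with a 95% confidence interval, for the two algorithms scored against the 100-patient reference standard. The sketch below shows one common way to compute such figures: precision and recall from per-patient labels, plus percentile bootstrap intervals that resample patients. The abstract does not say how its intervals were computed, so the bootstrap, the function names (`precision_recall`, `bootstrap_ci`), the zero-denominator convention, and the toy labels are all hypothetical and for illustration only; they are not the authors' method or data.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical sketch: the percentile bootstrap is an ASSUMED CI method,
# not one confirmed by the source record.


def precision_recall(pred: Sequence[bool], truth: Sequence[bool]) -> Tuple[float, float]:
    """Patient-level precision and recall against a reference standard.

    pred[i] is True if the algorithm flags patient i as meeting the testing
    criteria; truth[i] is True if the reference standard says the patient does.
    Returns 0.0 for a metric whose denominator is zero (a convention chosen here).
    """
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def bootstrap_ci(
    pred: Sequence[bool],
    truth: Sequence[bool],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    """Percentile bootstrap CIs for precision and recall, resampling patients."""
    rng = random.Random(seed)
    n = len(truth)
    precisions: List[float] = []
    recalls: List[float] = []
    for _ in range(n_boot):
        # Draw patient indices with replacement so each patient's
        # predicted and reference labels stay paired.
        idx = [rng.randrange(n) for _ in range(n)]
        p, r = precision_recall([pred[i] for i in idx], [truth[i] for i in idx])
        precisions.append(p)
        recalls.append(r)
    precisions.sort()
    recalls.sort()
    lo = round(alpha / 2 * n_boot)            # index of the lower percentile
    hi = round((1 - alpha / 2) * n_boot) - 1  # index of the upper percentile
    return (precisions[lo], precisions[hi]), (recalls[lo], recalls[hi])


if __name__ == "__main__":
    # Hypothetical toy labels for 10 patients; NOT data from the study.
    truth = [True, True, True, False, False, True, False, True, False, False]
    structured = [True, False, False, False, False, False, False, True, True, False]
    nlp_augmented = [True, True, True, False, False, False, False, True, False, False]

    for name, pred in [("structured", structured), ("NLP-augmented", nlp_augmented)]:
        p, r = precision_recall(pred, truth)
        (p_lo, p_hi), (r_lo, r_hi) = bootstrap_ci(pred, truth)
        print(f"{name}: precision {p:.2f} ({p_lo:.2f}-{p_hi:.2f}), "
              f"recall {r:.2f} ({r_lo:.2f}-{r_hi:.2f})")
```

Resampling whole patients, rather than the prediction and reference lists separately, is what keeps each patient's two labels together in every bootstrap replicate.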
Record ID: pubmed-9412758
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: JMIR Med Inform (Original Paper), published 2022-08-11
Copyright: ©Jianlin Shi, Keaton L Morgan, Richard L Bradshaw, Se-Hee Jung, Wendy Kohlmann, Kimberly A Kaphingst, Kensaku Kawamoto, Guilherme Del Fiol. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 11.08.2022.
License: This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.