Identifying Patients Who Meet Criteria for Genetic Testing of Hereditary Cancers Based on Structured and Unstructured Family Health History Data in the Electronic Health Record: Natural Language Processing Approach
BACKGROUND: Family health history has been recognized as an essential factor for cancer risk assessment and is an integral part of many cancer screening guidelines, including genetic testing for personalized clinical management strategies. However, manually identifying eligible candidates for geneti...
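Main Authors: | Shi, Jianlin; Morgan, Keaton L; Bradshaw, Richard L; Jung, Se-Hee; Kohlmann, Wendy; Kaphingst, Kimberly A; Kawamoto, Kensaku; Del Fiol, Guilherme |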
---|---|
Format: | Online Article Text |
Language: | English |
Published: | JMIR Publications 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9412758/ https://www.ncbi.nlm.nih.gov/pubmed/35969459 http://dx.doi.org/10.2196/37842 |
_version_ | 1784775573157969920 |
---|---|
author | Shi, Jianlin; Morgan, Keaton L; Bradshaw, Richard L; Jung, Se-Hee; Kohlmann, Wendy; Kaphingst, Kimberly A; Kawamoto, Kensaku; Del Fiol, Guilherme |
author_sort | Shi, Jianlin |
collection | PubMed |
description | BACKGROUND: Family health history has been recognized as an essential factor for cancer risk assessment and is an integral part of many cancer screening guidelines, including genetic testing for personalized clinical management strategies. However, manually identifying eligible candidates for genetic testing is labor intensive. OBJECTIVE: The aim of this study was to develop a natural language processing (NLP) pipeline and assess its contribution to identifying patients who meet genetic testing criteria for hereditary cancers based on family health history data in the electronic health record (EHR). We compared an algorithm that uses structured data alone with structured data augmented using NLP. METHODS: Algorithms were developed based on the National Comprehensive Cancer Network (NCCN) guidelines for genetic testing for hereditary breast, ovarian, pancreatic, and colorectal cancers. The NLP-augmented algorithm uses both structured family health history data and the associated unstructured free-text comments. The algorithms were compared with a reference standard of 100 patients with a family health history in the EHR. RESULTS: Regarding identifying the reference standard patients meeting the NCCN criteria, the NLP-augmented algorithm compared with the structured data algorithm yielded a significantly higher recall of 0.95 (95% CI 0.9-0.99) versus 0.29 (95% CI 0.19-0.40) and a precision of 0.99 (95% CI 0.96-1.00) versus 0.81 (95% CI 0.65-0.95). On the whole data set, the NLP-augmented algorithm extracted 33.6% more entities, resulting in 53.8% more patients meeting the NCCN criteria. CONCLUSIONS: Compared with the structured data algorithm, the NLP-augmented algorithm based on both structured and unstructured family health history data in the EHR increased the number of patients identified as meeting the NCCN criteria for genetic testing for hereditary breast or ovarian and colorectal cancers. |
format | Online Article Text |
id | pubmed-9412758 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | JMIR Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-9412758 2022-08-27 Identifying Patients Who Meet Criteria for Genetic Testing of Hereditary Cancers Based on Structured and Unstructured Family Health History Data in the Electronic Health Record: Natural Language Processing Approach Shi, Jianlin Morgan, Keaton L Bradshaw, Richard L Jung, Se-Hee Kohlmann, Wendy Kaphingst, Kimberly A Kawamoto, Kensaku Fiol, Guilherme Del JMIR Med Inform Original Paper BACKGROUND: Family health history has been recognized as an essential factor for cancer risk assessment and is an integral part of many cancer screening guidelines, including genetic testing for personalized clinical management strategies. However, manually identifying eligible candidates for genetic testing is labor intensive. OBJECTIVE: The aim of this study was to develop a natural language processing (NLP) pipeline and assess its contribution to identifying patients who meet genetic testing criteria for hereditary cancers based on family health history data in the electronic health record (EHR). We compared an algorithm that uses structured data alone with structured data augmented using NLP. METHODS: Algorithms were developed based on the National Comprehensive Cancer Network (NCCN) guidelines for genetic testing for hereditary breast, ovarian, pancreatic, and colorectal cancers. The NLP-augmented algorithm uses both structured family health history data and the associated unstructured free-text comments. The algorithms were compared with a reference standard of 100 patients with a family health history in the EHR. RESULTS: Regarding identifying the reference standard patients meeting the NCCN criteria, the NLP-augmented algorithm compared with the structured data algorithm yielded a significantly higher recall of 0.95 (95% CI 0.9-0.99) versus 0.29 (95% CI 0.19-0.40) and a precision of 0.99 (95% CI 0.96-1.00) versus 0.81 (95% CI 0.65-0.95). On the whole data set, the NLP-augmented algorithm extracted 33.6% more entities, resulting in 53.8% more patients meeting the NCCN criteria. CONCLUSIONS: Compared with the structured data algorithm, the NLP-augmented algorithm based on both structured and unstructured family health history data in the EHR increased the number of patients identified as meeting the NCCN criteria for genetic testing for hereditary breast or ovarian and colorectal cancers. JMIR Publications 2022-08-11 /pmc/articles/PMC9412758/ /pubmed/35969459 http://dx.doi.org/10.2196/37842 Text en ©Jianlin Shi, Keaton L Morgan, Richard L Bradshaw, Se-Hee Jung, Wendy Kohlmann, Kimberly A Kaphingst, Kensaku Kawamoto, Guilherme Del Fiol. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 11.08.2022. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included. |
title | Identifying Patients Who Meet Criteria for Genetic Testing of Hereditary Cancers Based on Structured and Unstructured Family Health History Data in the Electronic Health Record: Natural Language Processing Approach |
topic | Original Paper |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9412758/ https://www.ncbi.nlm.nih.gov/pubmed/35969459 http://dx.doi.org/10.2196/37842 |