
Multi-ancestry genome- and phenome-wide association studies of diverticular disease in electronic health records with natural language processing enriched phenotyping algorithm

OBJECTIVE: Diverticular disease (DD) is one of the most prevalent conditions encountered by gastroenterologists, affecting ~50% of Americans before the age of 60. Our aim was to identify genetic risk variants and clinical phenotypes associated with DD, leveraging multiple electronic health record (E...


Bibliographic Details
Main authors: Joo, Yoonjung Yoonie, Pacheco, Jennifer A., Thompson, William K., Rasmussen-Torvik, Laura J., Rasmussen, Luke V., Lin, Frederick T. J., de Andrade, Mariza, Borthwick, Kenneth M., Bottinger, Erwin, Cagan, Andrew, Carrell, David S., Denny, Joshua C., Ellis, Stephen B., Gottesman, Omri, Linneman, James G., Pathak, Jyotishman, Peissig, Peggy L., Shang, Ning, Tromp, Gerard, Veerappan, Annapoorani, Smith, Maureen E., Chisholm, Rex L., Gawron, Andrew J., Hayes, M. Geoffrey, Kho, Abel N.
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10191288/
https://www.ncbi.nlm.nih.gov/pubmed/37196047
http://dx.doi.org/10.1371/journal.pone.0283553
_version_ 1785043431061454848
author Joo, Yoonjung Yoonie
Pacheco, Jennifer A.
Thompson, William K.
Rasmussen-Torvik, Laura J.
Rasmussen, Luke V.
Lin, Frederick T. J.
de Andrade, Mariza
Borthwick, Kenneth M.
Bottinger, Erwin
Cagan, Andrew
Carrell, David S.
Denny, Joshua C.
Ellis, Stephen B.
Gottesman, Omri
Linneman, James G.
Pathak, Jyotishman
Peissig, Peggy L.
Shang, Ning
Tromp, Gerard
Veerappan, Annapoorani
Smith, Maureen E.
Chisholm, Rex L.
Gawron, Andrew J.
Hayes, M. Geoffrey
Kho, Abel N.
author_sort Joo, Yoonjung Yoonie
collection PubMed
description OBJECTIVE: Diverticular disease (DD) is one of the most prevalent conditions encountered by gastroenterologists, affecting ~50% of Americans before the age of 60. Our aim was to identify genetic risk variants and clinical phenotypes associated with DD, leveraging multiple electronic health record (EHR) data sources of 91,166 multi-ancestry participants with a Natural Language Processing (NLP) technique. MATERIALS AND METHODS: We developed an NLP-enriched phenotyping algorithm that incorporated colonoscopy or abdominal imaging reports to identify patients with diverticulosis and diverticulitis from multicenter EHRs. We performed genome-wide association studies (GWAS) of DD in European, African and multi-ancestry participants, followed by phenome-wide association studies (PheWAS) of the risk variants to identify their potential comorbid/pleiotropic effects in clinical phenotypes. RESULTS: Our algorithm showed a significant improvement in patient classification performance for DD analysis (algorithm PPVs ≥ 0.94), with up to a 3.5-fold increase in the number of identified patients compared with the traditional method. Ancestry-stratified analyses of diverticulosis and diverticulitis of the identified subjects replicated the well-established associations of ARHGAP15 loci with DD, showing overall intensified GWAS signals in diverticulitis patients compared to diverticulosis patients. Our PheWAS analyses identified significant associations between the DD GWAS variants and circulatory system, genitourinary, and neoplastic EHR phenotypes. DISCUSSION: As the first multi-ancestry GWAS-PheWAS study, we showed that heterogeneous EHR data can be mapped through an integrative analytical pipeline and reveal significant genotype-phenotype associations with clinical interpretation. CONCLUSION: A systematic framework to process unstructured EHR data with NLP could advance deep and scalable phenotyping for better patient identification and facilitate etiological investigation of a disease with multilayered data.
format Online
Article
Text
id pubmed-10191288
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10191288 2023-05-18 Multi-ancestry genome- and phenome-wide association studies of diverticular disease in electronic health records with natural language processing enriched phenotyping algorithm Joo, Yoonjung Yoonie Pacheco, Jennifer A. Thompson, William K. Rasmussen-Torvik, Laura J. Rasmussen, Luke V. Lin, Frederick T. J. de Andrade, Mariza Borthwick, Kenneth M. Bottinger, Erwin Cagan, Andrew Carrell, David S. Denny, Joshua C. Ellis, Stephen B. Gottesman, Omri Linneman, James G. Pathak, Jyotishman Peissig, Peggy L. Shang, Ning Tromp, Gerard Veerappan, Annapoorani Smith, Maureen E. Chisholm, Rex L. Gawron, Andrew J. Hayes, M. Geoffrey Kho, Abel N. PLoS One Research Article OBJECTIVE: Diverticular disease (DD) is one of the most prevalent conditions encountered by gastroenterologists, affecting ~50% of Americans before the age of 60. Our aim was to identify genetic risk variants and clinical phenotypes associated with DD, leveraging multiple electronic health record (EHR) data sources of 91,166 multi-ancestry participants with a Natural Language Processing (NLP) technique. MATERIALS AND METHODS: We developed an NLP-enriched phenotyping algorithm that incorporated colonoscopy or abdominal imaging reports to identify patients with diverticulosis and diverticulitis from multicenter EHRs. We performed genome-wide association studies (GWAS) of DD in European, African and multi-ancestry participants, followed by phenome-wide association studies (PheWAS) of the risk variants to identify their potential comorbid/pleiotropic effects in clinical phenotypes. RESULTS: Our algorithm showed a significant improvement in patient classification performance for DD analysis (algorithm PPVs ≥ 0.94), with up to a 3.5-fold increase in the number of identified patients compared with the traditional method. Ancestry-stratified analyses of diverticulosis and diverticulitis of the identified subjects replicated the well-established associations of ARHGAP15 loci with DD, showing overall intensified GWAS signals in diverticulitis patients compared to diverticulosis patients. Our PheWAS analyses identified significant associations between the DD GWAS variants and circulatory system, genitourinary, and neoplastic EHR phenotypes. DISCUSSION: As the first multi-ancestry GWAS-PheWAS study, we showed that heterogeneous EHR data can be mapped through an integrative analytical pipeline and reveal significant genotype-phenotype associations with clinical interpretation. CONCLUSION: A systematic framework to process unstructured EHR data with NLP could advance deep and scalable phenotyping for better patient identification and facilitate etiological investigation of a disease with multilayered data. Public Library of Science 2023-05-17 /pmc/articles/PMC10191288/ /pubmed/37196047 http://dx.doi.org/10.1371/journal.pone.0283553 Text en © 2023 Joo et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Multi-ancestry genome- and phenome-wide association studies of diverticular disease in electronic health records with natural language processing enriched phenotyping algorithm
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10191288/
https://www.ncbi.nlm.nih.gov/pubmed/37196047
http://dx.doi.org/10.1371/journal.pone.0283553