Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT

To enhance phenotype recognition in clinical notes of genetic diseases, we developed two models - PhenoBCBERT and PhenoGPT - for expanding the vocabularies of Human Phenotype Ontology (HPO) terms. While HPO offers a standardized vocabulary for phenotypes, existing tools often fail to capture the ful...

Bibliographic Details
Main Authors: Yang, Jingye; Liu, Cong; Deng, Wendy; Wu, Da; Weng, Chunhua; Zhou, Yunyun; Wang, Kai
Format: Online Article Text
Language: English
Published: Cornell University 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10659449/
https://www.ncbi.nlm.nih.gov/pubmed/37986722
_version_ 1785137579141627904
author Yang, Jingye
Liu, Cong
Deng, Wendy
Wu, Da
Weng, Chunhua
Zhou, Yunyun
Wang, Kai
author_facet Yang, Jingye
Liu, Cong
Deng, Wendy
Wu, Da
Weng, Chunhua
Zhou, Yunyun
Wang, Kai
author_sort Yang, Jingye
collection PubMed
description To enhance phenotype recognition in clinical notes of genetic diseases, we developed two models - PhenoBCBERT and PhenoGPT - for expanding the vocabularies of Human Phenotype Ontology (HPO) terms. While HPO offers a standardized vocabulary for phenotypes, existing tools often fail to capture the full scope of phenotypes, due to limitations from traditional heuristic or rule-based approaches. Our models leverage large language models (LLMs) to automate the detection of phenotype terms, including those not in the current HPO. We compared these models to PhenoTagger, another HPO recognition tool, and found that our models identify a wider range of phenotype concepts, including previously uncharacterized ones. Our models also showed strong performance in case studies on biomedical literature. We evaluated the strengths and weaknesses of BERT-based and GPT-based models in aspects such as architecture and accuracy. Overall, our models enhance automated phenotype detection from clinical texts, improving downstream analyses on human diseases.
format Online
Article
Text
id pubmed-10659449
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cornell University
record_format MEDLINE/PubMed
spelling pubmed-10659449 2023-11-09 Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT Yang, Jingye Liu, Cong Deng, Wendy Wu, Da Weng, Chunhua Zhou, Yunyun Wang, Kai ArXiv Article To enhance phenotype recognition in clinical notes of genetic diseases, we developed two models - PhenoBCBERT and PhenoGPT - for expanding the vocabularies of Human Phenotype Ontology (HPO) terms. While HPO offers a standardized vocabulary for phenotypes, existing tools often fail to capture the full scope of phenotypes, due to limitations from traditional heuristic or rule-based approaches. Our models leverage large language models (LLMs) to automate the detection of phenotype terms, including those not in the current HPO. We compared these models to PhenoTagger, another HPO recognition tool, and found that our models identify a wider range of phenotype concepts, including previously uncharacterized ones. Our models also showed strong performance in case studies on biomedical literature. We evaluated the strengths and weaknesses of BERT-based and GPT-based models in aspects such as architecture and accuracy. Overall, our models enhance automated phenotype detection from clinical texts, improving downstream analyses on human diseases. Cornell University 2023-11-09 /pmc/articles/PMC10659449/ /pubmed/37986722 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
spellingShingle Article
Yang, Jingye
Liu, Cong
Deng, Wendy
Wu, Da
Weng, Chunhua
Zhou, Yunyun
Wang, Kai
Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT
title Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT
title_full Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT
title_fullStr Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT
title_full_unstemmed Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT
title_short Enhancing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT
title_sort enhancing phenotype recognition in clinical notes using large language models: phenobcbert and phenogpt
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10659449/
https://www.ncbi.nlm.nih.gov/pubmed/37986722
work_keys_str_mv AT yangjingye enhancingphenotyperecognitioninclinicalnotesusinglargelanguagemodelsphenobcbertandphenogpt
AT liucong enhancingphenotyperecognitioninclinicalnotesusinglargelanguagemodelsphenobcbertandphenogpt
AT dengwendy enhancingphenotyperecognitioninclinicalnotesusinglargelanguagemodelsphenobcbertandphenogpt
AT wuda enhancingphenotyperecognitioninclinicalnotesusinglargelanguagemodelsphenobcbertandphenogpt
AT wengchunhua enhancingphenotyperecognitioninclinicalnotesusinglargelanguagemodelsphenobcbertandphenogpt
AT zhouyunyun enhancingphenotyperecognitioninclinicalnotesusinglargelanguagemodelsphenobcbertandphenogpt
AT wangkai enhancingphenotyperecognitioninclinicalnotesusinglargelanguagemodelsphenobcbertandphenogpt