Evaluation of large language models for discovery of gene set function
Gene set analysis is a mainstay of functional genomics, but it relies on manually curated databases of gene functions that are incomplete and unaware of biological context. Here we evaluate the ability of OpenAI’s GPT-4, a Large Language Model (LLM), to develop hypotheses about common gene functions...
Main authors: | Hu, Mengzhou; Alkhairy, Sahar; Lee, Ingoo; Pillich, Rudolf T.; Bachelder, Robin; Ideker, Trey; Pratt, Dexter |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cornell University, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10508824/ https://www.ncbi.nlm.nih.gov/pubmed/37731657 |
---|---|
author | Hu, Mengzhou; Alkhairy, Sahar; Lee, Ingoo; Pillich, Rudolf T.; Bachelder, Robin; Ideker, Trey; Pratt, Dexter |
collection | PubMed |
description | Gene set analysis is a mainstay of functional genomics, but it relies on manually curated databases of gene functions that are incomplete and unaware of biological context. Here we evaluate the ability of OpenAI’s GPT-4, a Large Language Model (LLM), to develop hypotheses about common gene functions from its embedded biomedical knowledge. We created a GPT-4 pipeline to label gene sets with names that summarize their consensus functions, substantiated by analysis text and citations. When benchmarked against named gene sets in the Gene Ontology, GPT-4 generated very similar names in 50% of cases, while in most remaining cases it recovered the name of a more general concept. For gene sets discovered in ’omics data, GPT-4 names were more informative than gene set enrichment, with supporting statements and citations that were largely verified in human review. The ability to rapidly synthesize common gene functions positions LLMs as valuable functional genomics assistants. |
format | Online Article Text |
id | pubmed-10508824 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cornell University |
record_format | MEDLINE/PubMed |
spelling | pubmed-10508824 2023-09-20 ArXiv Article Cornell University 2023-09-07 /pmc/articles/PMC10508824/ /pubmed/37731657 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
title | Evaluation of large language models for discovery of gene set function |
topic | Article |