Assessment of chemistry knowledge in large language models that generate code
In this work, we investigate the question: do code-generating large language models know chemistry? Our results indicate, mostly yes. To evaluate this, we introduce an expandable framework for evaluating chemistry knowledge in these models, through prompting models to solve chemistry problems posed as coding tasks.
Main Authors: | White, Andrew D., Hocky, Glen M., Gandhi, Heta A., Ansari, Mehrad, Cox, Sam, Wellawatte, Geemi P., Sasmal, Subarna, Yang, Ziyue, Liu, Kangxin, Singh, Yuvraj, Peña Ccoa, Willmor J. |
Format: | Online Article Text |
Language: | English |
Published: | RSC, 2023 |
Subjects: | Chemistry |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10087057/ https://www.ncbi.nlm.nih.gov/pubmed/37065678 http://dx.doi.org/10.1039/d2dd00087c |
_version_ | 1785022263934844928 |
author | White, Andrew D. Hocky, Glen M. Gandhi, Heta A. Ansari, Mehrad Cox, Sam Wellawatte, Geemi P. Sasmal, Subarna Yang, Ziyue Liu, Kangxin Singh, Yuvraj Peña Ccoa, Willmor J. |
author_facet | White, Andrew D. Hocky, Glen M. Gandhi, Heta A. Ansari, Mehrad Cox, Sam Wellawatte, Geemi P. Sasmal, Subarna Yang, Ziyue Liu, Kangxin Singh, Yuvraj Peña Ccoa, Willmor J. |
author_sort | White, Andrew D. |
collection | PubMed |
description | In this work, we investigate the question: do code-generating large language models know chemistry? Our results indicate, mostly yes. To evaluate this, we introduce an expandable framework for evaluating chemistry knowledge in these models, through prompting models to solve chemistry problems posed as coding tasks. To do so, we produce a benchmark set of problems, and evaluate these models based on correctness of code by automated testing and evaluation by experts. We find that recent LLMs are able to write correct code across a variety of topics in chemistry and their accuracy can be increased by 30 percentage points via prompt engineering strategies, like putting copyright notices at the top of files. Our dataset and evaluation tools are open source which can be contributed to or built upon by future researchers, and will serve as a community resource for evaluating the performance of new models as they emerge. We also describe some good practices for employing LLMs in chemistry. The general success of these models demonstrates that their impact on chemistry teaching and research is poised to be enormous. |
format | Online Article Text |
id | pubmed-10087057 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | RSC |
record_format | MEDLINE/PubMed |
spelling | pubmed-10087057 2023-04-12 Assessment of chemistry knowledge in large language models that generate code White, Andrew D. Hocky, Glen M. Gandhi, Heta A. Ansari, Mehrad Cox, Sam Wellawatte, Geemi P. Sasmal, Subarna Yang, Ziyue Liu, Kangxin Singh, Yuvraj Peña Ccoa, Willmor J. Digit Discov Chemistry RSC 2023-01-26 /pmc/articles/PMC10087057/ /pubmed/37065678 http://dx.doi.org/10.1039/d2dd00087c Text en This journal is © The Royal Society of Chemistry https://creativecommons.org/licenses/by-nc/3.0/ |
spellingShingle | Chemistry White, Andrew D. Hocky, Glen M. Gandhi, Heta A. Ansari, Mehrad Cox, Sam Wellawatte, Geemi P. Sasmal, Subarna Yang, Ziyue Liu, Kangxin Singh, Yuvraj Peña Ccoa, Willmor J. Assessment of chemistry knowledge in large language models that generate code |
title | Assessment of chemistry knowledge in large language models that generate code |
title_full | Assessment of chemistry knowledge in large language models that generate code |
title_fullStr | Assessment of chemistry knowledge in large language models that generate code |
title_full_unstemmed | Assessment of chemistry knowledge in large language models that generate code |
title_short | Assessment of chemistry knowledge in large language models that generate code |
title_sort | assessment of chemistry knowledge in large language models that generate code |
topic | Chemistry |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10087057/ https://www.ncbi.nlm.nih.gov/pubmed/37065678 http://dx.doi.org/10.1039/d2dd00087c |
work_keys_str_mv | AT whiteandrewd assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode AT hockyglenm assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode AT gandhihetaa assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode AT ansarimehrad assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode AT coxsam assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode AT wellawattegeemip assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode AT sasmalsubarna assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode AT yangziyue assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode AT liukangxin assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode AT singhyuvraj assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode AT penaccoawillmorj assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode |