
Assessment of chemistry knowledge in large language models that generate code

In this work, we investigate the question: do code-generating large language models know chemistry? Our results indicate, mostly yes. To evaluate this, we introduce an expandable framework for evaluating chemistry knowledge in these models, through prompting models to solve chemistry problems posed as coding tasks. To do so, we produce a benchmark set of problems, and evaluate these models based on correctness of code by automated testing and evaluation by experts. We find that recent LLMs are able to write correct code across a variety of topics in chemistry and their accuracy can be increased by 30 percentage points via prompt engineering strategies, like putting copyright notices at the top of files. Our dataset and evaluation tools are open source which can be contributed to or built upon by future researchers, and will serve as a community resource for evaluating the performance of new models as they emerge. We also describe some good practices for employing LLMs in chemistry. The general success of these models demonstrates that their impact on chemistry teaching and research is poised to be enormous.

Bibliographic Details
Main Authors: White, Andrew D., Hocky, Glen M., Gandhi, Heta A., Ansari, Mehrad, Cox, Sam, Wellawatte, Geemi P., Sasmal, Subarna, Yang, Ziyue, Liu, Kangxin, Singh, Yuvraj, Peña Ccoa, Willmor J.
Format: Online Article Text
Language: English
Published: RSC 2023
Subjects: Chemistry
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10087057/
https://www.ncbi.nlm.nih.gov/pubmed/37065678
http://dx.doi.org/10.1039/d2dd00087c
_version_ 1785022263934844928
author White, Andrew D.
Hocky, Glen M.
Gandhi, Heta A.
Ansari, Mehrad
Cox, Sam
Wellawatte, Geemi P.
Sasmal, Subarna
Yang, Ziyue
Liu, Kangxin
Singh, Yuvraj
Peña Ccoa, Willmor J.
author_facet White, Andrew D.
Hocky, Glen M.
Gandhi, Heta A.
Ansari, Mehrad
Cox, Sam
Wellawatte, Geemi P.
Sasmal, Subarna
Yang, Ziyue
Liu, Kangxin
Singh, Yuvraj
Peña Ccoa, Willmor J.
author_sort White, Andrew D.
collection PubMed
description In this work, we investigate the question: do code-generating large language models know chemistry? Our results indicate, mostly yes. To evaluate this, we introduce an expandable framework for evaluating chemistry knowledge in these models, through prompting models to solve chemistry problems posed as coding tasks. To do so, we produce a benchmark set of problems, and evaluate these models based on correctness of code by automated testing and evaluation by experts. We find that recent LLMs are able to write correct code across a variety of topics in chemistry and their accuracy can be increased by 30 percentage points via prompt engineering strategies, like putting copyright notices at the top of files. Our dataset and evaluation tools are open source which can be contributed to or built upon by future researchers, and will serve as a community resource for evaluating the performance of new models as they emerge. We also describe some good practices for employing LLMs in chemistry. The general success of these models demonstrates that their impact on chemistry teaching and research is poised to be enormous.
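The description above summarizes the evaluation loop only in prose. As a rough illustration (not the authors' released benchmark), the following Python sketch shows the general idea of posing a chemistry problem as a coding task, prepending a copyright-style header as a prompt-engineering cue, and grading a completion with an automated test. The prompt wording, the function name molarity, and the test values are illustrative assumptions, not items from the paper's dataset.

# A minimal sketch of a hypothetical benchmark item; not the paper's code.
PROMPT = '''\
# Copyright (c) 2023. All rights reserved.  (header used as a prompt-engineering cue)
# Compute the molarity (mol/L) of a solution from the solute mass (g),
# its molar mass (g/mol), and the solution volume (L).
def molarity(mass_g: float, molar_mass_g_mol: float, volume_l: float) -> float:
'''

# Stand-in for a model completion; the real framework would obtain this text
# from a code-generating LLM given PROMPT.
COMPLETION = '''\
    moles = mass_g / molar_mass_g_mol
    return moles / volume_l
'''

def grade(prompt: str, completion: str) -> bool:
    """Execute prompt + completion and check the result against a known answer."""
    namespace = {}
    try:
        exec(prompt + completion, namespace)   # run the candidate solution
        fn = namespace["molarity"]
        # 5.85 g NaCl (58.44 g/mol) in 0.500 L is about 0.200 M.
        return abs(fn(5.85, 58.44, 0.5) - 0.200) < 1e-3
    except Exception:
        return False                           # any error counts as incorrect

if __name__ == "__main__":
    print("correct" if grade(PROMPT, COMPLETION) else "incorrect")

In the framework the abstract describes, such automated tests are complemented by expert review of the generated code.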
format Online
Article
Text
id pubmed-10087057
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher RSC
record_format MEDLINE/PubMed
spelling pubmed-100870572023-04-12 Assessment of chemistry knowledge in large language models that generate code White, Andrew D. Hocky, Glen M. Gandhi, Heta A. Ansari, Mehrad Cox, Sam Wellawatte, Geemi P. Sasmal, Subarna Yang, Ziyue Liu, Kangxin Singh, Yuvraj Peña Ccoa, Willmor J. Digit Discov Chemistry In this work, we investigate the question: do code-generating large language models know chemistry? Our results indicate, mostly yes. To evaluate this, we introduce an expandable framework for evaluating chemistry knowledge in these models, through prompting models to solve chemistry problems posed as coding tasks. To do so, we produce a benchmark set of problems, and evaluate these models based on correctness of code by automated testing and evaluation by experts. We find that recent LLMs are able to write correct code across a variety of topics in chemistry and their accuracy can be increased by 30 percentage points via prompt engineering strategies, like putting copyright notices at the top of files. Our dataset and evaluation tools are open source which can be contributed to or built upon by future researchers, and will serve as a community resource for evaluating the performance of new models as they emerge. We also describe some good practices for employing LLMs in chemistry. The general success of these models demonstrates that their impact on chemistry teaching and research is poised to be enormous. RSC 2023-01-26 /pmc/articles/PMC10087057/ /pubmed/37065678 http://dx.doi.org/10.1039/d2dd00087c Text en This journal is © The Royal Society of Chemistry https://creativecommons.org/licenses/by-nc/3.0/
spellingShingle Chemistry
White, Andrew D.
Hocky, Glen M.
Gandhi, Heta A.
Ansari, Mehrad
Cox, Sam
Wellawatte, Geemi P.
Sasmal, Subarna
Yang, Ziyue
Liu, Kangxin
Singh, Yuvraj
Peña Ccoa, Willmor J.
Assessment of chemistry knowledge in large language models that generate code
title Assessment of chemistry knowledge in large language models that generate code
title_full Assessment of chemistry knowledge in large language models that generate code
title_fullStr Assessment of chemistry knowledge in large language models that generate code
title_full_unstemmed Assessment of chemistry knowledge in large language models that generate code
title_short Assessment of chemistry knowledge in large language models that generate code
title_sort assessment of chemistry knowledge in large language models that generate code
topic Chemistry
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10087057/
https://www.ncbi.nlm.nih.gov/pubmed/37065678
http://dx.doi.org/10.1039/d2dd00087c
work_keys_str_mv AT whiteandrewd assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode
AT hockyglenm assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode
AT gandhihetaa assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode
AT ansarimehrad assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode
AT coxsam assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode
AT wellawattegeemip assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode
AT sasmalsubarna assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode
AT yangziyue assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode
AT liukangxin assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode
AT singhyuvraj assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode
AT penaccoawillmorj assessmentofchemistryknowledgeinlargelanguagemodelsthatgeneratecode