Limitations of machine learning models when predicting compounds with completely new chemistries: possible improvements applied to the discovery of new non-fullerene acceptors

We try to determine if machine learning (ML) methods, applied to the discovery of new materials on the basis of existing data sets, have the power to predict completely new classes of compounds (extrapolating) or perform well only when interpolating between known materials. We introduce the leave-on...

Bibliographic Details
Main authors:	Zhao, Zhi-Wen; del Cueto, Marcos; Troisi, Alessandro
Format:	Online Article Text
Language:	English
Published:	RSC 2022
Subjects:	Chemistry
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9189862/ https://www.ncbi.nlm.nih.gov/pubmed/35769202 http://dx.doi.org/10.1039/d2dd00004k

_version_	1784725680651501568
author	Zhao, Zhi-Wen; del Cueto, Marcos; Troisi, Alessandro
author_sort	Zhao, Zhi-Wen
collection	PubMed
description	We try to determine if machine learning (ML) methods, applied to the discovery of new materials on the basis of existing data sets, have the power to predict completely new classes of compounds (extrapolating) or perform well only when interpolating between known materials. We introduce the leave-one-group-out cross-validation, in which the ML model is trained to explicitly perform extrapolations of unseen chemical families. This approach can be used across materials science and chemistry problems to improve the added value of ML predictions, instead of using extrapolative ML models that were trained with a regular cross-validation. We consider as a case study the problem of the discovery of non-fullerene acceptors because novel classes of acceptors are naturally classified into distinct chemical families. We show that conventional ML methods are not useful in practice when attempting to predict the efficiency of a completely novel class of materials. The approach proposed in this work increases the accuracy of the predictions to enable at least the categorization of materials with a performance above and below the median value.
format	Online Article Text
id	pubmed-9189862
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	RSC
record_format	MEDLINE/PubMed
spelling	pubmed-9189862; 2022-06-27; Digit Discov; Chemistry; RSC; 2022-03-25; /pmc/articles/PMC9189862/; /pubmed/35769202; http://dx.doi.org/10.1039/d2dd00004k; Text; en; This journal is © The Royal Society of Chemistry; https://creativecommons.org/licenses/by-nc/3.0/
title	Limitations of machine learning models when predicting compounds with completely new chemistries: possible improvements applied to the discovery of new non-fullerene acceptors
topic	Chemistry
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9189862/ https://www.ncbi.nlm.nih.gov/pubmed/35769202 http://dx.doi.org/10.1039/d2dd00004k
work_keys_str_mv	AT zhaozhiwen limitationsofmachinelearningmodelswhenpredictingcompoundswithcompletelynewchemistriespossibleimprovementsappliedtothediscoveryofnewnonfullereneacceptors AT delcuetomarcos limitationsofmachinelearningmodelswhenpredictingcompoundswithcompletelynewchemistriespossibleimprovementsappliedtothediscoveryofnewnonfullereneacceptors AT troisialessandro limitationsofmachinelearningmodelswhenpredictingcompoundswithcompletelynewchemistriespossibleimprovementsappliedtothediscoveryofnewnonfullereneacceptors
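
The leave-one-group-out cross-validation named in the description field can be illustrated with a minimal, standard-library-only sketch. This is not the authors' code: the helper names `leave_one_group_out` and `logo_mae`, the family labels, and the efficiency values are all hypothetical, and a mean-of-training-set baseline stands in for a real ML model. The point is only that each fold withholds one entire chemical family, so every test prediction is an extrapolation to a family the model never saw.

```python
from collections import defaultdict
from statistics import mean


def leave_one_group_out(groups):
    """Yield (held_out, train_idx, test_idx), holding out one whole group per fold."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    for held_out in sorted(by_group):
        test_idx = by_group[held_out]
        train_idx = [i for i, g in enumerate(groups) if g != held_out]
        yield held_out, train_idx, test_idx


def logo_mae(y, groups):
    """Per-family mean absolute error of a mean-of-training-set baseline.

    The baseline is an assumption standing in for a real regression model.
    """
    scores = {}
    for held_out, train_idx, test_idx in leave_one_group_out(groups):
        prediction = mean(y[i] for i in train_idx)
        scores[held_out] = mean(abs(y[i] - prediction) for i in test_idx)
    return scores


# Synthetic, made-up values for illustration; not data from the paper.
families = ["A", "A", "B", "B", "C", "C"]
efficiency = [1.0, 3.0, 5.0, 7.0, 9.0, 11.0]
scores = logo_mae(efficiency, families)
```

With this toy data the error is largest for the families at the extremes of the range ("A" and "C"), which is the extrapolation penalty that a random train/test split would hide. In practice, scikit-learn's `LeaveOneGroupOut` splitter in `sklearn.model_selection` provides the same fold structure.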
