Identifying domains of applicability of machine learning models for materials science

Bibliographic Details
Main Authors: Sutton, Christopher, Boley, Mario, Ghiringhelli, Luca M., Rupp, Matthias, Vreeken, Jilles, Scheffler, Matthias
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7474068/
https://www.ncbi.nlm.nih.gov/pubmed/32887879
http://dx.doi.org/10.1038/s41467-020-17112-9
Collection: PubMed
Description: Although machine learning (ML) models promise to substantially accelerate the discovery of novel materials, their performance is often still insufficient to draw reliable conclusions. Improved ML models are therefore actively researched, but their design is currently guided mainly by monitoring the average model test error. This can render different models indistinguishable although their performance differs substantially across materials, or it can make a model appear generally insufficient while it actually works well in specific sub-domains. Here, we present a method, based on subgroup discovery, for detecting domains of applicability (DA) of models within a materials class. The utility of this approach is demonstrated by analyzing three state-of-the-art ML models for predicting the formation energy of transparent conducting oxides. We find that, despite having a mutually indistinguishable and unsatisfactory average error, the models have DAs with distinctive features and notably improved performance.
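The abstract's central point, that a single average error can hide a sub-domain where a model is accurate, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the synthetic data, the hypothetical helper names `mean`, `holds` and `find_domain`, the restriction to conjunctions of at most two threshold conditions, and the "impact" objective (coverage times reduction in mean error) are all assumptions chosen for illustration; the paper's actual subgroup-discovery objective and search may differ.

```python
# Illustrative sketch only (assumptions: synthetic data, threshold selectors,
# impact = coverage * (overall mean error - subgroup mean error)).
from itertools import product


def mean(values):
    return sum(values) / len(values)


def holds(row, condition):
    """Check one hypothetical threshold condition (feature index, op, threshold)."""
    j, op, t = condition
    return row[j] <= t if op == "<=" else row[j] > t


def find_domain(features, errors, min_coverage=0.1, tol=1e-12):
    """Exhaustively search conjunctions of at most two threshold conditions on
    distinct features. Returns (impact, selector, subgroup error); selector is
    None if no subgroup lowers the mean error by more than tol."""
    n = len(errors)
    overall = mean(errors)
    conditions = [
        (j, op, t)
        for j in range(len(features[0]))
        for t in sorted({row[j] for row in features})
        for op in ("<=", ">")
    ]
    selectors = [(c,) for c in conditions] + [
        (a, b) for a, b in product(conditions, repeat=2) if a[0] < b[0]
    ]
    best = (0.0, None, overall)
    for selector in selectors:
        members = [i for i, row in enumerate(features)
                   if all(holds(row, c) for c in selector)]
        coverage = len(members) / n
        if not members or coverage < min_coverage:
            continue  # ignore empty or too-small subgroups
        sub_error = mean([errors[i] for i in members])
        impact = coverage * (overall - sub_error)
        if impact > best[0] + tol:  # tol avoids selecting float noise
            best = (impact, selector, sub_error)
    return best


# Hypothetical toy data: two descriptors on a 10 x 10 grid and the absolute
# errors of two imaginary models that share the same average error.
features = [(a, b) for a in range(10) for b in range(10)]
errors_a = [0.01 if a <= 4 else 0.10 for a, _ in features]  # accurate where x0 <= 4
errors_b = [0.055 for _ in features]                         # uniformly mediocre

print(mean(errors_a), mean(errors_b))
print(find_domain(features, errors_a))
print(find_domain(features, errors_b))
```

With this toy data both models have an average error of 0.055, yet the search returns the selector `x0 <= 4` for model A, covering half of the samples with a subgroup error of 0.01, and no improving selector for model B. Exhaustive search is only feasible for very small descriptor sets; practical subgroup discovery typically relies on more efficient strategies such as beam search or branch-and-bound.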
ID: pubmed-7474068
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Nat Commun (Nature Communications), Article, published 2020-09-04
Rights: © The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
Topic: Article