Data quantity governance for machine learning in materials science

Data-driven machine learning (ML) is widely employed in the analysis of materials structure–activity relationships, performance optimization and materials design due to its superior ability to reveal latent data patterns and make accurate predictions. However, because of the laborious process of mate...

Bibliographic Details
Main authors:	Liu, Yue, Yang, Zhengwei, Zou, Xinxin, Ma, Shuchang, Liu, Dahui, Avdeev, Maxim, Shi, Siqi
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2023
Subjects:	Review
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10265966/ https://www.ncbi.nlm.nih.gov/pubmed/37323811 http://dx.doi.org/10.1093/nsr/nwad125

collection	PubMed
description	Data-driven machine learning (ML) is widely employed in the analysis of materials structure–activity relationships, performance optimization and materials design due to its superior ability to reveal latent data patterns and make accurate predictions. However, because of the laborious process of materials data acquisition, ML models face a mismatch between a high-dimensional feature space and a small sample size (for traditional ML models) or a mismatch between the number of model parameters and the sample size (for deep-learning models), which usually results in poor performance. Here, we review efforts to tackle this issue via feature reduction, sample augmentation and specific ML approaches, and show that the balance between the number of samples and the number of features or model parameters deserves close attention during data quantity governance. Following this, we propose a synergistic data quantity governance flow that incorporates materials domain knowledge. After summarizing the approaches to incorporating materials domain knowledge into the process of ML, we provide examples of incorporating domain knowledge into governance schemes to demonstrate the advantages of the approach and its applications. This work paves the way for obtaining the required high-quality data to accelerate materials design and discovery based on ML.
id	pubmed-10265966
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-10265966 2023-06-15 Data quantity governance for machine learning in materials science Liu, Yue Yang, Zhengwei Zou, Xinxin Ma, Shuchang Liu, Dahui Avdeev, Maxim Shi, Siqi Natl Sci Rev Review Oxford University Press 2023-05-01 /pmc/articles/PMC10265966/ /pubmed/37323811 http://dx.doi.org/10.1093/nsr/nwad125 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of China Science Publishing & Media Ltd. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Data quantity governance for machine learning in materials science

Similar items