XAS Data Preprocessing of Nanocatalysts for Machine Learning Applications

Innovative development in the energy and chemical industries is mainly dependent on advances in the accelerated design and development of new functional materials. The success of research in new nanocatalysts mainly relies on modern techniques and approaches for their precise characterization. The e...

Bibliographic Details
Main authors:	Kartashov, Oleg O., Chernov, Andrey V., Polyanichenko, Dmitry S., Butakova, Maria A.
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8709119/ https://www.ncbi.nlm.nih.gov/pubmed/34947477 http://dx.doi.org/10.3390/ma14247884

_version_	1784622856097759232
author	Kartashov, Oleg O. Chernov, Andrey V. Polyanichenko, Dmitry S. Butakova, Maria A.
collection	PubMed
description	Innovation in the energy and chemical industries depends mainly on advances in the accelerated design and development of new functional materials. Successful research on new nanocatalysts relies mainly on modern techniques and approaches for their precise characterization. Existing methods for experimentally characterizing nanocatalysts, which make it possible to assess whether these materials can be used in specific chemical reactions or applications, generate large amounts of heterogeneous data. Accelerating the development of new functional materials, including nanocatalysts, depends directly on the speed and quality with which hidden dependencies and knowledge are extracted from the experimental data. Such experiments usually involve several characterization techniques, including different types of X-ray absorption spectroscopy (XAS). Machine learning (ML) methods applied to XAS data make it possible to study and predict the atomic-scale structure and a number of other parameters of a nanocatalyst efficiently. However, before using any ML model, it is necessary to make sure that the raw experimental XAS data are properly preprocessed, cleaned, and prepared for ML application. The XAS preprocessing stage is usually described only vaguely in scientific studies, while researchers devote most of their effort to describing and implementing the ML stage. However, the quality of the input data affects the quality of the ML analysis and of the prediction results used afterwards. This paper fills the gap between the stage of obtaining XAS data from synchrotron facilities and the stage of using and customizing various ML analysis and prediction models. This study aimed to develop automated tools for preprocessing and presenting data from physical experiments and for creating deposited datasets, using the study of palladium-based nanocatalysts at synchrotron radiation facilities as an example.
The study considered methods for the preliminary processing of XAS data, which can be conditionally divided into X-ray absorption near edge structure (XANES) and extended X-ray absorption fine structure (EXAFS). This paper proposes a software toolkit that implements data preprocessing scenarios in the form of a single pipeline. The main preprocessing methods used in this study are principal component analysis (PCA), z-score normalization, the interquartile method for eliminating outliers in the data, and the k-means machine learning method, which makes it possible to clarify the phase of the studied material sample by clustering the feature vectors of experiments. The results of this study also include deposited datasets of physical experiments on palladium-based nanocatalysts obtained using synchrotron radiation. These datasets will allow for further high-quality data mining to extract new knowledge about materials using artificial intelligence methods and machine learning models, and will ensure their smooth dissemination to researchers and their reuse.
format	Online Article Text
id	pubmed-8709119
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8709119 2021-12-25 Materials (Basel) Article MDPI 2021-12-20 /pmc/articles/PMC8709119/ /pubmed/34947477 http://dx.doi.org/10.3390/ma14247884 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	XAS Data Preprocessing of Nanocatalysts for Machine Learning Applications
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8709119/ https://www.ncbi.nlm.nih.gov/pubmed/34947477 http://dx.doi.org/10.3390/ma14247884
work_keys_str_mv	AT kartashovolego xasdatapreprocessingofnanocatalystsformachinelearningapplications AT chernovandreyv xasdatapreprocessingofnanocatalystsformachinelearningapplications AT polyanichenkodmitrys xasdatapreprocessingofnanocatalystsformachinelearningapplications AT butakovamariaa xasdatapreprocessingofnanocatalystsformachinelearningapplications
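
The abstract names z-score normalization and the interquartile method for eliminating outliers among the preprocessing steps. The following is a minimal sketch of those two steps only, using the Python standard library. It is not the authors' toolkit: the function names, the 1.5 x IQR fence factor, the order of the steps, and the sample feature vectors are all assumptions made for illustration, and PCA and k-means are left out.

```python
import statistics


def zscore(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no spread; map it to zeros instead of dividing by 0.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outlier_rows(rows, column, k=1.5):
    """Keep only rows whose value in `column` lies within the IQR fences."""
    low, high = iqr_bounds([r[column] for r in rows], k)
    return [r for r in rows if low <= r[column] <= high]


# Made-up feature vectors, one row per spectrum (assumed layout:
# edge position in eV, normalized peak intensity). Not data from the paper.
rows = [
    (24350.1, 1.02),
    (24350.3, 1.05),
    (24350.2, 0.98),
    (24350.4, 1.01),
    (24350.2, 3.90),  # deliberately implausible intensity
]

clean = drop_outlier_rows(rows, column=1)           # IQR filter on the intensity column
scaled_intensity = zscore([r[1] for r in clean])    # z-score the surviving intensities
```

Filtering outliers before standardizing is a design choice in this sketch: a single extreme value would otherwise inflate the standard deviation and compress the z-scores of the remaining spectra.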
