Cargando…

Machine Learning-Based Classification of Lignocellulosic Biomass from Pyrolysis-Molecular Beam Mass Spectrometry Data

High-throughput analysis of biomass is necessary to ensure consistent and uniform feedstocks for agricultural and bioenergy applications and is needed to inform genomics and systems biology models. Pyrolysis followed by mass spectrometry such as molecular beam mass spectrometry (py-MBMS) analyses ar...

Descripción completa

Detalles Bibliográficos
Autores principales: Nag, Ambarish, Gerritsen, Alida, Doeppke, Crissa, Harman-Ware, Anne E.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8071563/
https://www.ncbi.nlm.nih.gov/pubmed/33921121
http://dx.doi.org/10.3390/ijms22084107
_version_ 1783683737789136896
author Nag, Ambarish
Gerritsen, Alida
Doeppke, Crissa
Harman-Ware, Anne E.
author_facet Nag, Ambarish
Gerritsen, Alida
Doeppke, Crissa
Harman-Ware, Anne E.
author_sort Nag, Ambarish
collection PubMed
description High-throughput analysis of biomass is necessary to ensure consistent and uniform feedstocks for agricultural and bioenergy applications and is needed to inform genomics and systems biology models. Pyrolysis followed by mass spectrometry such as molecular beam mass spectrometry (py-MBMS) analyses are becoming increasingly popular for the rapid analysis of biomass cell wall composition and typically require the use of different data analysis tools depending on the need and application. Here, the authors report the py-MBMS analysis of several types of lignocellulosic biomass to gain an understanding of spectral patterns and variation with associated biomass composition and use machine learning approaches to classify, differentiate, and predict biomass types on the basis of py-MBMS spectra. Py-MBMS spectra were also corrected for instrumental variance using generalized linear modeling (GLM) based on the use of select ions relative abundances as spike-in controls. Machine learning classification algorithms e.g., random forest, k-nearest neighbor, decision tree, Gaussian Naïve Bayes, gradient boosting, and multilayer perceptron classifiers were used. The k-nearest neighbors (k-NN) classifier generally performed the best for classifications using raw spectral data, and the decision tree classifier performed the worst. After normalization of spectra to account for instrumental variance, all the classifiers had comparable and generally acceptable performance for predicting the biomass types, although the k-NN and decision tree classifiers were not as accurate for prediction of specific sample types. Gaussian Naïve Bayes (GNB) and extreme gradient boosting (XGB) classifiers performed better than the k-NN and the decision tree classifiers for the prediction of biomass mixtures. The data analysis workflow reported here could be applied and extended for comparison of biomass samples of varying types, species, phenotypes, and/or genotypes or subjected to different treatments, environments, etc. to further elucidate the sources of spectral variance, patterns, and to infer compositional information based on spectral analysis, particularly for analysis of data without a priori knowledge of the feedstock composition or identity.
format Online
Article
Text
id pubmed-8071563
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-80715632021-04-26 Machine Learning-Based Classification of Lignocellulosic Biomass from Pyrolysis-Molecular Beam Mass Spectrometry Data Nag, Ambarish Gerritsen, Alida Doeppke, Crissa Harman-Ware, Anne E. Int J Mol Sci Article High-throughput analysis of biomass is necessary to ensure consistent and uniform feedstocks for agricultural and bioenergy applications and is needed to inform genomics and systems biology models. Pyrolysis followed by mass spectrometry such as molecular beam mass spectrometry (py-MBMS) analyses are becoming increasingly popular for the rapid analysis of biomass cell wall composition and typically require the use of different data analysis tools depending on the need and application. Here, the authors report the py-MBMS analysis of several types of lignocellulosic biomass to gain an understanding of spectral patterns and variation with associated biomass composition and use machine learning approaches to classify, differentiate, and predict biomass types on the basis of py-MBMS spectra. Py-MBMS spectra were also corrected for instrumental variance using generalized linear modeling (GLM) based on the use of select ions relative abundances as spike-in controls. Machine learning classification algorithms e.g., random forest, k-nearest neighbor, decision tree, Gaussian Naïve Bayes, gradient boosting, and multilayer perceptron classifiers were used. The k-nearest neighbors (k-NN) classifier generally performed the best for classifications using raw spectral data, and the decision tree classifier performed the worst. After normalization of spectra to account for instrumental variance, all the classifiers had comparable and generally acceptable performance for predicting the biomass types, although the k-NN and decision tree classifiers were not as accurate for prediction of specific sample types. Gaussian Naïve Bayes (GNB) and extreme gradient boosting (XGB) classifiers performed better than the k-NN and the decision tree classifiers for the prediction of biomass mixtures. The data analysis workflow reported here could be applied and extended for comparison of biomass samples of varying types, species, phenotypes, and/or genotypes or subjected to different treatments, environments, etc. to further elucidate the sources of spectral variance, patterns, and to infer compositional information based on spectral analysis, particularly for analysis of data without a priori knowledge of the feedstock composition or identity. MDPI 2021-04-15 /pmc/articles/PMC8071563/ /pubmed/33921121 http://dx.doi.org/10.3390/ijms22084107 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Nag, Ambarish
Gerritsen, Alida
Doeppke, Crissa
Harman-Ware, Anne E.
Machine Learning-Based Classification of Lignocellulosic Biomass from Pyrolysis-Molecular Beam Mass Spectrometry Data
title Machine Learning-Based Classification of Lignocellulosic Biomass from Pyrolysis-Molecular Beam Mass Spectrometry Data
title_full Machine Learning-Based Classification of Lignocellulosic Biomass from Pyrolysis-Molecular Beam Mass Spectrometry Data
title_fullStr Machine Learning-Based Classification of Lignocellulosic Biomass from Pyrolysis-Molecular Beam Mass Spectrometry Data
title_full_unstemmed Machine Learning-Based Classification of Lignocellulosic Biomass from Pyrolysis-Molecular Beam Mass Spectrometry Data
title_short Machine Learning-Based Classification of Lignocellulosic Biomass from Pyrolysis-Molecular Beam Mass Spectrometry Data
title_sort machine learning-based classification of lignocellulosic biomass from pyrolysis-molecular beam mass spectrometry data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8071563/
https://www.ncbi.nlm.nih.gov/pubmed/33921121
http://dx.doi.org/10.3390/ijms22084107
work_keys_str_mv AT nagambarish machinelearningbasedclassificationoflignocellulosicbiomassfrompyrolysismolecularbeammassspectrometrydata
AT gerritsenalida machinelearningbasedclassificationoflignocellulosicbiomassfrompyrolysismolecularbeammassspectrometrydata
AT doeppkecrissa machinelearningbasedclassificationoflignocellulosicbiomassfrompyrolysismolecularbeammassspectrometrydata
AT harmanwareannee machinelearningbasedclassificationoflignocellulosicbiomassfrompyrolysismolecularbeammassspectrometrydata