Using A Low-Cost Sensor Array and Machine Learning Techniques to Detect Complex Pollutant Mixtures and Identify Likely Sources

Bibliographic Details
Main Authors: Thorson, Jacob; Collier-Oxandale, Ashley; Hannigan, Michael
Format: Online, Article, Text
Language: English
Published: MDPI 2019
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6749282/
https://www.ncbi.nlm.nih.gov/pubmed/31466288
http://dx.doi.org/10.3390/s19173723
author Thorson, Jacob
Collier-Oxandale, Ashley
Hannigan, Michael
collection PubMed
description An array of low-cost sensors was assembled and tested in a chamber environment in which several pollutant mixtures were generated. The four classes of sources simulated were mobile emissions, biomass burning, natural gas emissions, and gasoline vapors. A two-step regression and classification method was developed and applied to the sensor data from this array. We first applied regression models to estimate the concentrations of several compounds, then trained classification models that use those estimates to identify the presence of each source. The regression models included forms of multiple linear regression, random forests, Gaussian process regression, and neural networks. The regression models with human-interpretable outputs were examined to understand the utility of each sensor signal. The classification models included logistic regression, random forests, support vector machines, and neural networks. The best combination of models was selected by maximizing the F1 score on ten-fold cross-validation data. The highest F1 score on the testing data, 0.72, was produced by combining a multiple linear regression model that used the full sensor array with a random forest classification model.
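The two-step scheme in the description has three parts. Step 1 regresses compound concentrations from the sensor signals. Step 2 classifies source presence from those estimates. The model pair is then chosen by the highest ten-fold cross-validated F1. The sketch below illustrates this in plain Python under stated assumptions and is not the authors' code. The univariate least-squares regressors (`sensor_regressor`) and the threshold classifiers (`fit_threshold`, `fit_midpoint`) are hypothetical stand-ins for the multiple linear regression, random forest, Gaussian process, neural network, logistic regression and SVM models named above. The sketch also assumes a single compound and a single binary source label, and assigns folds by index (`i % k`) without shuffling. The data are synthetic.

```python
from statistics import mean


def f1_score(y_true, y_pred):
    """Binary F1 = 2*TP / (2*TP + FP + FN); 0.0 when there are no true or predicted positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def fit_ols_1d(xs, ys):
    """Least-squares line ys ~ a + b*xs; returns a predictor for new xs."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda new_xs: [a + b * x for x in new_xs]


def sensor_regressor(j):
    """Hypothetical step-1 model: univariate OLS on sensor column j only."""
    def fit(X, conc):
        predict = fit_ols_1d([row[j] for row in X], conc)
        return lambda new_X: predict([row[j] for row in new_X])
    return fit


def fit_threshold(est, labels):
    """Hypothetical step-2 model: the cut-off on estimated concentration with the best training F1."""
    cut = max(est, key=lambda t: f1_score(labels, [e >= t for e in est]))
    return lambda new_est: [e >= cut for e in new_est]


def fit_midpoint(est, labels):
    """Hypothetical step-2 model: cut halfway between the mean estimate of each class."""
    pos = [e for e, y in zip(est, labels) if y]
    neg = [e for e, y in zip(est, labels) if not y]
    if not pos or not neg:
        raise ValueError("each training fold needs both classes")
    cut = (mean(pos) + mean(neg)) / 2
    return lambda new_est: [e >= cut for e in new_est]


def cv_f1(X, conc, labels, fit_reg, fit_clf, k=10):
    """Mean F1 over k interleaved folds (sample i goes to fold i % k)."""
    scores = []
    for f in range(k):
        train = [i for i in range(len(X)) if i % k != f]
        test = [i for i in range(len(X)) if i % k == f]
        reg = fit_reg([X[i] for i in train], [conc[i] for i in train])
        # Step 2 is trained on step-1 estimates, not on the measured concentrations.
        clf = fit_clf(reg([X[i] for i in train]), [labels[i] for i in train])
        pred = clf(reg([X[i] for i in test]))
        scores.append(f1_score([labels[i] for i in test], pred))
    return mean(scores)


def select_combination(X, conc, labels, regressors, classifiers, k=10):
    """Score every (regressor, classifier) pair; return the best pair and all scores."""
    results = {
        (rn, cn): cv_f1(X, conc, labels, r, c, k)
        for rn, r in regressors.items()
        for cn, c in classifiers.items()
    }
    return max(results, key=results.get), results


# Toy synthetic data: sensor 0 tracks the concentration, sensor 1 does not.
conc = [float(i) for i in range(20)]
labels = [c >= 10 for c in conc]
X = [[2 * c + 1, float((i * 7) % 5)] for i, c in enumerate(conc)]

best, results = select_combination(
    X, conc, labels,
    regressors={"sensor_0": sensor_regressor(0), "sensor_1": sensor_regressor(1)},
    classifiers={"threshold": fit_threshold, "midpoint": fit_midpoint},
)
print(best, results[best])
```

The key design point is in `cv_f1`. The classifier in each fold is fitted on the regression model's estimates for that fold's training samples. This mirrors the description's "use those estimates" step, instead of fitting on the true concentrations, which would not be available from the sensors alone at deployment.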
format Online
Article
Text
id pubmed-6749282
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6749282 2019-09-27 Sensors (Basel) Article MDPI 2019-08-28 /pmc/articles/PMC6749282/ /pubmed/31466288 http://dx.doi.org/10.3390/s19173723 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Using A Low-Cost Sensor Array and Machine Learning Techniques to Detect Complex Pollutant Mixtures and Identify Likely Sources
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6749282/
https://www.ncbi.nlm.nih.gov/pubmed/31466288
http://dx.doi.org/10.3390/s19173723