Cargando…

Detecting Arsenic Contamination Using Satellite Imagery and Machine Learning

Arsenic, a potent carcinogen and neurotoxin, affects over 200 million people globally. Current detection methods are laborious, expensive, and unscalable, being difficult to implement in developing regions and during crises such as COVID-19. This study attempts to determine if a relationship exists...

Descripción completa

Detalles Bibliográficos
Autores principales: Agrawal, Ayush, Petersen, Mark R.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8707206/
https://www.ncbi.nlm.nih.gov/pubmed/34941767
http://dx.doi.org/10.3390/toxics9120333
Descripción
Sumario:Arsenic, a potent carcinogen and neurotoxin, affects over 200 million people globally. Current detection methods are laborious, expensive, and unscalable, being difficult to implement in developing regions and during crises such as COVID-19. This study attempts to determine if a relationship exists between soil’s hyperspectral data and arsenic concentration using NASA’s Hyperion satellite. It is the first arsenic study to use satellite-based hyperspectral data and apply a classification approach. Four regression machine learning models are tested to determine this correlation in soil with bare land cover. Raw data are converted to reflectance, problematic atmospheric influences are removed, characteristic wavelengths are selected, and four noise reduction algorithms are tested. The combination of data augmentation, Genetic Algorithm, Second Derivative Transformation, and Random Forest regression ([Formula: see text] and normalized root mean squared error (re-scaled to [0,1]) = [Formula: see text]) shows strong correlation, performing better than past models despite using noisier satellite data (versus lab-processed samples). Three binary classification machine learning models are then applied to identify high-risk shrub-covered regions in ten U.S. states, achieving strong accuracy (=0.693) and F1-score (=0.728). Overall, these results suggest that such a methodology is practical and can provide a sustainable alternative to arsenic contamination detection.