A systematic review of data mining and machine learning for air pollution epidemiology

BACKGROUND: Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predic...

Bibliographic Details
Main Authors: Bellinger, Colin, Mohomed Jabbar, Mohomed Shazan, Zaïane, Osmar, Osornio-Vargas, Alvaro
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5704396/
https://www.ncbi.nlm.nih.gov/pubmed/29179711
http://dx.doi.org/10.1186/s12889-017-4914-3
_version_ 1783281885784309760
author Bellinger, Colin
Mohomed Jabbar, Mohomed Shazan
Zaïane, Osmar
Osornio-Vargas, Alvaro
author_facet Bellinger, Colin
Mohomed Jabbar, Mohomed Shazan
Zaïane, Osmar
Osornio-Vargas, Alvaro
author_sort Bellinger, Colin
collection PubMed
description BACKGROUND: Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology. METHODS: We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We carried out our search process in PubMed, the MEDLINE database and Google Scholar. Research articles applying data mining and machine learning methods to air pollution epidemiology were queried and reviewed. RESULTS: Our search queries resulted in 400 research articles. Our fine-grained analysis employed our inclusion/exclusion criteria to reduce the results to 47 articles, which we separate into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution/quality or exposure; and 3) generating hypotheses. Early applications had a preference for artificial neural networks. In more recent work, decision trees, support vector machines, k-means clustering and the APRIORI algorithm have been widely applied. Our survey shows that the majority of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. For potential new directions, we have identified that deep learning and geo-spatial pattern mining are two burgeoning areas of data mining that have good potential for future applications in air pollution epidemiology. CONCLUSIONS: We carried out a systematic review identifying the current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology. 
This work shows that data mining is increasingly being applied in air pollution epidemiology. The potential to support air pollution epidemiology continues to grow with advancements in data mining related to temporal and geo-spatial mining, and deep learning. This is further supported by new sensors and storage mediums that enable larger, better quality data. This suggests that many more fruitful applications can be expected in the future.
format Online
Article
Text
id pubmed-5704396
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-57043962017-12-05 A systematic review of data mining and machine learning for air pollution epidemiology Bellinger, Colin Mohomed Jabbar, Mohomed Shazan Zaïane, Osmar Osornio-Vargas, Alvaro BMC Public Health Research Article BACKGROUND: Data measuring airborne pollutants, public health and environmental factors are increasingly being stored and merged. These big datasets offer great potential, but also challenge traditional epidemiological methods. This has motivated the exploration of alternative methods to make predictions, find patterns and extract information. To this end, data mining and machine learning algorithms are increasingly being applied to air pollution epidemiology. METHODS: We conducted a systematic literature review on the application of data mining and machine learning methods in air pollution epidemiology. We carried out our search process in PubMed, the MEDLINE database and Google Scholar. Research articles applying data mining and machine learning methods to air pollution epidemiology were queried and reviewed. RESULTS: Our search queries resulted in 400 research articles. Our fine-grained analysis employed our inclusion/exclusion criteria to reduce the results to 47 articles, which we separate into three primary areas of interest: 1) source apportionment; 2) forecasting/prediction of air pollution/quality or exposure; and 3) generating hypotheses. Early applications had a preference for artificial neural networks. In more recent work, decision trees, support vector machines, k-means clustering and the APRIORI algorithm have been widely applied. Our survey shows that the majority of the research has been conducted in Europe, China and the USA, and that data mining is becoming an increasingly common tool in environmental health. 
For potential new directions, we have identified that deep learning and geo-spatial pattern mining are two burgeoning areas of data mining that have good potential for future applications in air pollution epidemiology. CONCLUSIONS: We carried out a systematic review identifying the current trends, challenges and new directions to explore in the application of data mining methods to air pollution epidemiology. This work shows that data mining is increasingly being applied in air pollution epidemiology. The potential to support air pollution epidemiology continues to grow with advancements in data mining related to temporal and geo-spatial mining, and deep learning. This is further supported by new sensors and storage mediums that enable larger, better quality data. This suggests that many more fruitful applications can be expected in the future. BioMed Central 2017-11-28 /pmc/articles/PMC5704396/ /pubmed/29179711 http://dx.doi.org/10.1186/s12889-017-4914-3 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Bellinger, Colin
Mohomed Jabbar, Mohomed Shazan
Zaïane, Osmar
Osornio-Vargas, Alvaro
A systematic review of data mining and machine learning for air pollution epidemiology
title A systematic review of data mining and machine learning for air pollution epidemiology
title_full A systematic review of data mining and machine learning for air pollution epidemiology
title_fullStr A systematic review of data mining and machine learning for air pollution epidemiology
title_full_unstemmed A systematic review of data mining and machine learning for air pollution epidemiology
title_short A systematic review of data mining and machine learning for air pollution epidemiology
title_sort systematic review of data mining and machine learning for air pollution epidemiology
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5704396/
https://www.ncbi.nlm.nih.gov/pubmed/29179711
http://dx.doi.org/10.1186/s12889-017-4914-3