
An open source software for fast grid-based data-mining in spatial epidemiology (FGBASE)


Bibliographic Details
Main authors: Baker, David M, Valleron, Alain-Jacques
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4233060/
https://www.ncbi.nlm.nih.gov/pubmed/25358866
http://dx.doi.org/10.1186/1476-072X-13-46
author Baker, David M
Valleron, Alain-Jacques
collection PubMed
description BACKGROUND: Examining whether disease cases are clustered in space is an important part of epidemiological research. Another important part of spatial epidemiology is testing whether patients suffering from a disease are more, or less, exposed to environmental factors of interest than adequately defined controls. Both approaches involve determining the number of cases and controls (or population at risk) in specific zones. For cluster searches, this often must be done for millions of different zones, and doing it by calculating distances can lead to very lengthy computations. In this work we discuss the computational advantages of geographical grid-based methods and introduce open source software (FGBASE) that we have created for this purpose. METHODS: Geographical grids based on the Lambert Azimuthal Equal Area projection are well suited for spatial epidemiology because they preserve area: each cell of the grid has the same area. We describe how data are projected onto such a grid, as well as grid-based algorithms for spatial epidemiological data-mining. The software program that we have developed (FGBASE) implements these grid-based methods. RESULTS: The grid-based algorithms run extremely fast, particularly for cluster searches. When applied, as an example, to a cohort of French Type 1 Diabetes (T1D) patients, the grid-based algorithms detected potential clusters in a few seconds on a modern laptop. This compares very favorably to an equivalent cluster search using distance calculations instead of a grid, which took over 4 hours on the same computer. In the case study we discovered 4 potential clusters of T1D cases near the cities of Le Havre, Dunkerque, Toulouse and Nantes. One example of environmental analysis with our software was to study whether a significant association could be found between T1D and distance to vineyards with heavy pesticide use. None was found. In both examples, the software facilitates the rapid testing of hypotheses. CONCLUSIONS: Grid-based algorithms for mining spatial epidemiological data provide advantages in terms of computational complexity, thus improving the speed of computations. We believe that these methods and this software tool (FGBASE) will lower the computational barriers to entry for those performing epidemiological research. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1476-072X-13-46) contains supplementary material, which is available to authorized users.
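The grid idea summarized in the description can be illustrated with a minimal sketch. This is not FGBASE's code: the function names, the spherical Earth radius, the projection centre (52°N, 10°E) and the cell size are all assumptions made for illustration. The sketch projects latitude/longitude onto a Lambert Azimuthal Equal Area plane, assigns each point to an equal-area square cell, and then counts the points in a square zone by summing cell counts rather than computing a distance for every point.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0   # assumption: spherical Earth, not an ellipsoid
LAT0, LON0 = 52.0, 10.0    # assumption: projection centre chosen for illustration


def laea_project(lat_deg, lon_deg, lat0_deg=LAT0, lon0_deg=LON0):
    """Spherical Lambert Azimuthal Equal Area forward projection (x, y in km)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    lat0, lon0 = math.radians(lat0_deg), math.radians(lon0_deg)
    dlon = lon - lon0
    denom = (1.0 + math.sin(lat0) * math.sin(lat)
             + math.cos(lat0) * math.cos(lat) * math.cos(dlon))
    k = math.sqrt(2.0 / denom)
    x = EARTH_RADIUS_KM * k * math.cos(lat) * math.sin(dlon)
    y = EARTH_RADIUS_KM * k * (math.cos(lat0) * math.sin(lat)
                               - math.sin(lat0) * math.cos(lat) * math.cos(dlon))
    return x, y


def cell_of(lat_deg, lon_deg, cell_km=1.0):
    """Integer (column, row) index of the equal-area cell containing a point."""
    x, y = laea_project(lat_deg, lon_deg)
    return math.floor(x / cell_km), math.floor(y / cell_km)


def count_by_cell(points, cell_km=1.0):
    """Count points per grid cell; points is an iterable of (lat, lon) pairs."""
    return Counter(cell_of(lat, lon, cell_km) for lat, lon in points)


def window_count(counts, centre, half_width):
    """Sum the counts in the square block of cells centred on `centre`.

    The cost depends on the number of cells in the block, not on the
    number of points, which is where the grid saves time over
    point-by-point distance calculations.
    """
    cx, cy = centre
    return sum(counts.get((cx + dx, cy + dy), 0)
               for dx in range(-half_width, half_width + 1)
               for dy in range(-half_width, half_width + 1))


# Illustrative usage with made-up coordinates (not data from the article).
example_points = [(52.0, 10.0), (52.0, 10.0), (53.0, 10.0)]
example_counts = count_by_cell(example_points, cell_km=10.0)
nearby = window_count(example_counts, centre=(0, 0), half_width=1)
```

Because every cell has the same area, counts in different zones of the same shape can be compared directly. A real cluster search would also need a population-at-risk grid and a statistical test, which this sketch leaves out.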
format Online
Article
Text
id pubmed-4233060
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4233060 2014-11-17 An open source software for fast grid-based data-mining in spatial epidemiology (FGBASE) Baker, David M Valleron, Alain-Jacques Int J Health Geogr Methodology BioMed Central 2014-10-30 /pmc/articles/PMC4233060/ /pubmed/25358866 http://dx.doi.org/10.1186/1476-072X-13-46 Text en © Baker and Valleron; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title An open source software for fast grid-based data-mining in spatial epidemiology (FGBASE)
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4233060/
https://www.ncbi.nlm.nih.gov/pubmed/25358866
http://dx.doi.org/10.1186/1476-072X-13-46