
Developing a Workflow to Identify Inconsistencies in Volunteered Geographic Information: A Phenological Case Study

Recent improvements in online information communication and mobile location-aware technologies have led to the production of large volumes of volunteered geographic information. Widespread, large-scale efforts by volunteers to collect data can inform and drive scientific advances in diverse fields,...


Bibliographic Details
Main authors: Mehdipoor, Hamed; Zurita-Milla, Raul; Rosemartin, Alyssa; Gerst, Katharine L.; Weltzin, Jake F.
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4618855/
https://www.ncbi.nlm.nih.gov/pubmed/26485157
http://dx.doi.org/10.1371/journal.pone.0140811
_version_ 1782396985519636480
author Mehdipoor, Hamed
Zurita-Milla, Raul
Rosemartin, Alyssa
Gerst, Katharine L.
Weltzin, Jake F.
author_facet Mehdipoor, Hamed
Zurita-Milla, Raul
Rosemartin, Alyssa
Gerst, Katharine L.
Weltzin, Jake F.
author_sort Mehdipoor, Hamed
collection PubMed
description Recent improvements in online information communication and mobile location-aware technologies have led to the production of large volumes of volunteered geographic information. Widespread, large-scale efforts by volunteers to collect data can inform and drive scientific advances in diverse fields, including ecology and climatology. Traditional workflows to check the quality of such volunteered information can be costly and time consuming as they heavily rely on human interventions. However, identifying factors that can influence data quality, such as inconsistency, is crucial when these data are used in modeling and decision-making frameworks. Recently developed workflows use simple statistical approaches that assume that the majority of the information is consistent. However, this assumption is not generalizable, and ignores underlying geographic and environmental contextual variability that may explain apparent inconsistencies. Here we describe an automated workflow to check inconsistency based on the availability of contextual environmental information for sampling locations. The workflow consists of three steps: (1) dimensionality reduction to facilitate further analysis and interpretation of results, (2) model-based clustering to group observations according to their contextual conditions, and (3) identification of inconsistent observations within each cluster. The workflow was applied to volunteered observations of flowering in common and cloned lilac plants (Syringa vulgaris and Syringa x chinensis) in the United States for the period 1980 to 2013. About 97% of the observations for both common and cloned lilacs were flagged as consistent, indicating that volunteers provided reliable information for this case study. 
Relative to the original dataset, the exclusion of inconsistent observations changed the apparent rate of change in lilac bloom dates by two days per decade, indicating the importance of inconsistency checking as a key step in data quality assessment for volunteered geographic information. Initiatives that leverage volunteered geographic information can adapt this workflow to improve the quality of their datasets and the robustness of their scientific analyses.
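The three steps named in the description can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the record does not say which algorithms they used, so the choices here are assumptions (a first principal component found by power iteration for step 1, a one-dimensional Gaussian mixture fitted by EM as one simple form of model-based clustering for step 2, and a median/MAD rule with a threshold of 3.5 for step 3). The function names, the two context variables, and all data values are hypothetical and synthetic.

```python
import math
import statistics


def first_component_scores(rows):
    """Step 1 (assumed method): scores on the first principal axis of standardized rows."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    z = [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]
    d = len(cols)
    cov = [[sum(r[i] * r[j] for r in z) / len(z) for j in range(d)] for i in range(d)]
    axis = [1.0 / (i + 1) for i in range(d)]  # arbitrary non-symmetric start vector
    for _ in range(200):  # power iteration towards the leading eigenvector
        nxt = [sum(cov[i][j] * axis[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in nxt))
        axis = [v / norm for v in nxt]
    return [sum(a * v for a, v in zip(axis, r)) for r in z]


def gmm_1d_labels(x, k=2, iters=100):
    """Step 2 (assumed method): k-component 1-D Gaussian mixture via EM, hard labels."""
    xs = sorted(x)
    mu = [xs[int((i + 0.5) * len(xs) / k)] for i in range(k)]  # quantile start
    var = [statistics.pvariance(x) or 1.0] * k
    w = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        resp = []
        for v in x:  # E-step: responsibility of each component for each point
            p = [w[c] * math.exp(-(v - mu[c]) ** 2 / (2 * var[c]))
                 / math.sqrt(2 * math.pi * var[c]) for c in range(k)]
            tot = sum(p) or 1e-300
            resp.append([pc / tot for pc in p])
        for c in range(k):  # M-step: update weight, mean, and variance
            nc = sum(r[c] for r in resp) or 1e-12
            mu[c] = sum(r[c] * v for r, v in zip(resp, x)) / nc
            var[c] = max(sum(r[c] * (v - mu[c]) ** 2 for r, v in zip(resp, x)) / nc, 1e-6)
            w[c] = nc / len(x)
    return [max(range(k), key=lambda c: r[c]) for r in resp]


def flag_inconsistent(values, labels, k_mad=3.5):
    """Step 3 (assumed rule): flag values far from their own cluster's median."""
    flags = [False] * len(values)
    for c in set(labels):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        med = statistics.median(values[i] for i in idx)
        mad = statistics.median(abs(values[i] - med) for i in idx) or 1.0
        for i in idx:
            flags[i] = abs(values[i] - med) / mad > k_mad
    return flags


# Synthetic context per site: (mean spring temperature in deg C, elevation in m).
context = [(15.0, 100), (15.5, 120), (14.8, 90), (15.2, 110), (15.1, 105), (14.9, 95),
           (5.0, 1500), (5.5, 1450), (4.8, 1550), (5.2, 1480), (5.1, 1520), (4.9, 1510)]
# Synthetic first-bloom day of year; index 4 is a warm site reporting a late date.
bloom_day = [100, 102, 99, 101, 145, 98, 140, 138, 142, 139, 141, 143]

labels = gmm_1d_labels(first_component_scores(context))
flags = flag_inconsistent(bloom_day, labels)
print([i for i, f in enumerate(flags) if f])
```

In this toy data the late report at index 4 lies inside the overall range of bloom dates, so a check on the pooled observations would not single it out; it stands out only when compared with sites that share similar contextual conditions, which is the motivation the description gives for clustering before checking consistency.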
format Online
Article
Text
id pubmed-4618855
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4618855 2015-10-29 Developing a Workflow to Identify Inconsistencies in Volunteered Geographic Information: A Phenological Case Study Mehdipoor, Hamed Zurita-Milla, Raul Rosemartin, Alyssa Gerst, Katharine L. Weltzin, Jake F. PLoS One Research Article Public Library of Science 2015-10-20 /pmc/articles/PMC4618855/ /pubmed/26485157 http://dx.doi.org/10.1371/journal.pone.0140811 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
spellingShingle Research Article
Mehdipoor, Hamed
Zurita-Milla, Raul
Rosemartin, Alyssa
Gerst, Katharine L.
Weltzin, Jake F.
Developing a Workflow to Identify Inconsistencies in Volunteered Geographic Information: A Phenological Case Study
title Developing a Workflow to Identify Inconsistencies in Volunteered Geographic Information: A Phenological Case Study
title_full Developing a Workflow to Identify Inconsistencies in Volunteered Geographic Information: A Phenological Case Study
title_fullStr Developing a Workflow to Identify Inconsistencies in Volunteered Geographic Information: A Phenological Case Study
title_full_unstemmed Developing a Workflow to Identify Inconsistencies in Volunteered Geographic Information: A Phenological Case Study
title_short Developing a Workflow to Identify Inconsistencies in Volunteered Geographic Information: A Phenological Case Study
title_sort developing a workflow to identify inconsistencies in volunteered geographic information: a phenological case study
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4618855/
https://www.ncbi.nlm.nih.gov/pubmed/26485157
http://dx.doi.org/10.1371/journal.pone.0140811
work_keys_str_mv AT mehdipoorhamed developingaworkflowtoidentifyinconsistenciesinvolunteeredgeographicinformationaphenologicalcasestudy
AT zuritamillaraul developingaworkflowtoidentifyinconsistenciesinvolunteeredgeographicinformationaphenologicalcasestudy
AT rosemartinalyssa developingaworkflowtoidentifyinconsistenciesinvolunteeredgeographicinformationaphenologicalcasestudy
AT gerstkatharinel developingaworkflowtoidentifyinconsistenciesinvolunteeredgeographicinformationaphenologicalcasestudy
AT weltzinjakef developingaworkflowtoidentifyinconsistenciesinvolunteeredgeographicinformationaphenologicalcasestudy