
Building a multi-scaled geospatial temporal ecology database from disparate data sources: fostering open science and data reuse

Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of l...

Bibliographic Details
Main authors: Soranno, Patricia A., Bissell, Edward G., Cheruvelil, Kendra S., Christel, Samuel T., Collins, Sarah M., Fergus, C. Emi, Filstrup, Christopher T., Lapierre, Jean-Francois, Lottig, Noah R., Oliver, Samantha K., Scott, Caren E., Smith, Nicole J., Stopyak, Scott, Yuan, Shuai, Bremigan, Mary Tate, Downing, John A., Gries, Corinna, Henry, Emily N., Skaff, Nick K., Stanley, Emily H., Stow, Craig A., Tan, Pang-Ning, Wagner, Tyler, Webster, Katherine E.
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4488039/
https://www.ncbi.nlm.nih.gov/pubmed/26140212
http://dx.doi.org/10.1186/s13742-015-0067-4
author Soranno, Patricia A.
Bissell, Edward G.
Cheruvelil, Kendra S.
Christel, Samuel T.
Collins, Sarah M.
Fergus, C. Emi
Filstrup, Christopher T.
Lapierre, Jean-Francois
Lottig, Noah R.
Oliver, Samantha K.
Scott, Caren E.
Smith, Nicole J.
Stopyak, Scott
Yuan, Shuai
Bremigan, Mary Tate
Downing, John A.
Gries, Corinna
Henry, Emily N.
Skaff, Nick K.
Stanley, Emily H.
Stow, Craig A.
Tan, Pang-Ning
Wagner, Tyler
Webster, Katherine E.
author_sort Soranno, Patricia A.
collection PubMed
description Although there are considerable site-based data for individual or groups of ecosystems, these datasets are widely scattered, have different data formats and conventions, and often have limited accessibility. At the broader scale, national datasets exist for a large number of geospatial features of land, water, and air that are needed to fully understand variation among these ecosystems. However, such datasets originate from different sources and have different spatial and temporal resolutions. By taking an open-science perspective and by combining site-based ecosystem datasets and national geospatial datasets, science gains the ability to ask important research questions related to grand environmental challenges that operate at broad scales. Documentation of such complicated database integration efforts, through peer-reviewed papers, is recommended to foster reproducibility and future use of the integrated database. Here, we describe the major steps, challenges, and considerations in building an integrated database of lake ecosystems, called LAGOS (LAke multi-scaled GeOSpatial and temporal database), that was developed at the sub-continental study extent of 17 US states (1,800,000 km²). LAGOS includes two modules: LAGOS(GEO), with geospatial data on every lake with surface area larger than 4 ha in the study extent (~50,000 lakes), including climate, atmospheric deposition, land use/cover, hydrology, geology, and topography measured across a range of spatial and temporal extents; and LAGOS(LIMNO), with lake water quality data compiled from ~100 individual datasets for a subset of lakes in the study extent (~10,000 lakes). Procedures for the integration of datasets included: creating a flexible database design; authoring and integrating metadata; documenting data provenance; quantifying spatial measures of geographic data; quality-controlling integrated and derived data; and extensively documenting the database.
Our procedures make a large, complex, and integrated database reproducible and extensible, allowing users to ask new research questions with the existing database or through the addition of new data. The largest challenge of this task was the heterogeneity of the data, formats, and metadata. Many steps of data integration need manual input from experts in diverse fields, requiring close collaboration. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13742-015-0067-4) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4488039
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4488039 2015-07-03 Gigascience Review BioMed Central 2015-07-01 /pmc/articles/PMC4488039/ /pubmed/26140212 http://dx.doi.org/10.1186/s13742-015-0067-4 Text en
© Soranno et al. 2015 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Building a multi-scaled geospatial temporal ecology database from disparate data sources: fostering open science and data reuse
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4488039/
https://www.ncbi.nlm.nih.gov/pubmed/26140212
http://dx.doi.org/10.1186/s13742-015-0067-4