A globally synthesised and flagged bee occurrence dataset and cleaning workflow
Species occurrence data are foundational for research, conservation, and science communication, but the limited availability and accessibility of reliable data represents a major obstacle, particularly for insects, which face mounting pressures. We present BeeBDC, a new R package, and a global bee o...
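Main authors: | Dorey, James B.; Fischer, Erica E.; Chesshire, Paige R.; Nava-Bolaños, Angela; O’Reilly, Robert L.; Bossert, Silas; Collins, Shannon M.; Lichtenberg, Elinor M.; Tucker, Erika M.; Smith-Pardo, Allan; Falcon-Brindis, Armando; Guevara, Diego A.; Ribeiro, Bruno; de Pedro, Diego; Pickering, John; Hung, Keng-Lou James; Parys, Katherine A.; McCabe, Lindsie M.; Rogan, Matthew S.; Minckley, Robert L.; Velazco, Santiago J. E.; Griswold, Terry; Zarrillo, Tracy A.; Jetz, Walter; Sica, Yanina V.; Orr, Michael C.; Guzman, Laura Melissa; Ascher, John S.; Hughes, Alice C.; Cobb, Neil S. |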
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10622554/ https://www.ncbi.nlm.nih.gov/pubmed/37919303 http://dx.doi.org/10.1038/s41597-023-02626-w |
_version_ | 1785130564994465792 |
---|---|
author | Dorey, James B.; Fischer, Erica E.; Chesshire, Paige R.; Nava-Bolaños, Angela; O’Reilly, Robert L.; Bossert, Silas; Collins, Shannon M.; Lichtenberg, Elinor M.; Tucker, Erika M.; Smith-Pardo, Allan; Falcon-Brindis, Armando; Guevara, Diego A.; Ribeiro, Bruno; de Pedro, Diego; Pickering, John; Hung, Keng-Lou James; Parys, Katherine A.; McCabe, Lindsie M.; Rogan, Matthew S.; Minckley, Robert L.; Velazco, Santiago J. E.; Griswold, Terry; Zarrillo, Tracy A.; Jetz, Walter; Sica, Yanina V.; Orr, Michael C.; Guzman, Laura Melissa; Ascher, John S.; Hughes, Alice C.; Cobb, Neil S. |
author_facet | Dorey, James B.; Fischer, Erica E.; Chesshire, Paige R.; Nava-Bolaños, Angela; O’Reilly, Robert L.; Bossert, Silas; Collins, Shannon M.; Lichtenberg, Elinor M.; Tucker, Erika M.; Smith-Pardo, Allan; Falcon-Brindis, Armando; Guevara, Diego A.; Ribeiro, Bruno; de Pedro, Diego; Pickering, John; Hung, Keng-Lou James; Parys, Katherine A.; McCabe, Lindsie M.; Rogan, Matthew S.; Minckley, Robert L.; Velazco, Santiago J. E.; Griswold, Terry; Zarrillo, Tracy A.; Jetz, Walter; Sica, Yanina V.; Orr, Michael C.; Guzman, Laura Melissa; Ascher, John S.; Hughes, Alice C.; Cobb, Neil S. |
author_sort | Dorey, James B. |
collection | PubMed |
description | Species occurrence data are foundational for research, conservation, and science communication, but the limited availability and accessibility of reliable data represents a major obstacle, particularly for insects, which face mounting pressures. We present BeeBDC, a new R package, and a global bee occurrence dataset to address this issue. We combined >18.3 million bee occurrence records from multiple public repositories (GBIF, SCAN, iDigBio, USGS, ALA) and smaller datasets, then standardised, flagged, deduplicated, and cleaned the data using the reproducible BeeBDC R-workflow. Specifically, we harmonised species names (following established global taxonomy), country names, and collection dates, and we added record-level flags for a series of potential quality issues. These data are provided in two formats, “cleaned” and “flagged-but-uncleaned”. The BeeBDC package with online documentation provides end users the ability to modify filtering parameters to address their research questions. By publishing reproducible R workflows and globally cleaned datasets, we can increase the accessibility and reliability of downstream analyses. This workflow can be implemented for other taxa to support research and conservation. |
format | Online Article Text |
id | pubmed-10622554 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-106225542023-11-04 A globally synthesised and flagged bee occurrence dataset and cleaning workflow Dorey, James B. Fischer, Erica E. Chesshire, Paige R. Nava-Bolaños, Angela O’Reilly, Robert L. Bossert, Silas Collins, Shannon M. Lichtenberg, Elinor M. Tucker, Erika M. Smith-Pardo, Allan Falcon-Brindis, Armando Guevara, Diego A. Ribeiro, Bruno de Pedro, Diego Pickering, John Hung, Keng-Lou James Parys, Katherine A. McCabe, Lindsie M. Rogan, Matthew S. Minckley, Robert L. Velazco, Santiago J. E. Griswold, Terry Zarrillo, Tracy A. Jetz, Walter Sica, Yanina V. Orr, Michael C. Guzman, Laura Melissa Ascher, John S. Hughes, Alice C. Cobb, Neil S. Sci Data Data Descriptor Species occurrence data are foundational for research, conservation, and science communication, but the limited availability and accessibility of reliable data represents a major obstacle, particularly for insects, which face mounting pressures. We present BeeBDC, a new R package, and a global bee occurrence dataset to address this issue. We combined >18.3 million bee occurrence records from multiple public repositories (GBIF, SCAN, iDigBio, USGS, ALA) and smaller datasets, then standardised, flagged, deduplicated, and cleaned the data using the reproducible BeeBDC R-workflow. Specifically, we harmonised species names (following established global taxonomy), country names, and collection dates and, we added record-level flags for a series of potential quality issues. These data are provided in two formats, “cleaned” and “flagged-but-uncleaned”. The BeeBDC package with online documentation provides end users the ability to modify filtering parameters to address their research questions. By publishing reproducible R workflows and globally cleaned datasets, we can increase the accessibility and reliability of downstream analyses. This workflow can be implemented for other taxa to support research and conservation. 
Nature Publishing Group UK 2023-11-02 /pmc/articles/PMC10622554/ /pubmed/37919303 http://dx.doi.org/10.1038/s41597-023-02626-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Data Descriptor Dorey, James B. Fischer, Erica E. Chesshire, Paige R. Nava-Bolaños, Angela O’Reilly, Robert L. Bossert, Silas Collins, Shannon M. Lichtenberg, Elinor M. Tucker, Erika M. Smith-Pardo, Allan Falcon-Brindis, Armando Guevara, Diego A. Ribeiro, Bruno de Pedro, Diego Pickering, John Hung, Keng-Lou James Parys, Katherine A. McCabe, Lindsie M. Rogan, Matthew S. Minckley, Robert L. Velazco, Santiago J. E. Griswold, Terry Zarrillo, Tracy A. Jetz, Walter Sica, Yanina V. Orr, Michael C. Guzman, Laura Melissa Ascher, John S. Hughes, Alice C. Cobb, Neil S. A globally synthesised and flagged bee occurrence dataset and cleaning workflow |
title | A globally synthesised and flagged bee occurrence dataset and cleaning workflow |
title_full | A globally synthesised and flagged bee occurrence dataset and cleaning workflow |
title_fullStr | A globally synthesised and flagged bee occurrence dataset and cleaning workflow |
title_full_unstemmed | A globally synthesised and flagged bee occurrence dataset and cleaning workflow |
title_short | A globally synthesised and flagged bee occurrence dataset and cleaning workflow |
title_sort | globally synthesised and flagged bee occurrence dataset and cleaning workflow |
topic | Data Descriptor |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10622554/ https://www.ncbi.nlm.nih.gov/pubmed/37919303 http://dx.doi.org/10.1038/s41597-023-02626-w |
work_keys_str_mv | AT doreyjamesb agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT fischerericae agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT chesshirepaiger agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT navabolanosangela agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT oreillyrobertl agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT bossertsilas agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT collinsshannonm agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT lichtenbergelinorm agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT tuckererikam agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT smithpardoallan agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT falconbrindisarmando agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT guevaradiegoa agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT ribeirobruno agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT depedrodiego agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT pickeringjohn agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT hungkengloujames agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT paryskatherinea agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT mccabelindsiem agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT roganmatthews agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT minckleyrobertl agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT velazcosantiagoje agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT griswoldterry agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT zarrillotracya 
agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT jetzwalter agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT sicayaninav agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT orrmichaelc agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT guzmanlauramelissa agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT ascherjohns agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT hughesalicec agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT cobbneils agloballysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT doreyjamesb globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT fischerericae globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT chesshirepaiger globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT navabolanosangela globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT oreillyrobertl globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT bossertsilas globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT collinsshannonm globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT lichtenbergelinorm globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT tuckererikam globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT smithpardoallan globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT falconbrindisarmando globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT guevaradiegoa globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT ribeirobruno globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT depedrodiego globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT pickeringjohn globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT hungkengloujames 
globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT paryskatherinea globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT mccabelindsiem globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT roganmatthews globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT minckleyrobertl globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT velazcosantiagoje globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT griswoldterry globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT zarrillotracya globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT jetzwalter globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT sicayaninav globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT orrmichaelc globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT guzmanlauramelissa globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT ascherjohns globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT hughesalicec globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow AT cobbneils globallysynthesisedandflaggedbeeoccurrencedatasetandcleaningworkflow |