No one-size-fits-all solution to clean GBIF
Species occurrence records provide the basis for many biodiversity studies. They derive from georeferenced specimens deposited in natural history collections and visual observations, such as those obtained through various mobile applications. Given the rapid increase in availability of such data, th...
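Main authors: | Zizka, Alexander; Antunes Carvalho, Fernanda; Calvente, Alice; Rocio Baez-Lizarazo, Mabel; Cabral, Andressa; Coelho, Jéssica Fernanda Ramos; Colli-Silva, Matheus; Fantinati, Mariana Ramos; Fernandes, Moabe F.; Ferreira-Araújo, Thais; Gondim Lambert Moreira, Fernanda; Santos, Nathália Michellyda Cunha; Santos, Tiago Andrade Borges; dos Santos-Costa, Renata Clicia; Serrano, Filipe C.; Alves da Silva, Ana Paula; de Souza Soares, Arthur; Cavalcante de Souza, Paolla Gabryelle; Calisto Tomaz, Eduardo; Vale, Valéria Fonseca; Vieira, Tiago Luiz; Antonelli, Alexandre |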
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc., 2020 |
Subjects: | Biodiversity |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7528811/ https://www.ncbi.nlm.nih.gov/pubmed/33062422 http://dx.doi.org/10.7717/peerj.9916 |
_version_ | 1783589330804015104 |
---|---|
author | Zizka, Alexander; Antunes Carvalho, Fernanda; Calvente, Alice; Rocio Baez-Lizarazo, Mabel; Cabral, Andressa; Coelho, Jéssica Fernanda Ramos; Colli-Silva, Matheus; Fantinati, Mariana Ramos; Fernandes, Moabe F.; Ferreira-Araújo, Thais; Gondim Lambert Moreira, Fernanda; Santos, Nathália Michellyda Cunha; Santos, Tiago Andrade Borges; dos Santos-Costa, Renata Clicia; Serrano, Filipe C.; Alves da Silva, Ana Paula; de Souza Soares, Arthur; Cavalcante de Souza, Paolla Gabryelle; Calisto Tomaz, Eduardo; Vale, Valéria Fonseca; Vieira, Tiago Luiz; Antonelli, Alexandre |
author_facet | Zizka, Alexander; Antunes Carvalho, Fernanda; Calvente, Alice; Rocio Baez-Lizarazo, Mabel; Cabral, Andressa; Coelho, Jéssica Fernanda Ramos; Colli-Silva, Matheus; Fantinati, Mariana Ramos; Fernandes, Moabe F.; Ferreira-Araújo, Thais; Gondim Lambert Moreira, Fernanda; Santos, Nathália Michellyda Cunha; Santos, Tiago Andrade Borges; dos Santos-Costa, Renata Clicia; Serrano, Filipe C.; Alves da Silva, Ana Paula; de Souza Soares, Arthur; Cavalcante de Souza, Paolla Gabryelle; Calisto Tomaz, Eduardo; Vale, Valéria Fonseca; Vieira, Tiago Luiz; Antonelli, Alexandre |
author_sort | Zizka, Alexander |
collection | PubMed |
description | Species occurrence records provide the basis for many biodiversity studies. They derive from georeferenced specimens deposited in natural history collections and visual observations, such as those obtained through various mobile applications. Given the rapid increase in availability of such data, the control of quality and accuracy constitutes a particular concern. Automatic filtering is a scalable and reproducible means to identify potentially problematic records and tailor datasets from public databases such as the Global Biodiversity Information Facility (GBIF; http://www.gbif.org), for biodiversity analyses. However, it is unclear how much data may be lost by filtering, whether the same filters should be applied across all taxonomic groups, and what the effect of filtering is on common downstream analyses. Here, we evaluate the effect of 13 recently proposed filters on the inference of species richness patterns and automated conservation assessments for 18 Neotropical taxa, including terrestrial and marine animals, fungi, and plants downloaded from GBIF. We find that a total of 44.3% of the records are potentially problematic, with large variation across taxonomic groups (25–90%). A small fraction of records was identified as erroneous in the strict sense (4.2%), and a much larger proportion as unfit for most downstream analyses (41.7%). Filters of duplicated information, collection year, and basis of record, as well as coordinates in urban areas, or for terrestrial taxa in the sea or marine taxa on land, have the greatest effect. Automated filtering can help in identifying problematic records, but requires customization of which tests and thresholds should be applied to the taxonomic group and geographic area under focus. Our results stress the importance of thorough recording and exploration of the meta-data associated with species records for biodiversity research. |
format | Online Article Text |
id | pubmed-7528811 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7528811 2020-10-13 No one-size-fits-all solution to clean GBIF Zizka, Alexander Antunes Carvalho, Fernanda Calvente, Alice Rocio Baez-Lizarazo, Mabel Cabral, Andressa Coelho, Jéssica Fernanda Ramos Colli-Silva, Matheus Fantinati, Mariana Ramos Fernandes, Moabe F. Ferreira-Araújo, Thais Gondim Lambert Moreira, Fernanda Santos, Nathália Michellyda Cunha Santos, Tiago Andrade Borges dos Santos-Costa, Renata Clicia Serrano, Filipe C. Alves da Silva, Ana Paula de Souza Soares, Arthur Cavalcante de Souza, Paolla Gabryelle Calisto Tomaz, Eduardo Vale, Valéria Fonseca Vieira, Tiago Luiz Antonelli, Alexandre PeerJ Biodiversity Species occurrence records provide the basis for many biodiversity studies. They derive from georeferenced specimens deposited in natural history collections and visual observations, such as those obtained through various mobile applications. Given the rapid increase in availability of such data, the control of quality and accuracy constitutes a particular concern. Automatic filtering is a scalable and reproducible means to identify potentially problematic records and tailor datasets from public databases such as the Global Biodiversity Information Facility (GBIF; http://www.gbif.org), for biodiversity analyses. However, it is unclear how much data may be lost by filtering, whether the same filters should be applied across all taxonomic groups, and what the effect of filtering is on common downstream analyses. Here, we evaluate the effect of 13 recently proposed filters on the inference of species richness patterns and automated conservation assessments for 18 Neotropical taxa, including terrestrial and marine animals, fungi, and plants downloaded from GBIF. We find that a total of 44.3% of the records are potentially problematic, with large variation across taxonomic groups (25–90%). A small fraction of records was identified as erroneous in the strict sense (4.2%), and a much larger proportion as unfit for most downstream analyses (41.7%). Filters of duplicated information, collection year, and basis of record, as well as coordinates in urban areas, or for terrestrial taxa in the sea or marine taxa on land, have the greatest effect. Automated filtering can help in identifying problematic records, but requires customization of which tests and thresholds should be applied to the taxonomic group and geographic area under focus. Our results stress the importance of thorough recording and exploration of the meta-data associated with species records for biodiversity research. PeerJ Inc. 2020-09-28 /pmc/articles/PMC7528811/ /pubmed/33062422 http://dx.doi.org/10.7717/peerj.9916 Text en ©2020 Zizka et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
spellingShingle | Biodiversity Zizka, Alexander Antunes Carvalho, Fernanda Calvente, Alice Rocio Baez-Lizarazo, Mabel Cabral, Andressa Coelho, Jéssica Fernanda Ramos Colli-Silva, Matheus Fantinati, Mariana Ramos Fernandes, Moabe F. Ferreira-Araújo, Thais Gondim Lambert Moreira, Fernanda Santos, Nathália Michellyda Cunha Santos, Tiago Andrade Borges dos Santos-Costa, Renata Clicia Serrano, Filipe C. Alves da Silva, Ana Paula de Souza Soares, Arthur Cavalcante de Souza, Paolla Gabryelle Calisto Tomaz, Eduardo Vale, Valéria Fonseca Vieira, Tiago Luiz Antonelli, Alexandre No one-size-fits-all solution to clean GBIF |
title | No one-size-fits-all solution to clean GBIF |
title_full | No one-size-fits-all solution to clean GBIF |
title_fullStr | No one-size-fits-all solution to clean GBIF |
title_full_unstemmed | No one-size-fits-all solution to clean GBIF |
title_short | No one-size-fits-all solution to clean GBIF |
title_sort | no one-size-fits-all solution to clean gbif |
topic | Biodiversity |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7528811/ https://www.ncbi.nlm.nih.gov/pubmed/33062422 http://dx.doi.org/10.7717/peerj.9916 |
work_keys_str_mv | AT zizkaalexander noonesizefitsallsolutiontocleangbif AT antunescarvalhofernanda noonesizefitsallsolutiontocleangbif AT calventealice noonesizefitsallsolutiontocleangbif AT rociobaezlizarazomabel noonesizefitsallsolutiontocleangbif AT cabralandressa noonesizefitsallsolutiontocleangbif AT coelhojessicafernandaramos noonesizefitsallsolutiontocleangbif AT collisilvamatheus noonesizefitsallsolutiontocleangbif AT fantinatimarianaramos noonesizefitsallsolutiontocleangbif AT fernandesmoabef noonesizefitsallsolutiontocleangbif AT ferreiraaraujothais noonesizefitsallsolutiontocleangbif AT gondimlambertmoreirafernanda noonesizefitsallsolutiontocleangbif AT santosnathaliamichellydacunha noonesizefitsallsolutiontocleangbif AT santostiagoandradeborges noonesizefitsallsolutiontocleangbif AT dossantoscostarenataclicia noonesizefitsallsolutiontocleangbif AT serranofilipec noonesizefitsallsolutiontocleangbif AT alvesdasilvaanapaula noonesizefitsallsolutiontocleangbif AT desouzasoaresarthur noonesizefitsallsolutiontocleangbif AT cavalcantedesouzapaollagabryelle noonesizefitsallsolutiontocleangbif AT calistotomazeduardo noonesizefitsallsolutiontocleangbif AT valevaleriafonseca noonesizefitsallsolutiontocleangbif AT vieiratiagoluiz noonesizefitsallsolutiontocleangbif AT antonellialexandre noonesizefitsallsolutiontocleangbif |