
No one-size-fits-all solution to clean GBIF

Species occurrence records provide the basis for many biodiversity studies. They derive from georeferenced specimens deposited in natural history collections and visual observations, such as those obtained through various mobile applications. Given the rapid increase in availability of such data, th...

Bibliographic Details
Main authors: Zizka, Alexander; Antunes Carvalho, Fernanda; Calvente, Alice; Rocio Baez-Lizarazo, Mabel; Cabral, Andressa; Coelho, Jéssica Fernanda Ramos; Colli-Silva, Matheus; Fantinati, Mariana Ramos; Fernandes, Moabe F.; Ferreira-Araújo, Thais; Gondim Lambert Moreira, Fernanda; Santos, Nathália Michellyda Cunha; Santos, Tiago Andrade Borges; dos Santos-Costa, Renata Clicia; Serrano, Filipe C.; Alves da Silva, Ana Paula; de Souza Soares, Arthur; Cavalcante de Souza, Paolla Gabryelle; Calisto Tomaz, Eduardo; Vale, Valéria Fonseca; Vieira, Tiago Luiz; Antonelli, Alexandre
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Biodiversity
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7528811/
https://www.ncbi.nlm.nih.gov/pubmed/33062422
http://dx.doi.org/10.7717/peerj.9916
_version_ 1783589330804015104
author Zizka, Alexander
Antunes Carvalho, Fernanda
Calvente, Alice
Rocio Baez-Lizarazo, Mabel
Cabral, Andressa
Coelho, Jéssica Fernanda Ramos
Colli-Silva, Matheus
Fantinati, Mariana Ramos
Fernandes, Moabe F.
Ferreira-Araújo, Thais
Gondim Lambert Moreira, Fernanda
Santos, Nathália Michellyda Cunha
Santos, Tiago Andrade Borges
dos Santos-Costa, Renata Clicia
Serrano, Filipe C.
Alves da Silva, Ana Paula
de Souza Soares, Arthur
Cavalcante de Souza, Paolla Gabryelle
Calisto Tomaz, Eduardo
Vale, Valéria Fonseca
Vieira, Tiago Luiz
Antonelli, Alexandre
author_sort Zizka, Alexander
collection PubMed
description Species occurrence records provide the basis for many biodiversity studies. They derive from georeferenced specimens deposited in natural history collections and visual observations, such as those obtained through various mobile applications. Given the rapid increase in availability of such data, the control of quality and accuracy constitutes a particular concern. Automatic filtering is a scalable and reproducible means to identify potentially problematic records and tailor datasets from public databases such as the Global Biodiversity Information Facility (GBIF; http://www.gbif.org), for biodiversity analyses. However, it is unclear how much data may be lost by filtering, whether the same filters should be applied across all taxonomic groups, and what the effect of filtering is on common downstream analyses. Here, we evaluate the effect of 13 recently proposed filters on the inference of species richness patterns and automated conservation assessments for 18 Neotropical taxa, including terrestrial and marine animals, fungi, and plants downloaded from GBIF. We find that a total of 44.3% of the records are potentially problematic, with large variation across taxonomic groups (25–90%). A small fraction of records was identified as erroneous in the strict sense (4.2%), and a much larger proportion as unfit for most downstream analyses (41.7%). Filters of duplicated information, collection year, and basis of record, as well as coordinates in urban areas, or for terrestrial taxa in the sea or marine taxa on land, have the greatest effect. Automated filtering can help in identifying problematic records, but requires customization of which tests and thresholds should be applied to the taxonomic group and geographic area under focus. Our results stress the importance of thorough recording and exploration of the meta-data associated with species records for biodiversity research.
format Online
Article
Text
id pubmed-7528811
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7528811 2020-10-13 No one-size-fits-all solution to clean GBIF Zizka, Alexander; Antunes Carvalho, Fernanda; Calvente, Alice; Rocio Baez-Lizarazo, Mabel; Cabral, Andressa; Coelho, Jéssica Fernanda Ramos; Colli-Silva, Matheus; Fantinati, Mariana Ramos; Fernandes, Moabe F.; Ferreira-Araújo, Thais; Gondim Lambert Moreira, Fernanda; Santos, Nathália Michellyda Cunha; Santos, Tiago Andrade Borges; dos Santos-Costa, Renata Clicia; Serrano, Filipe C.; Alves da Silva, Ana Paula; de Souza Soares, Arthur; Cavalcante de Souza, Paolla Gabryelle; Calisto Tomaz, Eduardo; Vale, Valéria Fonseca; Vieira, Tiago Luiz; Antonelli, Alexandre PeerJ Biodiversity Species occurrence records provide the basis for many biodiversity studies. They derive from georeferenced specimens deposited in natural history collections and visual observations, such as those obtained through various mobile applications. Given the rapid increase in availability of such data, the control of quality and accuracy constitutes a particular concern. Automatic filtering is a scalable and reproducible means to identify potentially problematic records and tailor datasets from public databases such as the Global Biodiversity Information Facility (GBIF; http://www.gbif.org), for biodiversity analyses. However, it is unclear how much data may be lost by filtering, whether the same filters should be applied across all taxonomic groups, and what the effect of filtering is on common downstream analyses. Here, we evaluate the effect of 13 recently proposed filters on the inference of species richness patterns and automated conservation assessments for 18 Neotropical taxa, including terrestrial and marine animals, fungi, and plants downloaded from GBIF. We find that a total of 44.3% of the records are potentially problematic, with large variation across taxonomic groups (25–90%). A small fraction of records was identified as erroneous in the strict sense (4.2%), and a much larger proportion as unfit for most downstream analyses (41.7%). Filters of duplicated information, collection year, and basis of record, as well as coordinates in urban areas, or for terrestrial taxa in the sea or marine taxa on land, have the greatest effect. Automated filtering can help in identifying problematic records, but requires customization of which tests and thresholds should be applied to the taxonomic group and geographic area under focus. Our results stress the importance of thorough recording and exploration of the meta-data associated with species records for biodiversity research. PeerJ Inc. 2020-09-28 /pmc/articles/PMC7528811/ /pubmed/33062422 http://dx.doi.org/10.7717/peerj.9916 Text en ©2020 Zizka et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title No one-size-fits-all solution to clean GBIF
title_sort no one-size-fits-all solution to clean gbif
topic Biodiversity
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7528811/
https://www.ncbi.nlm.nih.gov/pubmed/33062422
http://dx.doi.org/10.7717/peerj.9916