A semi-automated workflow for biodiversity data retrieval, cleaning, and quality control

Abstract. The compilation and cleaning of data needed for analyses and prediction of species distributions is a time consuming process requiring a solid understanding of data formats and service APIs provided by biodiversity informatics infrastructures. We designed and implemented a Taverna-based Da...

Bibliographic Details
Main authors: Mathew, Cherian; Güntsch, Anton; Obst, Matthias; Vicario, Saverio; Haines, Robert; Williams, Alan R.; de Jong, Yde; Goble, Carole
Format: Online Article Text
Language: English
Published: Pensoft Publishers 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4267104/
https://www.ncbi.nlm.nih.gov/pubmed/25535486
http://dx.doi.org/10.3897/BDJ.2.e4221
author Mathew, Cherian
Güntsch, Anton
Obst, Matthias
Vicario, Saverio
Haines, Robert
Williams, Alan R.
de Jong, Yde
Goble, Carole
collection PubMed
description Abstract. The compilation and cleaning of data needed for analyses and prediction of species distributions is a time consuming process requiring a solid understanding of data formats and service APIs provided by biodiversity informatics infrastructures. We designed and implemented a Taverna-based Data Refinement Workflow which integrates taxonomic data retrieval, data cleaning, and data selection into a consistent, standards-based, and effective system hiding the complexity of underlying service infrastructures. The workflow can be freely used both locally and through a web-portal which does not require additional software installations by users.
format Online
Article
Text
id pubmed-4267104
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Pensoft Publishers
record_format MEDLINE/PubMed
spelling pubmed-42671042014-12-22 A semi-automated workflow for biodiversity data retrieval, cleaning, and quality control Mathew, Cherian Güntsch, Anton Obst, Matthias Vicario, Saverio Haines, Robert Williams, Alan R. de Jong, Yde Goble, Carole Biodivers Data J Software Description Abstract. The compilation and cleaning of data needed for analyses and prediction of species distributions is a time consuming process requiring a solid understanding of data formats and service APIs provided by biodiversity informatics infrastructures. We designed and implemented a Taverna-based Data Refinement Workflow which integrates taxonomic data retrieval, data cleaning, and data selection into a consistent, standards-based, and effective system hiding the complexity of underlying service infrastructures. The workflow can be freely used both locally and through a web-portal which does not require additional software installations by users. Pensoft Publishers 2014-12-11 /pmc/articles/PMC4267104/ /pubmed/25535486 http://dx.doi.org/10.3897/BDJ.2.e4221 Text en Cherian Mathew, Anton Güntsch, Matthias Obst, Saverio Vicario, Robert Haines, Alan R. Williams, Yde de Jong, Carole Goble http://creativecommons.org/licenses/by/4.0 This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A semi-automated workflow for biodiversity data retrieval, cleaning, and quality control
topic Software Description