
A decoupled, modular and scriptable architecture for tools to curate data platforms

MOTIVATION: Curation is essential for any data platform to maintain the quality of the data it provides. Today, more effective curation tools are often vital to keep up with the rapid growth of existing, maintenance-requiring databases and the amount of newly published information that needs to be s...


Bibliographic Details
Main authors: Langenstein, Momo; Hermjakob, Henning; Bernal Llinares, Manuel
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8545344/
https://www.ncbi.nlm.nih.gov/pubmed/33830216
http://dx.doi.org/10.1093/bioinformatics/btab233
_version_ 1784589996808732672
author Langenstein, Momo
Hermjakob, Henning
Bernal Llinares, Manuel
author_facet Langenstein, Momo
Hermjakob, Henning
Bernal Llinares, Manuel
author_sort Langenstein, Momo
collection PubMed
description MOTIVATION: Curation is essential for any data platform to maintain the quality of the data it provides. Today, more effective curation tools are often vital to keep up with the rapid growth of existing, maintenance-requiring databases and the amount of newly published information that needs to be surveyed. However, curation interfaces are often complex and difficult to develop further. Therefore, opportunities for experimentation with curation workflows may be lost due to a lack of development resources or a reluctance to change sensitive production systems. RESULTS: We propose a decoupled, modular and scriptable architecture to build new curation tools on top of existing platforms. Our architecture treats the existing platform as a black box. It therefore relies only on the platform's public application programming interfaces and web application instead of requiring any changes to the existing infrastructure. As a case study, we have implemented this architecture in cmd-iaso, a curation tool for the identifiers.org registry. With cmd-iaso, we also show that the proposed design’s flexibility can be utilized to streamline and enhance the curator’s workflow with the platform’s existing web interface. AVAILABILITY AND IMPLEMENTATION: The cmd-iaso curation tool is implemented in Python 3.7+ and supports Linux, macOS and Windows. Its source code and documentation are freely available from https://github.com/identifiers-org/cmd-iaso. It is also published as a Docker container at https://hub.docker.com/r/identifiersorg/cmd-iaso. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
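The architecture summarised in the description can be illustrated with a minimal sketch. This is not the cmd-iaso source code: every name below (`Record`, `Finding`, `register`, `curate`, `fake_fetch`, and both checks) is hypothetical and chosen only for illustration. The sketch assumes that records arrive as plain dictionaries from some public API, and it injects the fetching function so that the platform stays a black box.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional

# Assumption: a record is whatever the platform's public API returns,
# modelled here as a plain dictionary of strings.
Record = Dict[str, str]


@dataclass(frozen=True)
class Finding:
    """One issue reported by one check for one record."""
    record_id: str
    check: str
    message: str


# Registry of independent checks: the "modular" part of the sketch.
CHECKS: Dict[str, Callable[[Record], List[str]]] = {}


def register(name: str):
    """Decorator that adds a check function to the registry under `name`."""
    def decorator(func: Callable[[Record], List[str]]):
        CHECKS[name] = func
        return func
    return decorator


@register("missing-description")
def missing_description(record: Record) -> List[str]:
    # Hypothetical check: flag records whose description is empty.
    return [] if record.get("description", "").strip() else ["description is empty"]


@register("insecure-url")
def insecure_url(record: Record) -> List[str]:
    # Hypothetical check: flag records that still use plain http.
    url = record.get("url", "")
    return ["URL does not use https"] if url.startswith("http://") else []


def curate(fetch: Callable[[], Iterable[Record]],
           enabled: Optional[List[str]] = None) -> List[Finding]:
    """Run the selected checks over every record returned by `fetch`.

    `fetch` is injected, so this function never touches the platform's
    internals: the "decoupled" part. Choosing `enabled` from a script or
    command line is the "scriptable" part.
    """
    names = list(CHECKS) if enabled is None else list(enabled)
    findings: List[Finding] = []
    for record in fetch():
        for name in names:
            for message in CHECKS[name](record):
                findings.append(Finding(record.get("id", "?"), name, message))
    return findings


def fake_fetch() -> List[Record]:
    # Stand-in for a call to a public API; the data is invented.
    return [
        {"id": "a", "description": "ok", "url": "https://example.org/a"},
        {"id": "b", "description": "", "url": "http://example.org/b"},
    ]


if __name__ == "__main__":
    for finding in curate(fake_fetch):
        print(finding)
```

In this sketch, a new check is added by writing one decorated function, and a real deployment would replace `fake_fetch` with a function that calls the platform's public API.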
format Online
Article
Text
id pubmed-8545344
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8545344 2021-10-26 A decoupled, modular and scriptable architecture for tools to curate data platforms Langenstein, Momo Hermjakob, Henning Bernal Llinares, Manuel Bioinformatics Applications Notes MOTIVATION: Curation is essential for any data platform to maintain the quality of the data it provides. Today, more effective curation tools are often vital to keep up with the rapid growth of existing, maintenance-requiring databases and the amount of newly published information that needs to be surveyed. However, curation interfaces are often complex and difficult to develop further. Therefore, opportunities for experimentation with curation workflows may be lost due to a lack of development resources or a reluctance to change sensitive production systems. RESULTS: We propose a decoupled, modular and scriptable architecture to build new curation tools on top of existing platforms. Our architecture treats the existing platform as a black box. It therefore relies only on the platform's public application programming interfaces and web application instead of requiring any changes to the existing infrastructure. As a case study, we have implemented this architecture in cmd-iaso, a curation tool for the identifiers.org registry. With cmd-iaso, we also show that the proposed design’s flexibility can be utilized to streamline and enhance the curator’s workflow with the platform’s existing web interface. AVAILABILITY AND IMPLEMENTATION: The cmd-iaso curation tool is implemented in Python 3.7+ and supports Linux, macOS and Windows. Its source code and documentation are freely available from https://github.com/identifiers-org/cmd-iaso. It is also published as a Docker container at https://hub.docker.com/r/identifiersorg/cmd-iaso. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Oxford University Press 2021-04-08 /pmc/articles/PMC8545344/ /pubmed/33830216 http://dx.doi.org/10.1093/bioinformatics/btab233 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Applications Notes
Langenstein, Momo
Hermjakob, Henning
Bernal Llinares, Manuel
A decoupled, modular and scriptable architecture for tools to curate data platforms
title A decoupled, modular and scriptable architecture for tools to curate data platforms
topic Applications Notes
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8545344/
https://www.ncbi.nlm.nih.gov/pubmed/33830216
http://dx.doi.org/10.1093/bioinformatics/btab233