
PyCellBase, an efficient python package for easy retrieval of biological data from heterogeneous sources

BACKGROUND: Biological databases and repositories have been increasing in diversity and complexity over the years. This rapid expansion of current and new sources of biological knowledge raises serious problems of data accessibility and integration. To handle the growing need for unification, CellBa...


Bibliographic Details
Main Authors: Perez-Gil, Daniel; Lopez, Francisco J.; Dopazo, Joaquin; Marin-Garcia, Pablo; Rendon, Augusto; Medina, Ignacio
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6438028/
https://www.ncbi.nlm.nih.gov/pubmed/30922213
http://dx.doi.org/10.1186/s12859-019-2726-4
author Perez-Gil, Daniel
Lopez, Francisco J.
Dopazo, Joaquin
Marin-Garcia, Pablo
Rendon, Augusto
Medina, Ignacio
collection PubMed
description BACKGROUND: Biological databases and repositories have been increasing in diversity and complexity over the years. This rapid expansion of current and new sources of biological knowledge raises serious problems of data accessibility and integration. To handle the growing need for unification, CellBase was created as an integrative solution. CellBase provides a centralized NoSQL database containing biological information from different and heterogeneous sources. This information is accessed through a RESTful web service API, which provides an efficient interface to the data. RESULTS: In this work we present PyCellBase, a Python package that provides programmatic access to the rich RESTful web service API offered by CellBase. This package offers fast and user-friendly access to biological information without the need to install any local database. In addition, a series of command-line tools is provided to perform common bioinformatic tasks, such as variant annotation. CellBase data is always available through a high-availability cluster, and queries have been tuned to ensure real-time performance. CONCLUSION: PyCellBase is an open-source Python package that provides efficient access to heterogeneous biological information. It allows users to perform tasks that require a comprehensive set of knowledge resources, such as variant annotation. Queries can be easily fine-tuned to retrieve the desired information about particular biological features. PyCellBase offers the convenience of an object-oriented scripting language and provides the ability to integrate the obtained results into other Python applications and pipelines. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2726-4) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6438028
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6438028 2019-04-08 PyCellBase, an efficient python package for easy retrieval of biological data from heterogeneous sources Perez-Gil, Daniel Lopez, Francisco J. Dopazo, Joaquin Marin-Garcia, Pablo Rendon, Augusto Medina, Ignacio BMC Bioinformatics Software BACKGROUND: Biological databases and repositories have been increasing in diversity and complexity over the years. This rapid expansion of current and new sources of biological knowledge raises serious problems of data accessibility and integration. To handle the growing need for unification, CellBase was created as an integrative solution. CellBase provides a centralized NoSQL database containing biological information from different and heterogeneous sources. This information is accessed through a RESTful web service API, which provides an efficient interface to the data. RESULTS: In this work we present PyCellBase, a Python package that provides programmatic access to the rich RESTful web service API offered by CellBase. This package offers fast and user-friendly access to biological information without the need to install any local database. In addition, a series of command-line tools is provided to perform common bioinformatic tasks, such as variant annotation. CellBase data is always available through a high-availability cluster, and queries have been tuned to ensure real-time performance. CONCLUSION: PyCellBase is an open-source Python package that provides efficient access to heterogeneous biological information. It allows users to perform tasks that require a comprehensive set of knowledge resources, such as variant annotation. Queries can be easily fine-tuned to retrieve the desired information about particular biological features. PyCellBase offers the convenience of an object-oriented scripting language and provides the ability to integrate the obtained results into other Python applications and pipelines.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2726-4) contains supplementary material, which is available to authorized users. BioMed Central 2019-03-28 /pmc/articles/PMC6438028/ /pubmed/30922213 http://dx.doi.org/10.1186/s12859-019-2726-4 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Software
Perez-Gil, Daniel
Lopez, Francisco J.
Dopazo, Joaquin
Marin-Garcia, Pablo
Rendon, Augusto
Medina, Ignacio
PyCellBase, an efficient python package for easy retrieval of biological data from heterogeneous sources
title PyCellBase, an efficient python package for easy retrieval of biological data from heterogeneous sources
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6438028/
https://www.ncbi.nlm.nih.gov/pubmed/30922213
http://dx.doi.org/10.1186/s12859-019-2726-4