Machado: Open source genomics data integration framework

BACKGROUND: Genome projects and multiomics experiments generate huge volumes of data that must be stored, mined, and transformed into useful knowledge. All this information is supposed to be accessible and, if possible, browsable afterwards. Computational biologists have been dealing with this scena...


Bibliographic Details
Main Authors:	Mudadu, Mauricio de Alvarenga; Zerlotini, Adhemar
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Technical Note
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7490629/ https://www.ncbi.nlm.nih.gov/pubmed/32930331 http://dx.doi.org/10.1093/gigascience/giaa097

_version_	1783582071025827840
author	Mudadu, Mauricio de Alvarenga Zerlotini, Adhemar
author_facet	Mudadu, Mauricio de Alvarenga Zerlotini, Adhemar
author_sort	Mudadu, Mauricio de Alvarenga
collection	PubMed
description	BACKGROUND: Genome projects and multiomics experiments generate huge volumes of data that must be stored, mined, and transformed into useful knowledge. All this information is supposed to be accessible and, if possible, browsable afterwards. Computational biologists have been dealing with this scenario for more than a decade and have been implementing software and databases to meet this challenge. The biological relational database schema of GMOD (Generic Model Organism Database), known as Chado, is one of the few successful open source initiatives; it is widely adopted and many software packages are able to connect to it. FINDINGS: We have been developing an open source software package named Machado, a genomics data integration framework implemented in Python, to enable research groups to both store and visualize genomics data. The framework relies on the Chado database schema and, therefore, should be intuitive for current developers to adopt or to run on top of existing databases. It has several data-loading tools for genomics and transcriptomics data and also for annotation results from tools such as BLAST, InterProScan, OrthoMCL, and LSTrAP. There is an API to connect to JBrowse, and a web visualization tool is implemented using Django Views and Templates. The Haystack library integrated with the Elasticsearch engine was used to implement a Google-like search, i.e., a single autocomplete search box that provides fast results and filters. CONCLUSION: Machado aims to be a modern object-relational framework that uses the latest Python libraries to produce an effective open source resource for genomics research.
format	Online Article Text
id	pubmed-7490629
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7490629 2020-09-21 Machado: Open source genomics data integration framework Mudadu, Mauricio de Alvarenga Zerlotini, Adhemar Gigascience Technical Note Oxford University Press 2020-09-14 /pmc/articles/PMC7490629/ /pubmed/32930331 http://dx.doi.org/10.1093/gigascience/giaa097 Text en © The Author(s) 2020. Published by Oxford University Press GigaScience. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Technical Note Mudadu, Mauricio de Alvarenga Zerlotini, Adhemar Machado: Open source genomics data integration framework
title	Machado: Open source genomics data integration framework
title_full	Machado: Open source genomics data integration framework
title_fullStr	Machado: Open source genomics data integration framework
title_full_unstemmed	Machado: Open source genomics data integration framework
title_short	Machado: Open source genomics data integration framework
title_sort	machado: open source genomics data integration framework
topic	Technical Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7490629/ https://www.ncbi.nlm.nih.gov/pubmed/32930331 http://dx.doi.org/10.1093/gigascience/giaa097
work_keys_str_mv	AT mudadumauriciodealvarenga machadoopensourcegenomicsdataintegrationframework AT zerlotiniadhemar machadoopensourcegenomicsdataintegrationframework
