Atlas – a data warehouse for integrative bioinformatics

BACKGROUND: We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastr...

Bibliographic Details
Main authors: Shah, Sohrab P; Huang, Yong; Xu, Tao; Yuen, Macaire MS; Ling, John; Ouellette, BF Francis
Format: Text
Language: English
Published: BioMed Central 2005
Subjects: Database
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC554782/
https://www.ncbi.nlm.nih.gov/pubmed/15723693
http://dx.doi.org/10.1186/1471-2105-6-34
author Shah, Sohrab P
Huang, Yong
Xu, Tao
Yuen, Macaire MS
Ling, John
Ouellette, BF Francis
author_sort Shah, Sohrab P
collection PubMed
description BACKGROUND: We present a biological data warehouse called Atlas that locally stores and integrates biological sequences, molecular interactions, homology information, functional annotations of genes, and biological ontologies. The goal of the system is to provide data, as well as a software infrastructure for bioinformatics research and development. DESCRIPTION: The Atlas system is based on relational data models that we developed for each of the source data types. Data stored within these relational models are managed through Structured Query Language (SQL) calls that are implemented in a set of Application Programming Interfaces (APIs). The APIs include three languages: C++, Java, and Perl. The methods in these API libraries are used to construct a set of loader applications, which parse and load the source datasets into the Atlas database, and a set of toolbox applications which facilitate data retrieval. Atlas stores and integrates local instances of GenBank, RefSeq, UniProt, Human Protein Reference Database (HPRD), Biomolecular Interaction Network Database (BIND), Database of Interacting Proteins (DIP), Molecular Interactions Database (MINT), IntAct, NCBI Taxonomy, Gene Ontology (GO), Online Mendelian Inheritance in Man (OMIM), LocusLink, Entrez Gene and HomoloGene. The retrieval APIs and toolbox applications are critical components that offer end-users flexible, easy, integrated access to this data. We present use cases that use Atlas to integrate these sources for genome annotation, inference of molecular interactions across species, and gene-disease associations. CONCLUSION: The Atlas biological data warehouse serves as data infrastructure for bioinformatics research and development. It forms the backbone of the research activities in our laboratory and facilitates the integration of disparate, heterogeneous biological sources of data enabling new scientific inferences. Atlas achieves integration of diverse data sets at two levels. 
First, Atlas stores data of similar types using common data models, enforcing the relationships between data types. Second, integration is achieved through a combination of APIs, ontology, and tools. The Atlas software is freely available under the GNU General Public License at:
format Text
id pubmed-554782
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-554782 2005-03-18 Atlas – a data warehouse for integrative bioinformatics Shah, Sohrab P Huang, Yong Xu, Tao Yuen, Macaire MS Ling, John Ouellette, BF Francis BMC Bioinformatics Database BioMed Central 2005-02-21 /pmc/articles/PMC554782/ /pubmed/15723693 http://dx.doi.org/10.1186/1471-2105-6-34 Text en Copyright © 2005 Shah et al; licensee BioMed Central Ltd.
title Atlas – a data warehouse for integrative bioinformatics
title_sort atlas – a data warehouse for integrative bioinformatics
topic Database
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC554782/
https://www.ncbi.nlm.nih.gov/pubmed/15723693
http://dx.doi.org/10.1186/1471-2105-6-34