
GIDL: a rule based expert system for GenBank Intelligent Data Loading into the Molecular Biodiversity database

BACKGROUND: In the scientific biodiversity community, the need to build a bridge between molecular and traditional biodiversity studies is increasingly perceived. We believe that information technology could have a preeminent role in integrating the information generated by these studies with...

Bibliographic Details
Main Authors:	Pannarale, Paolo; Catalano, Domenico; De Caro, Giorgio; Grillo, Giorgio; Leo, Pietro; Pappadà, Graziano; Rubino, Francesco; Scioscia, Gaetano; Licciulli, Flavio
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3303717/ https://www.ncbi.nlm.nih.gov/pubmed/22536971 http://dx.doi.org/10.1186/1471-2105-13-S4-S4

_version_	1782226777023709184
author	Pannarale, Paolo; Catalano, Domenico; De Caro, Giorgio; Grillo, Giorgio; Leo, Pietro; Pappadà, Graziano; Rubino, Francesco; Scioscia, Gaetano; Licciulli, Flavio
author_facet	Pannarale, Paolo; Catalano, Domenico; De Caro, Giorgio; Grillo, Giorgio; Leo, Pietro; Pappadà, Graziano; Rubino, Francesco; Scioscia, Gaetano; Licciulli, Flavio
author_sort	Pannarale, Paolo
collection	PubMed
description	BACKGROUND: In the scientific biodiversity community, the need to build a bridge between molecular and traditional biodiversity studies is increasingly perceived. We believe that information technology could have a preeminent role in integrating the information generated by these studies with the large amount of molecular data available in public bioinformatics databases. This work is primarily aimed at building a bioinformatic infrastructure for the integration of public and private biodiversity data through the development of GIDL, an Intelligent Data Loader coupled with the Molecular Biodiversity Database. The system presented here organizes the sequence and annotation data contained in the GenBank primary database in an ontological way and stores them locally. METHODS: The GIDL architecture consists of a relational database and intelligent data loader software. The relational database schema is designed to manage biodiversity information (Molecular Biodiversity Database) and is organized in four areas: MolecularData, Experiment, Collection and Taxonomy. The MolecularData area is inspired by an established standard in Generic Model Organism Databases, the Chado relational schema. The peculiarity of Chado, and also its strength, is the adoption of an ontological schema which makes use of the Sequence Ontology. The Intelligent Data Loader (IDL) component of GIDL is Extract, Transform and Load software able to parse data, to discover hidden information in the GenBank entries and to populate the Molecular Biodiversity Database. The IDL is composed of three main modules: the Parser, able to parse GenBank flat files; the Reasoner, which automatically builds CLIPS facts mapping the biological knowledge expressed by the Sequence Ontology; and the DBFiller, which translates the CLIPS facts into ordered SQL statements used to populate the database. In GIDL, Semantic Web technologies have been adopted because of their advantages in data representation, integration and processing. RESULTS AND CONCLUSIONS: Entries coming from the Virus (814,122), Plant (1,365,360) and Invertebrate (959,065) divisions of GenBank rel. 180 have been loaded into the Molecular Biodiversity Database by GIDL. Our system, combining the Sequence Ontology and the Chado schema, allows more powerful query expressiveness than the most commonly used sequence retrieval systems, such as Entrez or SRS.
format	Online Article Text
id	pubmed-3303717
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3303717 2012-03-16 GIDL: a rule based expert system for GenBank Intelligent Data Loading into the Molecular Biodiversity database Pannarale, Paolo; Catalano, Domenico; De Caro, Giorgio; Grillo, Giorgio; Leo, Pietro; Pappadà, Graziano; Rubino, Francesco; Scioscia, Gaetano; Licciulli, Flavio BMC Bioinformatics Research BioMed Central 2012-03-28 /pmc/articles/PMC3303717/ /pubmed/22536971 http://dx.doi.org/10.1186/1471-2105-13-S4-S4 Text en Copyright ©2012 Pannarale et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Pannarale, Paolo Catalano, Domenico De Caro, Giorgio Grillo, Giorgio Leo, Pietro Pappadà, Graziano Rubino, Francesco Scioscia, Gaetano Licciulli, Flavio GIDL: a rule based expert system for GenBank Intelligent Data Loading into the Molecular Biodiversity database
title	GIDL: a rule based expert system for GenBank Intelligent Data Loading into the Molecular Biodiversity database
title_sort	gidl: a rule based expert system for genbank intelligent data loading into the molecular biodiversity database
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3303717/ https://www.ncbi.nlm.nih.gov/pubmed/22536971 http://dx.doi.org/10.1186/1471-2105-13-S4-S4
