EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration

BACKGROUND: Expressed sequence tag (EST) collections are composed of a high number of single-pass, redundant, partial sequences, which need to be processed, clustered, and annotated to remove low-quality and vector regions, eliminate redundancy and sequencing errors, and provide biologically relevan...

Bibliographic Details
Main authors: Forment, Javier, Gilabert, Francisco, Robles, Antonio, Conejero, Vicente, Nuez, Fernando, Blanca, Jose M
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2258287/
https://www.ncbi.nlm.nih.gov/pubmed/18179701
http://dx.doi.org/10.1186/1471-2105-9-5
author Forment, Javier
Gilabert, Francisco
Robles, Antonio
Conejero, Vicente
Nuez, Fernando
Blanca, Jose M
author_sort Forment, Javier
collection PubMed
description BACKGROUND: Expressed sequence tag (EST) collections are composed of a high number of single-pass, redundant, partial sequences, which need to be processed, clustered, and annotated to remove low-quality and vector regions, eliminate redundancy and sequencing errors, and provide biologically relevant information. In order to provide a suitable way of performing the different steps in the analysis of the ESTs, flexible computation pipelines adapted to the local needs of specific EST projects have to be developed. Furthermore, EST collections must be stored in highly structured relational databases available to researchers through user-friendly interfaces which allow efficient and complex data mining, thus offering maximum capabilities for their full exploitation. RESULTS: We have created EST2uni, an integrated, highly-configurable EST analysis pipeline and data mining software package that automates the pre-processing, clustering, annotation, database creation, and data mining of EST collections. The pipeline uses standard EST analysis tools and the software has a modular design to facilitate the addition of new analytical methods and their configuration. Currently implemented analyses include functional and structural annotation, SNP and microsatellite discovery, integration of previously known genetic marker data and gene expression results, and assistance in cDNA microarray design. It can be run in parallel in a PC cluster in order to reduce the time necessary for the analysis. It also creates a web site linked to the database, showing collection statistics, with complex query capabilities and tools for data mining and retrieval. CONCLUSION: The software package presented here provides an efficient and complete bioinformatics tool for the management of EST collections which is very easy to adapt to the local needs of different EST projects. The code is freely available under the GPL license and can be obtained at . 
This site also provides detailed instructions for installation and configuration of the software package. The code is under active development to incorporate new analyses, methods, and algorithms as they are released by the bioinformatics community.
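The abstract lists microsatellite discovery among the implemented analyses. As a rough illustration of what such a step involves, the following minimal Python sketch scans a sequence for perfect di-, tri-, and tetranucleotide repeats with a regular expression. It is not EST2uni's code or method: the function name `find_microsatellites` and the minimum repeat counts are assumptions chosen only for this example.

```python
import re


def find_microsatellites(seq, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect repeats.

    Hypothetical helper for illustration only; the thresholds below are
    assumed values, not settings taken from EST2uni.
    """
    if min_repeats is None:
        # motif length -> minimum number of tandem copies (assumed values)
        min_repeats = {2: 6, 3: 4, 4: 3}
    seq = seq.upper()
    hits = []
    for unit_len, min_n in sorted(min_repeats.items()):
        # One captured motif followed by at least (min_n - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"), so each repeat is reported once.
            if motif in (motif + motif)[1:-1]:
                continue
            copies = len(match.group(0)) // unit_len
            hits.append((match.start(), motif, copies))
    return sorted(hits)


if __name__ == "__main__":
    example = "GGC" + "AT" * 7 + "CCT" + "CAG" * 5 + "TT"
    for start, motif, copies in find_microsatellites(example):
        print(start, motif, copies)
```

A real pipeline would typically also handle imperfect repeats and ambiguous bases, which this sketch deliberately ignores.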
format Text
id pubmed-2258287
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2258287 2008-02-29 EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration Forment, Javier Gilabert, Francisco Robles, Antonio Conejero, Vicente Nuez, Fernando Blanca, Jose M BMC Bioinformatics Software BioMed Central 2008-01-07 /pmc/articles/PMC2258287/ /pubmed/18179701 http://dx.doi.org/10.1186/1471-2105-9-5 Text en Copyright © 2008 Forment et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title EST2uni: an open, parallel tool for automated EST analysis and database creation, with a data mining web interface and microarray expression data integration
topic Software