
The development of PIPA: an integrated and automated pipeline for genome-wide protein function annotation


Bibliographic Details
Main authors:	Yu, Chenggang, Zavaljevski, Nela, Desai, Valmik, Johnson, Seth, Stevens, Fred J, Reifman, Jaques
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2259298/ https://www.ncbi.nlm.nih.gov/pubmed/18221520 http://dx.doi.org/10.1186/1471-2105-9-52

Abstract:
BACKGROUND: Automated protein function prediction methods are needed to keep pace with high-throughput sequencing. With the existence of many programs and databases for inferring different protein functions, a pipeline that properly integrates these resources will benefit from the advantages of each method. However, integrated systems usually do not provide mechanisms to generate customized databases to predict particular protein functions. Here, we describe a tool termed PIPA (Pipeline for Protein Annotation) that has these capabilities.

RESULTS: PIPA annotates protein functions by combining the results of multiple programs and databases, such as InterPro and the Conserved Domains Database, into common Gene Ontology (GO) terms. The major algorithms implemented in PIPA are: (1) a profile database generation algorithm, which generates customized profile databases to predict particular protein functions, (2) an automated ontology mapping generation algorithm, which maps various classification schemes into GO, and (3) a consensus algorithm to reconcile annotations from the integrated programs and databases. PIPA's profile generation algorithm is employed to construct the enzyme profile database CatFam, which predicts catalytic functions described by Enzyme Commission (EC) numbers. Validation tests show that CatFam yields average recall and precision larger than 95.0%. CatFam is integrated with PIPA. We use an association rule mining algorithm to automatically generate mappings between terms of two ontologies from annotated sample proteins. Incorporating the ontologies' hierarchical topology into the algorithm increases the number of generated mappings. In particular, it generates 40.0% additional mappings from the Clusters of Orthologous Groups (COG) to EC numbers and a six-fold increase in mappings from COG to GO terms. The mappings to EC numbers show a very high precision (99.8%) and recall (96.6%), while the mappings to GO terms show moderate precision (80.0%) and low recall (33.0%). Our consensus algorithm for GO annotation is based on the computation and propagation of likelihood scores associated with GO terms. The test results suggest that, for a given recall, the application of the consensus algorithm yields higher precision than when consensus is not used.

CONCLUSION: The algorithms implemented in PIPA provide automated genome-wide protein function annotation based on reconciled predictions from multiple resources.
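The abstract describes generating mappings between two classification schemes with an association rule mining algorithm run over annotated sample proteins, and notes that using the ontologies' hierarchy increases the number of mappings. The Python sketch that follows is not PIPA's code; it is a minimal illustration of the general technique under stated assumptions. It assumes each sample protein is given as a set of COG identifiers paired with a set of EC numbers, uses absolute support counts and rule confidence (count of source and target together divided by count of source), and approximates "hierarchical topology" by expanding every EC number into its less specific parents (for example 1.1.1.1 into 1.1.1.-, 1.1.-.- and 1.-.-.-). The function names `ec_ancestors`, `mine_mappings` and `most_specific`, and the default thresholds, are hypothetical.

```python
from collections import Counter
from itertools import product


def ec_ancestors(ec: str) -> list[str]:
    """Return an EC number followed by its less specific parents,
    e.g. '1.1.1.1' -> ['1.1.1.1', '1.1.1.-', '1.1.-.-', '1.-.-.-'].
    (Assumption: this prefix expansion stands in for the EC hierarchy.)"""
    parts = ec.split(".")
    out = []
    for depth in range(len(parts), 0, -1):
        out.append(".".join(parts[:depth] + ["-"] * (len(parts) - depth)))
    # dict.fromkeys removes duplicates (e.g. for '1.1.1.-') while keeping order
    return list(dict.fromkeys(out))


def mine_mappings(proteins, min_support=2, min_confidence=0.8, use_hierarchy=True):
    """Mine source->target rules from (source_terms, target_terms) pairs.

    Hypothetical thresholds: a rule s -> t is kept when s and t co-occur in at
    least `min_support` proteins and count(s, t) / count(s) >= `min_confidence`.
    """
    source_count = Counter()
    pair_count = Counter()
    for src_terms, tgt_terms in proteins:
        src = set(src_terms)
        tgt = set()
        for t in tgt_terms:
            # Optionally add every ancestor so sibling EC numbers can share a parent
            tgt.update(ec_ancestors(t) if use_hierarchy else [t])
        source_count.update(src)
        pair_count.update(product(src, tgt))  # count each (source, target) pair once per protein
    rules = {}
    for (s, t), n in pair_count.items():
        confidence = n / source_count[s]
        if n >= min_support and confidence >= min_confidence:
            rules[(s, t)] = confidence
    return rules


def most_specific(rules):
    """For each source term, keep the accepted target with the fewest '-' wildcards."""
    best = {}
    for (s, t), conf in rules.items():
        if s not in best or t.count("-") < best[s][0].count("-"):
            best[s] = (t, conf)
    return best
```

With two COG0001 proteins annotated 1.1.1.1 and 1.1.1.2 and two COG0002 proteins annotated 2.7.1.1, the flat run (`use_hierarchy=False`) finds only COG0002 -> 2.7.1.1, because neither COG0001 rule reaches a support of 2; the hierarchical run additionally maps COG0001 to the shared parent 1.1.1.-. This illustrates, on toy data only, why expanding terms along the hierarchy can yield additional mappings.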
Journal:	BMC Bioinformatics
Publication date:	2008-01-25
Rights:	Copyright © 2008 Yu et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Similar items