Predicting protein linkages in bacteria: Which method is best depends on task

BACKGROUND: Applications of computational methods for predicting protein functional linkages are increasing. In recent years, several bacteria-specific methods for predicting linkages have been developed. The four major genomic context methods are: Gene cluster, Gene neighbor, Rosetta Stone, and Phy...

Bibliographic Details
Main Authors:	Karimpour-Fard, Anis; Leach, Sonia M; Gill, Ryan T; Hunter, Lawrence E
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2570368/ https://www.ncbi.nlm.nih.gov/pubmed/18816389 http://dx.doi.org/10.1186/1471-2105-9-397

author	Karimpour-Fard, Anis; Leach, Sonia M; Gill, Ryan T; Hunter, Lawrence E
collection	PubMed
description	BACKGROUND: Applications of computational methods for predicting protein functional linkages are increasing. In recent years, several bacteria-specific methods for predicting linkages have been developed. The four major genomic context methods are: Gene cluster, Gene neighbor, Rosetta Stone, and Phylogenetic profiles. These methods have been shown to be powerful tools, and this paper provides guidelines for when each method is appropriate by exploring different features of each method and potential improvements offered by their combination. We also review many previous treatments of these prediction methods, use the latest available annotations, and offer a number of new observations. RESULTS: Using Escherichia coli K12 and Bacillus subtilis, linkage predictions made by each of these methods were evaluated against three benchmarks: functional categories defined by COG and KEGG, known pathways listed in EcoCyc, and known operons listed in RegulonDB. Each evaluated method had strengths and weaknesses, with no one method dominating all aspects of predictive ability studied. For functional categories, as previous studies have shown, the Rosetta Stone method was individually best at detecting linkages and predicting functions among proteins with shared KEGG categories, while the Phylogenetic profile method was best for linkage detection and function prediction among proteins with common COG functions. Differences in performance under COG versus KEGG may be attributable to the presence of paralogs. Better function prediction was observed when using a weighted combination of linkages based on reliability than when using a simple unweighted union of the linkage sets. For pathway reconstruction, 99 complete metabolic pathways in E. coli K12 (out of the 209 known, non-trivial pathways) and 193 pathways with 50% of their proteins were covered by linkages from at least one method. Gene neighbor was most effective individually on pathway reconstruction, with 48 complete pathways reconstructed. For operon prediction, Gene cluster completely predicted 59% of the known operons in E. coli K12 and 88% (333/418) in B. subtilis. Comparing two versions of the E. coli K12 operon database, many of the unannotated predictions in the earlier version were updated to true predictions in the later version. Using only linkages found by both Gene cluster and Gene neighbor improved the precision of operon predictions. Additionally, as previous studies have shown, combining features based on intergenic region and protein function improved the specificity of operon prediction. CONCLUSION: A common problem for computational methods is the generation of a large number of false positives that might be caused by an incomplete source of validation. By comparing two versions of a database, we demonstrated the dramatic differences in reported results. We used several benchmarks on which we have shown the comparative effectiveness of each prediction method, and we provided guidelines as to which method is most appropriate for a given prediction task.
format	Text
id	pubmed-2570368
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2570368 2008-10-21 Predicting protein linkages in bacteria: Which method is best depends on task. Karimpour-Fard, Anis; Leach, Sonia M; Gill, Ryan T; Hunter, Lawrence E. BMC Bioinformatics. Research Article. BioMed Central 2008-09-24 /pmc/articles/PMC2570368/ /pubmed/18816389 http://dx.doi.org/10.1186/1471-2105-9-397 Text en Copyright © 2008 Karimpour-Fard et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Predicting protein linkages in bacteria: Which method is best depends on task
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2570368/ https://www.ncbi.nlm.nih.gov/pubmed/18816389 http://dx.doi.org/10.1186/1471-2105-9-397

Similar Items