
Predicting protein linkages in bacteria: Which method is best depends on task

BACKGROUND: Applications of computational methods for predicting protein functional linkages are increasing. In recent years, several bacteria-specific methods for predicting linkages have been developed. The four major genomic context methods are: Gene cluster, Gene neighbor, Rosetta Stone, and Phy...


Bibliographic Details
Main authors: Karimpour-Fard, Anis, Leach, Sonia M, Gill, Ryan T, Hunter, Lawrence E
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2570368/
https://www.ncbi.nlm.nih.gov/pubmed/18816389
http://dx.doi.org/10.1186/1471-2105-9-397
author Karimpour-Fard, Anis
Leach, Sonia M
Gill, Ryan T
Hunter, Lawrence E
collection PubMed
description BACKGROUND: Applications of computational methods for predicting protein functional linkages are increasing. In recent years, several bacteria-specific methods for predicting linkages have been developed. The four major genomic context methods are: Gene cluster, Gene neighbor, Rosetta Stone, and Phylogenetic profiles. These methods have been shown to be powerful tools and this paper provides guidelines for when each method is appropriate by exploring different features of each method and potential improvements offered by their combination. We also review many previous treatments of these prediction methods, use the latest available annotations, and offer a number of new observations. RESULTS: Using Escherichia coli K12 and Bacillus subtilis, linkage predictions made by each of these methods were evaluated against three benchmarks: functional categories defined by COG and KEGG, known pathways listed in EcoCyc, and known operons listed in RegulonDB. Each evaluated method had strengths and weaknesses, with no one method dominating all aspects of predictive ability studied. For functional categories, as previous studies have shown, the Rosetta Stone method was individually best at detecting linkages and predicting functions among proteins with shared KEGG categories while the Phylogenetic profile method was best for linkage detection and function prediction among proteins with common COG functions. Differences in performance under COG versus KEGG may be attributable to the presence of paralogs. Better function prediction was observed when using a weighted combination of linkages based on reliability versus using a simple unweighted union of the linkage sets. For pathway reconstruction, 99 complete metabolic pathways in E. coli K12 (out of the 209 known, non-trivial pathways) and 193 pathways with 50% of their proteins were covered by linkages from at least one method. 
Gene neighbor was most effective individually on pathway reconstruction, with 48 complete pathways reconstructed. For operon prediction, Gene cluster completely predicted 59% of the known operons in E. coli K12 and 88% (333/418) in B. subtilis. Comparing two versions of the E. coli K12 operon database, many of the unannotated predictions in the earlier version were updated to true predictions in the later version. Using only linkages found by both Gene cluster and Gene neighbor improved the precision of operon predictions. Additionally, as previous studies have shown, combining features based on intergenic region and protein function improved the specificity of operon prediction. CONCLUSION: A common problem for computational methods is the generation of a large number of false positives that might be caused by an incomplete source of validation. By comparing two versions of a database, we demonstrated dramatic differences in reported results. Using several benchmarks, we have shown the comparative effectiveness of each prediction method and provided guidelines as to which method is most appropriate for a given prediction task.
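The description contrasts three ways of combining linkage sets: a simple unweighted union, a reliability-weighted combination, and keeping only linkages found by two methods at once. The Python sketch below illustrates the difference on toy data. It is not the authors' implementation: the protein names, the reliability values, and the noisy-OR scoring rule are all assumptions chosen for illustration, since the record does not state how the weighting was done.

```python
from itertools import chain


def pair(a, b):
    """Represent an undirected linkage between two proteins."""
    return frozenset((a, b))


def unweighted_union(linkages):
    """All pairs predicted by at least one method, with no scoring."""
    return set(chain.from_iterable(linkages.values()))


def intersection(linkages, methods):
    """Only the pairs predicted by every method listed in `methods`."""
    return set.intersection(*(linkages[m] for m in methods))


def weighted_combination(linkages, reliability):
    """Score each pair by noisy-OR over the methods that predict it.

    ASSUMPTION: noisy-OR is an illustrative choice, not the paper's formula.
    score = 1 - product over predicting methods of (1 - reliability).
    """
    scores = {}
    for method, pairs in linkages.items():
        for p in pairs:
            miss = 1.0 - scores.get(p, 0.0)
            scores[p] = 1.0 - miss * (1.0 - reliability[method])
    return scores


# Hypothetical predictions; p1..p4 are placeholder protein names.
linkages = {
    "gene_cluster": {pair("p1", "p2"), pair("p2", "p3")},
    "gene_neighbor": {pair("p1", "p2")},
    "rosetta_stone": {pair("p3", "p4")},
}

# Hypothetical per-method reliabilities (for example, benchmark precision).
reliability = {"gene_cluster": 0.8, "gene_neighbor": 0.5, "rosetta_stone": 0.3}

union = unweighted_union(linkages)
both = intersection(linkages, ["gene_cluster", "gene_neighbor"])
scores = weighted_combination(linkages, reliability)
```

In this toy case the union treats all three pairs alike, the intersection keeps only the pair supported by both Gene cluster and Gene neighbor, and the weighted scores rank pairs by how much reliable evidence supports them.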
format Text
id pubmed-2570368
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2570368 2008-10-21 Predicting protein linkages in bacteria: Which method is best depends on task Karimpour-Fard, Anis Leach, Sonia M Gill, Ryan T Hunter, Lawrence E BMC Bioinformatics Research Article BioMed Central 2008-09-24 /pmc/articles/PMC2570368/ /pubmed/18816389 http://dx.doi.org/10.1186/1471-2105-9-397 Text en Copyright © 2008 Karimpour-Fard et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Predicting protein linkages in bacteria: Which method is best depends on task
topic Research Article