Cargando…

Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment

BACKGROUND: A widely-used approach for discovering functional and physical interactions among proteins involves phylogenetic profile comparisons (PPCs). Here, proteins with similar profiles are inferred to be functionally related under the assumption that proteins involved in the same metabolic path...

Descripción completa

Detalles Bibliográficos
Autores principales: Jothi, Raja, Przytycka, Teresa M, Aravind, L
Formato: Texto
Lenguaje:English
Publicado: BioMed Central 2007
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1904249/
https://www.ncbi.nlm.nih.gov/pubmed/17521444
http://dx.doi.org/10.1186/1471-2105-8-173
_version_ 1782133980507668480
author Jothi, Raja
Przytycka, Teresa M
Aravind, L
author_facet Jothi, Raja
Przytycka, Teresa M
Aravind, L
author_sort Jothi, Raja
collection PubMed
description BACKGROUND: A widely-used approach for discovering functional and physical interactions among proteins involves phylogenetic profile comparisons (PPCs). Here, proteins with similar profiles are inferred to be functionally related under the assumption that proteins involved in the same metabolic pathway or cellular system are likely to have been co-inherited during evolution. RESULTS: Our experimentation with E. coli and yeast proteins with 16 different carefully composed reference sets of genomes revealed that the phyletic patterns of proteins in prokaryotes alone could be adequate enough to make reasonably accurate functional linkage predictions. A slight improvement in performance is observed on adding few eukaryotes into the reference set, but a noticeable drop-off in performance is observed with increased number of eukaryotes. Inclusion of most parasitic, pathogenic or vertebrate genomes and multiple strains of the same species into the reference set do not necessarily contribute to an improved sensitivity or accuracy. Interestingly, we also found that evolutionary histories of individual pathways have a significant affect on the performance of the PPC approach with respect to a particular reference set. For example, to accurately predict functional links in carbohydrate or lipid metabolism, a reference set solely composed of prokaryotic (or bacterial) genomes performed among the best compared to one composed of genomes from all three super-kingdoms; this is in contrast to predicting functional links in translation for which a reference set composed of prokaryotic (or bacterial) genomes performed the worst. We also demonstrate that the widely used random null model to quantify the statistical significance of profile similarity is incomplete, which could result in an increased number of false-positives. CONCLUSION: Contrary to previous proposals, it is not merely the number of genomes but a careful selection of informative genomes in the reference set that influences the prediction accuracy of the PPC approach. We note that the predictive power of the PPC approach, especially in eukaryotes, is heavily influenced by the primary endosymbiosis and subsequent bacterial contributions. The over-representation of parasitic unicellular eukaryotes and vertebrates additionally make eukaryotes less useful in the reference sets. Reference sets composed of highly non-redundant set of genomes from all three super-kingdoms fare better with pathways showing considerable vertical inheritance and strong conservation (e.g. translation apparatus), while reference sets solely composed of prokaryotic genomes fare better for more variable pathways like carbohydrate metabolism. Differential performance of the PPC approach on various pathways, and a weak positive correlation between functional and profile similarities suggest that caution should be exercised while interpreting functional linkages inferred from genome-wide large-scale profile comparisons using a single reference set.
format Text
id pubmed-1904249
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-19042492007-06-29 Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment Jothi, Raja Przytycka, Teresa M Aravind, L BMC Bioinformatics Research Article BACKGROUND: A widely-used approach for discovering functional and physical interactions among proteins involves phylogenetic profile comparisons (PPCs). Here, proteins with similar profiles are inferred to be functionally related under the assumption that proteins involved in the same metabolic pathway or cellular system are likely to have been co-inherited during evolution. RESULTS: Our experimentation with E. coli and yeast proteins with 16 different carefully composed reference sets of genomes revealed that the phyletic patterns of proteins in prokaryotes alone could be adequate enough to make reasonably accurate functional linkage predictions. A slight improvement in performance is observed on adding few eukaryotes into the reference set, but a noticeable drop-off in performance is observed with increased number of eukaryotes. Inclusion of most parasitic, pathogenic or vertebrate genomes and multiple strains of the same species into the reference set do not necessarily contribute to an improved sensitivity or accuracy. Interestingly, we also found that evolutionary histories of individual pathways have a significant affect on the performance of the PPC approach with respect to a particular reference set. For example, to accurately predict functional links in carbohydrate or lipid metabolism, a reference set solely composed of prokaryotic (or bacterial) genomes performed among the best compared to one composed of genomes from all three super-kingdoms; this is in contrast to predicting functional links in translation for which a reference set composed of prokaryotic (or bacterial) genomes performed the worst. We also demonstrate that the widely used random null model to quantify the statistical significance of profile similarity is incomplete, which could result in an increased number of false-positives. CONCLUSION: Contrary to previous proposals, it is not merely the number of genomes but a careful selection of informative genomes in the reference set that influences the prediction accuracy of the PPC approach. We note that the predictive power of the PPC approach, especially in eukaryotes, is heavily influenced by the primary endosymbiosis and subsequent bacterial contributions. The over-representation of parasitic unicellular eukaryotes and vertebrates additionally make eukaryotes less useful in the reference sets. Reference sets composed of highly non-redundant set of genomes from all three super-kingdoms fare better with pathways showing considerable vertical inheritance and strong conservation (e.g. translation apparatus), while reference sets solely composed of prokaryotic genomes fare better for more variable pathways like carbohydrate metabolism. Differential performance of the PPC approach on various pathways, and a weak positive correlation between functional and profile similarities suggest that caution should be exercised while interpreting functional linkages inferred from genome-wide large-scale profile comparisons using a single reference set. BioMed Central 2007-05-23 /pmc/articles/PMC1904249/ /pubmed/17521444 http://dx.doi.org/10.1186/1471-2105-8-173 Text en Copyright © 2007 Jothi et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Jothi, Raja
Przytycka, Teresa M
Aravind, L
Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment
title Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment
title_full Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment
title_fullStr Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment
title_full_unstemmed Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment
title_short Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment
title_sort discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1904249/
https://www.ncbi.nlm.nih.gov/pubmed/17521444
http://dx.doi.org/10.1186/1471-2105-8-173
work_keys_str_mv AT jothiraja discoveringfunctionallinkagesanduncharacterizedcellularpathwaysusingphylogeneticprofilecomparisonsacomprehensiveassessment
AT przytyckateresam discoveringfunctionallinkagesanduncharacterizedcellularpathwaysusingphylogeneticprofilecomparisonsacomprehensiveassessment
AT aravindl discoveringfunctionallinkagesanduncharacterizedcellularpathwaysusingphylogeneticprofilecomparisonsacomprehensiveassessment