
Using Phylogeny to Improve Genome-Wide Distant Homology Recognition

The gap between the number of known protein sequences and structures continues to widen, particularly as a result of sequencing projects for entire genomes. Recently there have been many attempts to generate structural assignments to all genes on sets of completed genomes using fold-recognition meth...


Bibliographic Details
Main authors: Abeln, Sanne, Teubner, Carlo, Deane, Charlotte M
Format: Text
Language: English
Published: Public Library of Science 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1779300/
https://www.ncbi.nlm.nih.gov/pubmed/17238281
http://dx.doi.org/10.1371/journal.pcbi.0030003
author Abeln, Sanne
Teubner, Carlo
Deane, Charlotte M
collection PubMed
description The gap between the number of known protein sequences and structures continues to widen, particularly as a result of sequencing projects for entire genomes. Recently there have been many attempts to generate structural assignments to all genes on sets of completed genomes using fold-recognition methods. We developed a method that detects false positives made by these genome-wide structural assignment experiments by identifying isolated occurrences. The method was tested using two sets of assignments, generated by SUPERFAMILY and PSI-BLAST, on 150 completed genomes. A phylogeny of these genomes was built and a parsimony algorithm was used to identify isolated occurrences by detecting occurrences that cause a gain at leaf level. Isolated occurrences tend to have high e-values, and in both sets of assignments, a sudden increase in isolated occurrences is observed for e-values >10^-8 for SUPERFAMILY and >10^-4 for PSI-BLAST. Conditions to predict false positives are based on these results. Independent tests confirm that the predicted false positives are indeed more likely to be incorrectly assigned. Evaluation of the predicted false positives also showed that the accuracy of profile-based fold-recognition methods might depend on secondary structure content and sequence length. We show that false positives generated by fold-recognition methods can be identified by considering structural occurrence patterns on completed genomes; occurrences that are isolated within the phylogeny tend to be less reliable. The method provides a new independent way to examine the quality of fold assignments and may be used to improve the output of any genome-wide fold assignment method.
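The idea in the description (a parsimony reconstruction on the genome phylogeny, with occurrences counted as isolated when they cause a gain at leaf level) can be illustrated with a minimal sketch. It assumes Fitch parsimony on a rooted tree with a presence/absence character, which the record does not confirm as the authors' algorithm. The names `Node`, `fitch_sets`, `isolated_leaves`, and `predicted_false_positives` are hypothetical, the tie-break toward absence at the root is an arbitrary choice, and the rule "isolated and e-value above a threshold" is only a stand-in for the conditions the abstract mentions without stating.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Node:
    """Hypothetical tree node; leaf names are genome identifiers and must be unique."""
    name: str
    children: List["Node"] = field(default_factory=list)


def fitch_sets(node: Node, present: Set[str], sets: Dict[str, Set[int]]) -> Set[int]:
    """Bottom-up Fitch pass: 1 = fold present, 0 = fold absent."""
    if not node.children:
        sets[node.name] = {1 if node.name in present else 0}
        return sets[node.name]
    child_sets = [fitch_sets(c, present, sets) for c in node.children]
    common = set.intersection(*child_sets)
    # Intersection if the children agree on a state, otherwise the union.
    sets[node.name] = common if common else set.union(*child_sets)
    return sets[node.name]


def isolated_leaves(root: Node, present: Set[str]) -> List[str]:
    """Return leaves whose terminal branch carries a gain (parent 0, leaf 1)."""
    sets: Dict[str, Set[int]] = {}
    fitch_sets(root, present, sets)
    # Assumption: an ambiguous root is resolved to "absent".
    root_state = 0 if 0 in sets[root.name] else 1
    isolated = []
    stack = [(root, root_state)]
    while stack:
        node, state = stack.pop()
        for child in node.children:
            cset = sets[child.name]
            # Top-down pass: keep the parent's state whenever the child allows it.
            cstate = state if state in cset else next(iter(cset))
            if not child.children and cstate == 1 and state == 0:
                isolated.append(child.name)
            stack.append((child, cstate))
    return sorted(isolated)


def predicted_false_positives(root: Node, evalues: Dict[str, float],
                              threshold: float) -> List[str]:
    """Assumed rule: an occurrence is suspect if it is isolated AND its e-value exceeds the threshold."""
    isolated = isolated_leaves(root, set(evalues))
    return [g for g in isolated if evalues[g] > threshold]


# Toy phylogeny ((A,B),(C,(D,E))) with made-up e-values for one fold.
tree = Node("root", [
    Node("AB", [Node("A"), Node("B")]),
    Node("CDE", [Node("C"), Node("DE", [Node("D"), Node("E")])]),
])
evalues = {"A": 1e-30, "B": 1e-25, "D": 1e-5}
print(predicted_false_positives(tree, evalues, threshold=1e-8))  # ['D']
```

In the toy tree the fold is present in the sister genomes A and B, so the gain is placed on their shared ancestral branch and neither is isolated. Genome D has the fold while its neighbours C and E do not, so the gain falls on D's own branch, and its e-value of 1e-5 is above the 10^-8 cutoff the abstract reports for SUPERFAMILY.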
format Text
id pubmed-1779300
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-1779300 2007-01-27 PLoS Comput Biol Research Article
Public Library of Science 2007-01 2007-01-19 /pmc/articles/PMC1779300/ /pubmed/17238281 http://dx.doi.org/10.1371/journal.pcbi.0030003 Text en © 2007 Abeln et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Using Phylogeny to Improve Genome-Wide Distant Homology Recognition
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1779300/
https://www.ncbi.nlm.nih.gov/pubmed/17238281
http://dx.doi.org/10.1371/journal.pcbi.0030003