The distance-profile representation and its application to detection of distantly related protein families

BACKGROUND: Detecting homology between remotely related protein families is an important problem in computational biology since the biological properties of uncharacterized proteins can often be inferred from those of homologous proteins. Many existing approaches address this problem by measuring th...

Bibliographic Details
Main Authors: Ku, Chin-Jen; Yona, Golan
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1345692/
https://www.ncbi.nlm.nih.gov/pubmed/16316461
http://dx.doi.org/10.1186/1471-2105-6-282
_version_ 1782126592586153984
author Ku, Chin-Jen
Yona, Golan
author_facet Ku, Chin-Jen
Yona, Golan
author_sort Ku, Chin-Jen
collection PubMed
description BACKGROUND: Detecting homology between remotely related protein families is an important problem in computational biology since the biological properties of uncharacterized proteins can often be inferred from those of homologous proteins. Many existing approaches address this problem by measuring the similarity between proteins through sequence or structural alignment. However, these methods do not exploit collective aspects of the protein space and the computed scores are often noisy and frequently fail to recognize distantly related protein families. RESULTS: We describe an algorithm that improves over the state of the art in homology detection by utilizing global information on the proximity of entities in the protein space. Our method relies on a vectorial representation of proteins and protein families and uses structure-specific association measures between proteins and template structures to form a high-dimensional feature vector for each query protein. These vectors are then processed and transformed to sparse feature vectors that are treated as statistical fingerprints of the query proteins. The new representation induces a new metric between proteins measured by the statistical difference between their corresponding probability distributions. CONCLUSION: Using several performance measures we show that the new tool considerably improves the performance in recognizing distant homologies compared to existing approaches such as PSIBLAST and FUGUE.
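The RESULTS portion of the description above outlines a pipeline: score a query protein against a set of template structures, turn the score vector into a sparse probability-like fingerprint, and compare two proteins by the statistical difference between their fingerprints. The following is a minimal Python sketch of that general idea, not the authors' implementation. The abstract does not name the sparsification rule or the divergence, so the top-k filter, the normalization step, the Jensen-Shannon divergence, and all function names and scores below are assumptions chosen for illustration.

```python
import math


def to_sparse_distribution(scores, top_k):
    """Turn raw association scores into a sparse probability vector.

    Assumption: sparsify by keeping the top_k largest scores and zeroing
    the rest, then normalize so the entries sum to 1.
    """
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:top_k])
    # Negative scores are clipped to zero so the result is a valid distribution.
    sparse = [max(s, 0.0) if i in keep else 0.0 for i, s in enumerate(scores)]
    total = sum(sparse)
    if total == 0.0:
        raise ValueError("no positive association scores to normalize")
    return [s / total for s in sparse]


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def jensen_shannon(p, q):
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint support).

    Assumption: used here as one possible 'statistical difference' between
    two fingerprints; the abstract does not specify which measure is used.
    """
    m = [(pi + qi) / 2.0 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical association scores of three query proteins against
# five template structures (made-up numbers, for illustration only).
scores_a = [9.0, 7.0, 0.5, 0.2, 0.1]
scores_b = [8.0, 6.5, 0.3, 0.4, 0.2]
scores_c = [0.1, 0.3, 0.2, 8.0, 9.5]

profile_a = to_sparse_distribution(scores_a, top_k=2)
profile_b = to_sparse_distribution(scores_b, top_k=2)
profile_c = to_sparse_distribution(scores_c, top_k=2)

d_ab = jensen_shannon(profile_a, profile_b)  # similar fingerprints
d_ac = jensen_shannon(profile_a, profile_c)  # fingerprints with disjoint support
```

In this toy setup, proteins A and B associate with the same two templates and end up close together, while A and C share no retained template and sit at the maximum distance. The point of the sketch is only that two proteins are compared through their relationships to a common reference set rather than through a single pairwise alignment score.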
format Text
id pubmed-1345692
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1345692 2006-01-30 The distance-profile representation and its application to detection of distantly related protein families Ku, Chin-Jen Yona, Golan BMC Bioinformatics Research Article BioMed Central 2005-11-29 /pmc/articles/PMC1345692/ /pubmed/16316461 http://dx.doi.org/10.1186/1471-2105-6-282 Text en Copyright © 2005 Ku and Yona; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The distance-profile representation and its application to detection of distantly related protein families
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1345692/
https://www.ncbi.nlm.nih.gov/pubmed/16316461
http://dx.doi.org/10.1186/1471-2105-6-282