
On single and multiple models of protein families for the detection of remote sequence relationships

BACKGROUND: The detection of relationships between a protein sequence of unknown function and a sequence whose function has been characterised enables the transfer of functional annotation. However, in many cases these relationships cannot be identified easily from direct comparison of the two seque...


Bibliographic Details
Main authors: Casbon, James A, Saqi, Mansoor AS
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1397874/
https://www.ncbi.nlm.nih.gov/pubmed/16448555
http://dx.doi.org/10.1186/1471-2105-7-48
_version_ 1782126983593852928
author Casbon, James A
Saqi, Mansoor AS
collection PubMed
description BACKGROUND: The detection of relationships between a protein sequence of unknown function and a sequence whose function has been characterised enables the transfer of functional annotation. However, in many cases these relationships cannot be identified easily from direct comparison of the two sequences. Methods that compare sequence profiles have been shown to improve the detection of these remote sequence relationships. However, the best method for building a profile of a known set of sequences has not been established. Here we examine how the type of profile built affects its performance, both in detecting remote homologs and in the resulting alignment accuracy. In particular, we consider whether it is better to model a protein superfamily using a single structure-based alignment that is representative of all known cases of the superfamily, or to use multiple sequence-based profiles each representing an individual member of the superfamily. RESULTS: Using profile-profile methods for remote homolog detection, we benchmark the performance of single structure-based superfamily models and multiple domain models. On average, over all superfamilies, using a truncated receiver operator characteristic (ROC(5)) we find that multiple domain models outperform single superfamily models, except at low error rates where the two models behave in a similar way. However, there is a wide range of performance depending on the superfamily. For 12% of all superfamilies the ROC(5) value for superfamily models exceeds that for the domain models by more than 0.2, and for 10% of superfamilies the domain models show a similar improvement in performance over the superfamily models. CONCLUSION: Using a sensitive profile-profile method, we have investigated the performance of single structure-based models and multiple sequence models (domain models) in detecting remote superfamily members. We find that, overall, multiple models perform better in recognition, although single structure-based models display better alignment accuracy.
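The abstract reports its results as a truncated ROC score, ROC(5), but this record does not define it. As a rough illustration only, the sketch below assumes the commonly used ROC_n definition: for a hit list ranked best-first, sum the number of true positives ranked above each of the first n false positives, then divide by n times the total number of true positives. The function name roc_n, its parameters, and the padding rule for lists with fewer than n false positives are assumptions made for this example, not details taken from the article.

```python
from typing import Sequence


def roc_n(labels_by_rank: Sequence[bool], n: int, total_positives: int) -> float:
    """Truncated ROC score for a hit list ranked best-first.

    labels_by_rank[i] is True when the i-th ranked hit is a true homolog.
    total_positives is the number of true homologs in the whole database,
    including any that never appear in the hit list.
    """
    if n <= 0 or total_positives <= 0:
        raise ValueError("n and total_positives must be positive")

    true_seen = 0   # true positives encountered so far
    false_seen = 0  # false positives encountered so far
    area = 0        # running sum of true positives above each false positive

    for is_true in labels_by_rank:
        if is_true:
            true_seen += 1
        else:
            false_seen += 1
            area += true_seen
            if false_seen == n:
                break

    # Assumed convention: if the list ends before n false positives,
    # each missing false positive is credited with every true positive found.
    area += (n - false_seen) * true_seen

    return area / (n * total_positives)


if __name__ == "__main__":
    ranked = [True, True, False, True, False, False]
    print(roc_n(ranked, n=2, total_positives=3))
```

Under this definition the score is 1.0 when every true homolog ranks above the first false positive and 0.0 when n false positives precede all true homologs, so a difference of 0.2 between two models is a difference on a 0 to 1 scale.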
format Text
id pubmed-1397874
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1397874 2006-04-21 On single and multiple models of protein families for the detection of remote sequence relationships Casbon, James A Saqi, Mansoor AS BMC Bioinformatics Methodology Article BACKGROUND: The detection of relationships between a protein sequence of unknown function and a sequence whose function has been characterised enables the transfer of functional annotation. However, in many cases these relationships cannot be identified easily from direct comparison of the two sequences. Methods that compare sequence profiles have been shown to improve the detection of these remote sequence relationships. However, the best method for building a profile of a known set of sequences has not been established. Here we examine how the type of profile built affects its performance, both in detecting remote homologs and in the resulting alignment accuracy. In particular, we consider whether it is better to model a protein superfamily using a single structure-based alignment that is representative of all known cases of the superfamily, or to use multiple sequence-based profiles each representing an individual member of the superfamily. RESULTS: Using profile-profile methods for remote homolog detection, we benchmark the performance of single structure-based superfamily models and multiple domain models. On average, over all superfamilies, using a truncated receiver operator characteristic (ROC(5)) we find that multiple domain models outperform single superfamily models, except at low error rates where the two models behave in a similar way. However, there is a wide range of performance depending on the superfamily. For 12% of all superfamilies the ROC(5) value for superfamily models exceeds that for the domain models by more than 0.2, and for 10% of superfamilies the domain models show a similar improvement in performance over the superfamily models. CONCLUSION: Using a sensitive profile-profile method, we have investigated the performance of single structure-based models and multiple sequence models (domain models) in detecting remote superfamily members. We find that, overall, multiple models perform better in recognition, although single structure-based models display better alignment accuracy. BioMed Central 2006-01-31 /pmc/articles/PMC1397874/ /pubmed/16448555 http://dx.doi.org/10.1186/1471-2105-7-48 Text en Copyright © 2006 Casbon and Saqi; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title On single and multiple models of protein families for the detection of remote sequence relationships
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1397874/
https://www.ncbi.nlm.nih.gov/pubmed/16448555
http://dx.doi.org/10.1186/1471-2105-7-48