Clustering of protein domains for functional and evolutionary studies

BACKGROUND: The number of protein family members defined by DNA sequencing is usually much larger than those characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test...

Bibliographic Details
Main authors: Goldstein, Pavle; Zucko, Jurica; Vujaklija, Dušica; Kriško, Anita; Hranueli, Daslav; Long, Paul F; Etchebest, Catherine; Basrak, Bojan; Cullum, John
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2770074/
https://www.ncbi.nlm.nih.gov/pubmed/19832975
http://dx.doi.org/10.1186/1471-2105-10-335
author Goldstein, Pavle
Zucko, Jurica
Vujaklija, Dušica
Kriško, Anita
Hranueli, Daslav
Long, Paul F
Etchebest, Catherine
Basrak, Bojan
Cullum, John
collection PubMed
description BACKGROUND: The number of protein family members defined by DNA sequencing is usually much larger than those characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test of the quality of the clustering. RESULTS: An evolutionary split statistic is calculated for each column in a protein multiple sequence alignment; the statistic has a larger value when a column is better described by an evolutionary model that assumes clustering around two or more amino acids rather than a single amino acid. The user selects columns (typically the top ranked columns) to construct a motif. The motif is used to divide the family into subtypes using a stochastic optimization procedure related to the deterministic annealing EM algorithm (DAEM), which yields a specificity score showing how well each family member is assigned to a subtype. The clustering obtained is not strongly dependent on the number of amino acids chosen for the motif. The robustness of this method was demonstrated using six well characterized protein families: nucleotidyl cyclase, protein kinase, dehydrogenase, two polyketide synthase domains and small heat shock proteins. Phylogenetic trees did not allow accurate clustering for three of the six families. CONCLUSION: The method clustered the families into functional subtypes with an accuracy of 90 to 100%. False assignments usually had a low specificity score.
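The workflow summarised in the description (score each alignment column, pick the top-ranked columns as a motif, then cluster the motif strings with an annealed EM procedure that reports how confidently each member is assigned) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names are hypothetical, `split_score` is a crude stand-in for the paper's evolutionary split statistic (it simply returns the frequency of the second most common residue), the mixture weights are held equal, a small random jitter is added at each temperature, and the "specificity score" here is assumed to be the largest final responsibility.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def split_score(column):
    """Crude stand-in for a split statistic (an assumption, not the paper's
    evolutionary model): frequency of the second most common residue."""
    top_two = Counter(column).most_common(2)
    return top_two[1][1] / len(column) if len(top_two) > 1 else 0.0


def top_columns(seqs, n):
    """Indices of the n highest-scoring alignment columns, in alignment order."""
    ranked = sorted(range(len(seqs[0])),
                    key=lambda j: split_score([s[j] for s in seqs]),
                    reverse=True)
    return sorted(ranked[:n])


def normalise(weights):
    total = sum(weights)
    return [w / total for w in weights]


def daem_cluster(motifs, k=2, betas=(0.25, 0.5, 0.75, 1.0), iters=30,
                 pseudo=0.1, jitter=0.01, seed=0):
    """Deterministic-annealing EM for k clusters of independent per-column
    residue distributions. Equal mixture weights are assumed for simplicity."""
    rng = random.Random(seed)
    n_cols = len(motifs[0])
    # Random soft assignment to start from.
    resp = [normalise([0.5 + rng.random() for _ in range(k)]) for _ in motifs]
    for beta in betas:  # beta is the inverse temperature, raised towards 1
        # Small perturbation so clusters can separate once beta is high enough.
        resp = [normalise([r + jitter * rng.random() for r in row])
                for row in resp]
        for _ in range(iters):
            # M-step: responsibility-weighted residue frequencies per column.
            probs = []
            for c in range(k):
                cols = []
                for j in range(n_cols):
                    counts = {a: pseudo for a in AMINO_ACIDS}
                    for row, motif in zip(resp, motifs):
                        counts[motif[j]] = counts.get(motif[j], pseudo) + row[c]
                    total = sum(counts.values())
                    cols.append({a: v / total for a, v in counts.items()})
                probs.append(cols)
            # E-step: posteriors with the log-likelihood scaled by beta.
            new_resp = []
            for motif in motifs:
                logp = [beta * sum(math.log(probs[c][j][motif[j]])
                                   for j in range(n_cols))
                        for c in range(k)]
                peak = max(logp)
                new_resp.append(normalise([math.exp(v - peak) for v in logp]))
            resp = new_resp
    labels = [max(range(k), key=lambda c: row[c]) for row in resp]
    scores = [max(row) for row in resp]  # assumed "specificity": top posterior
    return labels, scores


if __name__ == "__main__":
    # Toy alignment invented for illustration; not data from the paper.
    alignment = ["MKDLGA", "MKDLGA", "MKDLGS", "MKDLGA",
                 "MRELGA", "MRELGT", "MRELGA", "MRELGA"]
    columns = top_columns(alignment, 2)
    motifs = ["".join(seq[j] for j in columns) for seq in alignment]
    print(columns, daem_cluster(motifs))
```

Cluster indices are arbitrary labels, so only the grouping is meaningful, not which subtype is numbered 0 or 1. Running the tempered E-step at low `beta` keeps the assignments soft early on, which is the general idea behind deterministic annealing; the schedule, pseudocount and jitter values above are illustrative choices only.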
format Text
id pubmed-2770074
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2770074 2009-10-29 BMC Bioinformatics Research Article BioMed Central 2009-10-15 /pmc/articles/PMC2770074/ /pubmed/19832975 http://dx.doi.org/10.1186/1471-2105-10-335 Text en Copyright © 2009 Goldstein et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Clustering of protein domains for functional and evolutionary studies
topic Research Article