Clustering of protein domains for functional and evolutionary studies
BACKGROUND: The number of protein family members defined by DNA sequencing is usually much larger than those characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test...
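Main authors: | Goldstein, Pavle; Zucko, Jurica; Vujaklija, Dušica; Kriško, Anita; Hranueli, Daslav; Long, Paul F; Etchebest, Catherine; Basrak, Bojan; Cullum, John |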
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2009 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2770074/ https://www.ncbi.nlm.nih.gov/pubmed/19832975 http://dx.doi.org/10.1186/1471-2105-10-335 |
_version_ | 1782173625583927296 |
---|---|
author | Goldstein, Pavle Zucko, Jurica Vujaklija, Dušica Kriško, Anita Hranueli, Daslav Long, Paul F Etchebest, Catherine Basrak, Bojan Cullum, John |
author_facet | Goldstein, Pavle Zucko, Jurica Vujaklija, Dušica Kriško, Anita Hranueli, Daslav Long, Paul F Etchebest, Catherine Basrak, Bojan Cullum, John |
author_sort | Goldstein, Pavle |
collection | PubMed |
description | BACKGROUND: The number of protein family members defined by DNA sequencing is usually much larger than those characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test of the quality of the clustering. RESULTS: An evolutionary split statistic is calculated for each column in a protein multiple sequence alignment; the statistic has a larger value when a column is better described by an evolutionary model that assumes clustering around two or more amino acids rather than a single amino acid. The user selects columns (typically the top ranked columns) to construct a motif. The motif is used to divide the family into subtypes using a stochastic optimization procedure related to the deterministic annealing EM algorithm (DAEM), which yields a specificity score showing how well each family member is assigned to a subtype. The clustering obtained is not strongly dependent on the number of amino acids chosen for the motif. The robustness of this method was demonstrated using six well characterized protein families: nucleotidyl cyclase, protein kinase, dehydrogenase, two polyketide synthase domains and small heat shock proteins. Phylogenetic trees did not allow accurate clustering for three of the six families. CONCLUSION: The method clustered the families into functional subtypes with an accuracy of 90 to 100%. False assignments usually had a low specificity score. |
format | Text |
id | pubmed-2770074 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-27700742009-10-29 Clustering of protein domains for functional and evolutionary studies Goldstein, Pavle Zucko, Jurica Vujaklija, Dušica Kriško, Anita Hranueli, Daslav Long, Paul F Etchebest, Catherine Basrak, Bojan Cullum, John BMC Bioinformatics Research Article BACKGROUND: The number of protein family members defined by DNA sequencing is usually much larger than those characterised experimentally. This paper describes a method to divide protein families into subtypes purely on sequence criteria. Comparison with experimental data allows an independent test of the quality of the clustering. RESULTS: An evolutionary split statistic is calculated for each column in a protein multiple sequence alignment; the statistic has a larger value when a column is better described by an evolutionary model that assumes clustering around two or more amino acids rather than a single amino acid. The user selects columns (typically the top ranked columns) to construct a motif. The motif is used to divide the family into subtypes using a stochastic optimization procedure related to the deterministic annealing EM algorithm (DAEM), which yields a specificity score showing how well each family member is assigned to a subtype. The clustering obtained is not strongly dependent on the number of amino acids chosen for the motif. The robustness of this method was demonstrated using six well characterized protein families: nucleotidyl cyclase, protein kinase, dehydrogenase, two polyketide synthase domains and small heat shock proteins. Phylogenetic trees did not allow accurate clustering for three of the six families. CONCLUSION: The method clustered the families into functional subtypes with an accuracy of 90 to 100%. False assignments usually had a low specificity score. BioMed Central 2009-10-15 /pmc/articles/PMC2770074/ /pubmed/19832975 http://dx.doi.org/10.1186/1471-2105-10-335 Text en Copyright © 2009 Goldstein et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Goldstein, Pavle Zucko, Jurica Vujaklija, Dušica Kriško, Anita Hranueli, Daslav Long, Paul F Etchebest, Catherine Basrak, Bojan Cullum, John Clustering of protein domains for functional and evolutionary studies |
title | Clustering of protein domains for functional and evolutionary studies |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2770074/ https://www.ncbi.nlm.nih.gov/pubmed/19832975 http://dx.doi.org/10.1186/1471-2105-10-335 |
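
The description above says that user-selected alignment columns form a motif, and that the motif is used to split the family into subtypes with a procedure related to the deterministic annealing EM algorithm (DAEM), giving each member a specificity score. The following is a minimal sketch of that idea, not the authors' implementation: it uses a plain tempered EM on a mixture of per-column categorical distributions, and the function names (`motif_from_alignment`, `daem_cluster`), the temperature schedule, the pseudocount, and the use of the maximum posterior as a stand-in "specificity score" are all assumptions made for illustration.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids


def motif_from_alignment(aligned_seqs, columns):
    """Keep only the chosen alignment columns (0-based) of each sequence."""
    return ["".join(seq[j] for j in columns) for seq in aligned_seqs]


def daem_cluster(motifs, k=2, betas=(0.5, 0.75, 1.0), iters=25,
                 pseudo=0.5, seed=0):
    """Tempered (annealed) EM on equal-length motif strings.

    Hypothetical helper: returns (labels, scores), where scores[i] is the
    largest posterior responsibility of motif i after the final sweep.
    """
    rng = random.Random(seed)
    n, length = len(motifs), len(motifs[0])

    # Random soft initial responsibilities, one row per motif.
    resp = []
    for _ in range(n):
        row = [rng.random() + 0.1 for _ in range(k)]
        total = sum(row)
        resp.append([r / total for r in row])

    for beta in betas:  # inverse temperature rises towards 1
        for _ in range(iters):
            # M-step: mixture weights and per-column residue frequencies.
            weights = [sum(resp[i][c] for i in range(n)) / n
                       for c in range(k)]
            probs = []
            for c in range(k):
                cols = []
                for j in range(length):
                    counts = {a: pseudo for a in ALPHABET}
                    for i in range(n):
                        ch = motifs[i][j]  # gaps or odd symbols get a key too
                        counts[ch] = counts.get(ch, pseudo) + resp[i][c]
                    total = sum(counts.values())
                    cols.append({a: v / total for a, v in counts.items()})
                probs.append(cols)

            # E-step: posterior proportional to (weight * likelihood) ** beta.
            for i in range(n):
                logp = []
                for c in range(k):
                    ll = math.log(weights[c] + 1e-12)
                    ll += sum(math.log(probs[c][j][motifs[i][j]])
                              for j in range(length))
                    logp.append(beta * ll)
                top = max(logp)
                ex = [math.exp(v - top) for v in logp]
                total = sum(ex)
                resp[i] = [e / total for e in ex]

    labels = [max(range(k), key=lambda c: resp[i][c]) for i in range(n)]
    scores = [max(row) for row in resp]
    return labels, scores
```

A call such as `daem_cluster(motif_from_alignment(alignment, [3, 17, 42]), k=2)` (with `alignment` a list of equal-length aligned sequences and the column indices chosen by the user) would return one subtype label and one score per sequence; members with a score near `1/k` are the ones whose assignment deserves a second look.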