Clustering of protein families into functional subtypes using Relative Complexity Measure with reduced amino acid alphabets

Bibliographic Details
Main authors: Albayrak, Aydin; Otu, Hasan H; Sezerman, Ugur O
Format: Text
Language: English
Published: BioMed Central 2010
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2936399/
https://www.ncbi.nlm.nih.gov/pubmed/20718947
http://dx.doi.org/10.1186/1471-2105-11-428
author Albayrak, Aydin
Otu, Hasan H
Sezerman, Ugur O
collection PubMed
description BACKGROUND: Phylogenetic analysis can be used to divide a protein family into subfamilies in the absence of experimental information. Most phylogenetic analysis methods utilize multiple alignment of sequences and are based on an evolutionary model. However, multiple alignment is not an automated procedure and requires human intervention to maintain alignment integrity and to produce phylogenies consistent with the functional splits in the underlying sequences. To address this problem, we propose to use the alignment-free Relative Complexity Measure (RCM) combined with reduced amino acid alphabets to cluster protein families into functional subtypes purely on sequence criteria. Comparison with an alignment-based approach was also carried out to test the quality of the clustering. RESULTS: We demonstrate the robustness of RCM with reduced alphabets in clustering of protein sequences into families in a simulated dataset and seven well-characterized protein datasets. On protein datasets, crotonases, mandelate racemases, nucleotidyl cyclases and glycoside hydrolase family 2 were clustered into subfamilies with 100% accuracy whereas acyl transferase domains, haloacid dehalogenases, and vicinal oxygen chelates could be assigned to subfamilies with 97.2%, 96.9% and 92.2% accuracies, respectively. CONCLUSIONS: The overall combination of methods in this paper is useful for clustering protein families into subtypes based solely on protein sequence information. The method is also flexible and computationally fast because it does not require multiple alignment of sequences.
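The abstract combines an alignment-free complexity measure with a reduced amino acid alphabet. The Python sketch below illustrates only that general idea, under stated assumptions. The six-group alphabet, the Lempel-Ziv style cross-parsing count used as a stand-in for RCM, the normalisation in the hypothetical `rcm_like_distance`, and the toy sequences are all illustrative choices. None of them are the authors' published definitions or data.

```python
"""Hypothetical sketch: alignment-free distance on a reduced amino acid alphabet.

Assumptions (not the paper's exact method): the six-group alphabet below and a
Lempel-Ziv style cross-parsing phrase count standing in for RCM.
"""

# Hypothetical 6-letter reduced alphabet (group label -> member residues).
GROUPS = {
    "h": "AVLIMC",  # aliphatic / hydrophobic
    "a": "FWY",     # aromatic
    "p": "STNQ",    # polar, uncharged
    "+": "KRH",     # positively charged
    "-": "DE",      # negatively charged
    "s": "GP",      # conformationally special
}
AA_TO_GROUP = {aa: label for label, members in GROUPS.items() for aa in members}


def reduce_sequence(seq: str) -> str:
    """Map each residue to its group label; unknown symbols (e.g. X) are dropped."""
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP)


def cross_parse_count(target: str, reference: str) -> int:
    """Number of phrases needed to build `target` by copying substrings of `reference`.

    Each phrase is the longest prefix of the remaining target that occurs in
    `reference`; a symbol absent from `reference` becomes a one-letter phrase.
    """
    count, i = 0, 0
    while i < len(target):
        length = 0
        # Extend the phrase while it still occurs somewhere in the reference.
        while i + length < len(target) and target[i:i + length + 1] in reference:
            length += 1
        i += max(length, 1)
        count += 1
    return count


def rcm_like_distance(x: str, y: str) -> float:
    """Symmetric distance in [0, 1]: 0 for identical inputs, 1 when every phrase is one symbol."""
    if not x or not y:
        raise ValueError("sequences must be non-empty")
    if len(x) + len(y) == 2:  # two single symbols: avoid dividing by zero
        return 0.0 if x == y else 1.0
    extra_phrases = cross_parse_count(x, y) + cross_parse_count(y, x) - 2
    return extra_phrases / (len(x) + len(y) - 2)


# Made-up toy sequences for illustration; not taken from the paper's datasets.
toy = {
    "s1": "MKVLAAGIVLLLAAQ",
    "s2": "MKVLSAGIVLLLSAQ",
    "s3": "MDEEKRHWWFYDEKR",
}
reduced = {name: reduce_sequence(s) for name, s in toy.items()}
for a in reduced:
    for b in reduced:
        if a < b:
            print(a, b, round(rcm_like_distance(reduced[a], reduced[b]), 3))
```

The resulting pairwise distance matrix could then be passed to any hierarchical clustering or tree-building routine. The clustering procedure, alphabets, and parameters actually used in the study should be taken from the full text.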
format Text
id pubmed-2936399
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2936399 2011-07-08 Clustering of protein families into functional subtypes using Relative Complexity Measure with reduced amino acid alphabets Albayrak, Aydin; Otu, Hasan H; Sezerman, Ugur O BMC Bioinformatics Methodology Article
BioMed Central 2010-08-18 /pmc/articles/PMC2936399/ /pubmed/20718947 http://dx.doi.org/10.1186/1471-2105-11-428 Text en Copyright ©2010 Albayrak et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Clustering of protein families into functional subtypes using Relative Complexity Measure with reduced amino acid alphabets
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2936399/
https://www.ncbi.nlm.nih.gov/pubmed/20718947
http://dx.doi.org/10.1186/1471-2105-11-428