Top-Down Clustering for Protein Subfamily Identification

Bibliographic Details
Main authors:	Costa, Eduardo P., Vens, Celine, Blockeel, Hendrik
Format:	Online Article Text
Language:	English
Published:	Libertas Academica 2013
Subjects:	Original Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3653887/ https://www.ncbi.nlm.nih.gov/pubmed/23700359 http://dx.doi.org/10.4137/EBO.S11609

Description
Summary:	We propose a novel method for protein subfamily identification, that is, finding subgroups of functionally closely related sequences within a protein family. In line with phylogenomic analysis, the method first builds a hierarchical tree from a multiple alignment of the protein sequences, then uses a post-pruning procedure to extract clusters from the tree. Unlike existing methods, it constructs the hierarchical tree top-down rather than bottom-up, and it associates particular mutations with each division into subclusters. The motivating hypothesis is that this may yield a better tree topology, and as a result more accurate subfamily identification, while also indicating functionally important sites and allowing easy classification of new proteins. A thorough experimental evaluation confirms the hypothesis: the novel method yields more accurate clusters and a better tree topology than the state-of-the-art method SCI-PHY, identifies known functional sites, and identifies mutations that alone allow new sequences to be classified with an accuracy approaching that of hidden Markov models.
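The summary describes the method only at a high level, so the following is a minimal, hypothetical sketch of the general idea rather than the authors' implementation. It assumes ungapped, equal-length aligned sequences, splits each cluster on a single "column has residue X" test (the mutation associated with that division), scores splits by size-weighted mean pairwise Hamming distance, and substitutes a simple tightness threshold (`max_within`) for the paper's actual post-pruning procedure. All function names and parameters here are illustrative assumptions.

```python
from itertools import combinations
from typing import Optional


def hamming(a: str, b: str) -> int:
    """Number of aligned columns where two sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def mean_pairwise(seqs: list[str]) -> float:
    """Average Hamming distance over all pairs (0.0 for fewer than two sequences)."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    return sum(hamming(a, b) for a, b in pairs) / len(pairs)


class Node:
    def __init__(self, seqs: list[str]):
        self.seqs = seqs
        # (column, residue) test; None marks a leaf, i.e. one subfamily
        self.test: Optional[tuple[int, str]] = None
        self.yes: Optional["Node"] = None
        self.no: Optional["Node"] = None


def best_split(seqs: list[str]) -> Optional[tuple[int, str]]:
    """Assumed heuristic: pick the test that most lowers weighted within-cluster distance."""
    best, best_score = None, mean_pairwise(seqs)
    for col in range(len(seqs[0])):
        for res in sorted({s[col] for s in seqs}):
            yes = [s for s in seqs if s[col] == res]
            no = [s for s in seqs if s[col] != res]
            if not yes or not no:
                continue  # this test does not divide the cluster
            score = (len(yes) * mean_pairwise(yes) + len(no) * mean_pairwise(no)) / len(seqs)
            if score < best_score:
                best, best_score = (col, res), score
    return best


def build(seqs: list[str]) -> Node:
    """Grow the tree top-down until no test lowers within-cluster distance."""
    node = Node(seqs)
    if len(seqs) < 2:
        return node
    test = best_split(seqs)
    if test is None:
        return node
    col, res = test
    node.test = test
    node.yes = build([s for s in seqs if s[col] == res])
    node.no = build([s for s in seqs if s[col] != res])
    return node


def prune(node: Node, max_within: float) -> None:
    """Hypothetical post-pruning: collapse any subtree whose sequences are already tight."""
    if node.test is None:
        return
    if mean_pairwise(node.seqs) <= max_within:
        node.test = node.yes = node.no = None
        return
    prune(node.yes, max_within)
    prune(node.no, max_within)


def leaves(node: Node) -> list[list[str]]:
    """Return the clusters (candidate subfamilies) at the leaves."""
    if node.test is None:
        return [node.seqs]
    return leaves(node.yes) + leaves(node.no)


def classify(node: Node, seq: str) -> list[str]:
    """Route a new aligned sequence down the mutation tests to a leaf cluster."""
    while node.test is not None:
        col, res = node.test
        node = node.yes if seq[col] == res else node.no
    return node.seqs


family = ["ACDA", "ACDC", "GCEA", "GCEC"]
tree = build(family)
print(tree.test)                # (0, 'A')
prune(tree, max_within=1.0)
print(leaves(tree))             # [['ACDA', 'ACDC'], ['GCEA', 'GCEC']]
print(classify(tree, "ACEA"))   # ['ACDA', 'ACDC']
```

Without pruning, this toy tree keeps splitting down to single sequences, which illustrates why a post-pruning step is needed to extract clusters. Because each internal node carries one explicit (column, residue) test, classifying a new sequence only requires checking those positions, which mirrors the summary's claim that the identified mutations alone can classify new proteins.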