
Ranking and compacting binding segments of protein families using aligned pattern clusters


Bibliographic Details
Main Authors:	Lee, En-Shiun Annie, Wong, Andrew KC
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3907781/ https://www.ncbi.nlm.nih.gov/pubmed/24564874 http://dx.doi.org/10.1186/1477-5956-11-S1-S8

collection	PubMed
description	BACKGROUND: Discovering sequence patterns with variation can unveil functions of a protein family that are important for drug discovery. Exploring protein families with existing methods such as multiple sequence alignment is computationally expensive, so pattern search, called motif finding in bioinformatics, is used instead. At present, however, combinatorial algorithms produce large sets of solutions, and probabilistic models require a richer representation of the amino acid associations. To overcome these shortcomings, we present a method for ranking and compacting these solutions in a new representation referred to as Aligned Pattern Clusters (APCs). To tackle the problem of a large solution set, our method reveals a reduced set of candidate solutions without losing any information. To address the problem of representation, our method captures the amino acid associations and conservations of the aligned patterns. Our algorithm renders a set of APCs in which a set of patterns is discovered, pruned, aligned, and synthesized from the input sequences of a protein family. RESULTS: Our algorithm identifies the binding or other functional segments, and their embedded residues, which are important drug targets, from the cytochrome c and ubiquitin protein families taken from UniProt. The results are independently confirmed by Pfam's multiple sequence alignment. For the cytochrome c protein, the number of resulting patterns with variations is reduced by 76.62% from the number of original patterns without variations. Furthermore, all of the top four candidate APCs correspond to binding segments, each with one of its conserved amino acids as the binding residue. The discovered proximal APCs agree with Pfam and PROSITE results. Surprisingly, the distal binding site discovered by our algorithm is found by neither Pfam nor PROSITE, but is confirmed by the three-dimensional cytochrome c structure. When applied to the ubiquitin protein family, our results agree with Pfam and reveal six of the seven lysine binding residues as conserved aligned columns with an entropy redundancy measure of 1.0. CONCLUSION: The discovery, ranking, reduction, and representation of a set of patterns are important to avert time-consuming and expensive simulations and experiments during proteomic study and drug discovery.
format	Online Article Text
id	pubmed-3907781
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3907781 2014-02-13 Ranking and compacting binding segments of protein families using aligned pattern clusters Lee, En-Shiun Annie Wong, Andrew KC Proteome Sci Research BioMed Central 2013-11-07 /pmc/articles/PMC3907781/ /pubmed/24564874 http://dx.doi.org/10.1186/1477-5956-11-S1-S8 Text en Copyright © 2013 Lee and Wong; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Ranking and compacting binding segments of protein families using aligned pattern clusters
topic	Research
