
Ranking and compacting binding segments of protein families using aligned pattern clusters

BACKGROUND: Discovering sequence patterns with variation can unveil functions of a protein family that are important for drug discovery. Exploring protein families using existing methods such as multiple sequence alignment is computationally expensive; thus pattern search, called motif finding in Bi...


Bibliographic Details
Main authors: Lee, En-Shiun Annie, Wong, Andrew KC
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3907781/
https://www.ncbi.nlm.nih.gov/pubmed/24564874
http://dx.doi.org/10.1186/1477-5956-11-S1-S8
collection PubMed
description BACKGROUND: Discovering sequence patterns with variation can unveil functions of a protein family that are important for drug discovery. Exploring protein families with existing methods such as multiple sequence alignment is computationally expensive; thus pattern search, called motif finding in bioinformatics, is used instead. However, at present, combinatorial algorithms produce large sets of solutions, and probabilistic models require a richer representation of the amino acid associations. To overcome these shortcomings, we present a method for ranking and compacting these solutions in a new representation referred to as Aligned Pattern Clusters (APCs). To tackle the problem of a large solution set, our method reveals a reduced set of candidate solutions without losing any information. To address the problem of representation, our method captures the amino acid associations and conservations of the aligned patterns. Our algorithm renders a set of APCs in which a set of patterns is discovered, pruned, aligned, and synthesized from the input sequences of a protein family.
RESULTS: Our algorithm identifies the binding or other functional segments, and their embedded residues, which are important drug targets, in the cytochrome c and ubiquitin protein families taken from UniProt. The results are independently confirmed by Pfam's multiple sequence alignment. For the cytochrome c protein, the number of resulting patterns with variations is reduced by 76.62% from the number of original patterns without variations. Furthermore, all of the top four candidate APCs correspond to binding segments, each with one of its conserved amino acids as the binding residue. The discovered proximal APCs agree with Pfam and PROSITE results. Surprisingly, the distal binding site discovered by our algorithm is found by neither Pfam nor PROSITE, but is confirmed by the three-dimensional cytochrome c structure. When applied to the ubiquitin protein family, our results agree with Pfam and reveal six of the seven lysine binding residues as conserved aligned columns with an entropy redundancy measure of 1.0.
CONCLUSION: The discovery, ranking, reduction, and representation of a set of patterns are important for averting time-consuming and expensive simulations and experiments during proteomic study and drug discovery.
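The abstract reports conserved aligned columns with an entropy redundancy measure of 1.0. As a rough illustration of what such a per-column score can look like, the sketch below computes one minus the normalized Shannon entropy of each column in a small set of aligned patterns. The normalization by log2 of a 20-letter alphabet, the function names `column_redundancy` and `redundancy_profile`, and the toy patterns are all assumptions made for illustration; they are not taken from the article and may differ from the authors' exact definition.

```python
import math
from collections import Counter

# Assumption: normalize by the 20 standard amino acids.
ALPHABET_SIZE = 20


def column_redundancy(column, alphabet_size=ALPHABET_SIZE):
    """Return 1 - H/H_max for one aligned column (1.0 = fully conserved)."""
    counts = Counter(column)
    total = len(column)
    # Shannon entropy of the residue distribution in this column, in bits.
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return 1.0 - entropy / math.log2(alphabet_size)


def redundancy_profile(aligned_patterns):
    """Score every column of a list of equal-length aligned patterns."""
    return [column_redundancy(col) for col in zip(*aligned_patterns)]


# Hypothetical aligned patterns, invented for this example only.
toy_patterns = ["CAACH", "CSQCH", "CGSCH", "CAQCH"]
profile = redundancy_profile(toy_patterns)

for position, score in enumerate(profile):
    print(position, round(score, 3))
```

Under this assumed definition, a column in which every pattern carries the same residue scores exactly 1.0, while columns with mixed residues score lower, which is one way to read the "conserved aligned columns" wording in the abstract.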
id pubmed-3907781
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3907781 2014-02-13 Proteome Sci Research BioMed Central 2013-11-07 Text en
Copyright © 2013 Lee and Wong; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research