Ranking and compacting binding segments of protein families using aligned pattern clusters
Main authors: | Lee, En-Shiun Annie; Wong, Andrew KC |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2013 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3907781/ https://www.ncbi.nlm.nih.gov/pubmed/24564874 http://dx.doi.org/10.1186/1477-5956-11-S1-S8 |
_version_ | 1782301651290292224 |
---|---|
author | Lee, En-Shiun Annie; Wong, Andrew KC |
collection | PubMed |
description | BACKGROUND: Discovering sequence patterns with variation can unveil functions of a protein family that are important for drug discovery. Exploring protein families with existing methods such as multiple sequence alignment is computationally expensive, so pattern search, known in bioinformatics as motif finding, is used instead. However, current combinatorial algorithms produce large sets of solutions, and probabilistic models require a richer representation of amino acid associations. To overcome these shortcomings, we present a method for ranking and compacting these solutions in a new representation referred to as Aligned Pattern Clusters (APCs). To tackle the problem of a large solution set, our method reveals a reduced set of candidate solutions without losing any information. To address the problem of representation, our method captures the amino acid associations and conservation of the aligned patterns. Our algorithm renders a set of APCs in which a set of patterns is discovered, pruned, aligned, and synthesized from the input sequences of a protein family. RESULTS: Our algorithm identifies the binding or other functional segments, and the residues embedded in them, that are important drug targets in the cytochrome c and ubiquitin protein families taken from UniProt. The results are independently confirmed by Pfam's multiple sequence alignment. For the cytochrome c family, the number of resulting patterns with variations is reduced by 76.62% relative to the number of original patterns without variations. Furthermore, all of the top four candidate APCs correspond to binding segments, each with one of its conserved amino acids as the binding residue. The discovered proximal APCs agree with Pfam and PROSITE results. Surprisingly, the distal binding site discovered by our algorithm is found by neither Pfam nor PROSITE, but it is confirmed by the three-dimensional structure of cytochrome c. When applied to the ubiquitin protein family, our results agree with Pfam and reveal six of the seven lysine binding residues as conserved aligned columns with an entropy redundancy measure of 1.0. CONCLUSION: The discovery, ranking, reduction, and representation of a set of patterns is important to avert time-consuming and expensive simulations and experiments during proteomic study and drug discovery. |
format | Online Article Text |
id | pubmed-3907781 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3907781 2014-02-13; Proteome Sci; Research; BioMed Central 2013-11-07; /pmc/articles/PMC3907781/ /pubmed/24564874 http://dx.doi.org/10.1186/1477-5956-11-S1-S8; Text; en; Copyright © 2013 Lee and Wong; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Ranking and compacting binding segments of protein families using aligned pattern clusters |
topic | Research |
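
The abstract reports conserved aligned columns with an "entropy redundancy measure of 1.0". The record does not define that measure, so the Python sketch below assumes the common normalized form R = 1 − H / log K, where H is the Shannon entropy of the residues in one aligned column and K = 20 standard amino acids, with gap characters ignored. Under that assumption R = 1.0 means every sequence shares the same residue in that column. The function names `entropy_redundancy` and `conserved_columns` and the toy segments are hypothetical illustrations, not the authors' implementation.

```python
import math
from collections import Counter

# Assumed alphabet: the 20 standard amino acids (K = 20).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def entropy_redundancy(column: str, alphabet_size: int = len(AMINO_ACIDS)) -> float:
    """Normalized redundancy R = 1 - H / log(K) of one aligned column.

    Assumption: gap characters ('-') are ignored. R = 1.0 means the column
    is fully conserved; R near 0 means residues are close to uniform.
    """
    residues = [c for c in column.upper() if c != "-"]
    if not residues:
        raise ValueError("column contains no residues")
    n = len(residues)
    counts = Counter(residues)
    # Shannon entropy (natural log) of the observed residue distribution.
    h = -sum((k / n) * math.log(k / n) for k in counts.values())
    return 1.0 - h / math.log(alphabet_size)


def conserved_columns(aligned: list[str], threshold: float = 1.0) -> list[int]:
    """Return 0-based indices of columns whose redundancy reaches `threshold`."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("aligned segments must have equal length")
    # Small tolerance so floating-point rounding does not hide R = 1.0.
    return [
        i
        for i in range(length)
        if entropy_redundancy("".join(s[i] for s in aligned)) >= threshold - 1e-12
    ]


if __name__ == "__main__":
    toy_segments = ["ACKLV", "ACKIV", "ACKLV"]  # hypothetical aligned segments
    print(conserved_columns(toy_segments))
```

With the toy segments, only the fourth column (L, I, L) varies, so its redundancy falls below 1.0 and it is excluded from the conserved set. Normalizing by log K keeps R between 0 and 1 regardless of how many sequences are aligned.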