Cargando…

Discovering amino acid patterns on binding sites in protein complexes

Discovering amino acid (AA) patterns on protein binding sites has recently become popular. We propose a method to discover the association relationship among AAs on binding sites. Such knowledge of binding sites is very helpful in predicting protein-protein interactions. In this paper, we focus on p...

Descripción completa

Detalles Bibliográficos
Autores principales: Kuo, Huang-Cheng, Ong, Ping-Lin, Lin, Jung-Chang, Huang, Jen-Peng
Formato: Texto
Lenguaje:English
Publicado: Biomedical Informatics 2011
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3064845/
https://www.ncbi.nlm.nih.gov/pubmed/21464838
_version_ 1782200927009112064
author Kuo, Huang-Cheng
Ong, Ping-Lin
Lin, Jung-Chang
Huang, Jen-Peng
author_facet Kuo, Huang-Cheng
Ong, Ping-Lin
Lin, Jung-Chang
Huang, Jen-Peng
author_sort Kuo, Huang-Cheng
collection PubMed
description Discovering amino acid (AA) patterns on protein binding sites has recently become popular. We propose a method to discover the association relationship among AAs on binding sites. Such knowledge of binding sites is very helpful in predicting protein-protein interactions. In this paper, we focus on protein complexes which have protein-protein recognition. The association rule mining technique is used to discover geographically adjacent amino acids on a binding site of a protein complex. When mining, instead of treating all AAs of binding sites as a transaction, we geographically partition AAs of binding sites in a protein complex. AAs in a partition are treated as a transaction. For the partition process, AAs on a binding site are projected from three-dimensional to two-dimensional. And then, assisted with a circular grid, AAs on the binding site are placed into grid cells. A circular grid has ten rings: a central ring, the second ring with 6 sectors, the third ring with 12 sectors, and later rings are added to four sectors in order. As for the radius of each ring, we examined the complexes and found that 10Å is a suitable range, which can be set by the user. After placing these recognition complexes on the circular grid, we obtain mining records (i.e. transactions) from each sector. A sector is regarded as a record. Finally, we use the association rule to mine these records for frequent AA patterns. If the support of an AA pattern is larger than the predetermined minimum support (i.e. threshold), it is called a frequent pattern. With these discovered patterns, we offer the biologists a novel point of view, which will improve the prediction accuracy of protein-protein recognition. In our experiments, we produced the AA patterns by data mining. As a result, we found that arginine (arg) most frequently appears on the binding sites of two proteins in the recognition protein complexes, while cysteine (cys) appears the fewest. In addition, if we discriminate the shape of binding sites between concave and convex further, we discover that patterns {arg, glu, asp} and {arg, ser, asp} on the concave shape of binding sites in a protein more frequently (i.e. higher probability) make contact with {lys} or {arg} on the convex shape of binding sites in another protein. Thus, we can confidently achieve a rate of at least 78%. On the other hand {val, gly, lys} on the convex surface of binding sites in proteins is more frequently in contact with {asp} on the concave site of another protein, and the confidence achieved is over 81%. Applying data mining in biology can reveal more facts that may otherwise be ignored or not easily discovered by the naked eye. Furthermore, we can discover more relationships among AAs on binding sites by appropriately rotating these residues on binding sites from a three-dimension to two-dimension perspective. We designed a circular grid to deposit the data, which total to 463 records consisting of AAs. Then we used the association rules to mine these records for discovering relationships. The proposed method in this paper provides an insight into the characteristics of binding sites for recognition complexes.
format Text
id pubmed-3064845
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Biomedical Informatics
record_format MEDLINE/PubMed
spelling pubmed-30648452011-04-04 Discovering amino acid patterns on binding sites in protein complexes Kuo, Huang-Cheng Ong, Ping-Lin Lin, Jung-Chang Huang, Jen-Peng Bioinformation Hypothesis Discovering amino acid (AA) patterns on protein binding sites has recently become popular. We propose a method to discover the association relationship among AAs on binding sites. Such knowledge of binding sites is very helpful in predicting protein-protein interactions. In this paper, we focus on protein complexes which have protein-protein recognition. The association rule mining technique is used to discover geographically adjacent amino acids on a binding site of a protein complex. When mining, instead of treating all AAs of binding sites as a transaction, we geographically partition AAs of binding sites in a protein complex. AAs in a partition are treated as a transaction. For the partition process, AAs on a binding site are projected from three-dimensional to two-dimensional. And then, assisted with a circular grid, AAs on the binding site are placed into grid cells. A circular grid has ten rings: a central ring, the second ring with 6 sectors, the third ring with 12 sectors, and later rings are added to four sectors in order. As for the radius of each ring, we examined the complexes and found that 10Å is a suitable range, which can be set by the user. After placing these recognition complexes on the circular grid, we obtain mining records (i.e. transactions) from each sector. A sector is regarded as a record. Finally, we use the association rule to mine these records for frequent AA patterns. If the support of an AA pattern is larger than the predetermined minimum support (i.e. threshold), it is called a frequent pattern. With these discovered patterns, we offer the biologists a novel point of view, which will improve the prediction accuracy of protein-protein recognition. In our experiments, we produced the AA patterns by data mining. As a result, we found that arginine (arg) most frequently appears on the binding sites of two proteins in the recognition protein complexes, while cysteine (cys) appears the fewest. In addition, if we discriminate the shape of binding sites between concave and convex further, we discover that patterns {arg, glu, asp} and {arg, ser, asp} on the concave shape of binding sites in a protein more frequently (i.e. higher probability) make contact with {lys} or {arg} on the convex shape of binding sites in another protein. Thus, we can confidently achieve a rate of at least 78%. On the other hand {val, gly, lys} on the convex surface of binding sites in proteins is more frequently in contact with {asp} on the concave site of another protein, and the confidence achieved is over 81%. Applying data mining in biology can reveal more facts that may otherwise be ignored or not easily discovered by the naked eye. Furthermore, we can discover more relationships among AAs on binding sites by appropriately rotating these residues on binding sites from a three-dimension to two-dimension perspective. We designed a circular grid to deposit the data, which total to 463 records consisting of AAs. Then we used the association rules to mine these records for discovering relationships. The proposed method in this paper provides an insight into the characteristics of binding sites for recognition complexes. Biomedical Informatics 2011-03-02 /pmc/articles/PMC3064845/ /pubmed/21464838 Text en © 2011 Biomedical Informatics This is an open-access article, which permits unrestricted use, distribution, and reproduction in any medium, for non-commercial purposes, provided the original author and source are credited.
spellingShingle Hypothesis
Kuo, Huang-Cheng
Ong, Ping-Lin
Lin, Jung-Chang
Huang, Jen-Peng
Discovering amino acid patterns on binding sites in protein complexes
title Discovering amino acid patterns on binding sites in protein complexes
title_full Discovering amino acid patterns on binding sites in protein complexes
title_fullStr Discovering amino acid patterns on binding sites in protein complexes
title_full_unstemmed Discovering amino acid patterns on binding sites in protein complexes
title_short Discovering amino acid patterns on binding sites in protein complexes
title_sort discovering amino acid patterns on binding sites in protein complexes
topic Hypothesis
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3064845/
https://www.ncbi.nlm.nih.gov/pubmed/21464838
work_keys_str_mv AT kuohuangcheng discoveringaminoacidpatternsonbindingsitesinproteincomplexes
AT ongpinglin discoveringaminoacidpatternsonbindingsitesinproteincomplexes
AT linjungchang discoveringaminoacidpatternsonbindingsitesinproteincomplexes
AT huangjenpeng discoveringaminoacidpatternsonbindingsitesinproteincomplexes