
Identification of hot regions in protein-protein interactions by sequential pattern mining

BACKGROUND: Identification of protein interacting sites is an important task in computational molecular biology. As more and more protein sequences are deposited without available structural information, it is strongly desirable to predict protein binding regions by their sequences alone. This paper...


Bibliographic Details
Main Authors: Hsu, Chen-Ming, Chen, Chien-Yu, Liu, Baw-Jhiune, Huang, Chih-Chang, Laio, Min-Hung, Lin, Chien-Chieh, Wu, Tzung-Lin
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1892096/
https://www.ncbi.nlm.nih.gov/pubmed/17570867
http://dx.doi.org/10.1186/1471-2105-8-S5-S8
_version_ 1782133826250604544
author Hsu, Chen-Ming
Chen, Chien-Yu
Liu, Baw-Jhiune
Huang, Chih-Chang
Laio, Min-Hung
Lin, Chien-Chieh
Wu, Tzung-Lin
author_sort Hsu, Chen-Ming
collection PubMed
description BACKGROUND: Identification of protein interacting sites is an important task in computational molecular biology. As more and more protein sequences are deposited without available structural information, it is highly desirable to predict protein binding regions from sequence alone. This paper presents a pattern mining approach to this problem. A functional region of a protein structure usually consists of several peptide segments linked by large wildcard regions. The proposed mining technique therefore allows large irregular gaps when growing patterns, in order to find residues that are simultaneously conserved but widely separated along the sequence. A derived pattern is called a cluster-like pattern because the conserved residues it contains are always grouped into several blocks, each of which corresponds to a locally conserved region of the protein sequence. RESULTS: The experiments conducted in this work demonstrate that the derived long patterns automatically discover the important residues that form one or several hot regions of protein-protein interactions. The methodology is evaluated on the web server MAGIIC-PRO using a well-known benchmark containing 220 protein chains from 72 distinct complexes. Among the 218 proteins tested, 900 sequential blocks were discovered, 4.25 blocks per protein chain on average. About 92% of the derived blocks are clustered in space with at least one other block, and about 66% of the blocks lie near the interface of protein-protein interactions. In summary, for about 83% of the tested proteins, at least two interacting blocks can be discovered by this approach. CONCLUSION: This work aims to demonstrate that the important residues associated with the interface of protein-protein interactions may be discovered automatically by sequential pattern mining. The detected regions are highly conserved and are therefore regarded as computational hot regions. This information would be useful for characterizing protein sequences, predicting protein function, finding potential partners, and facilitating protein docking for drug discovery.
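The notion of a cluster-like pattern described in the abstract can be illustrated with a small sketch. The Python below is not the MAGIIC-PRO algorithm and does not mine patterns; it only shows, under assumed inputs, how conserved positions that are widely separated on a sequence fall into a few local blocks. The function name group_into_blocks, the threshold max_intra_gap, and the example positions are all hypothetical.

```python
from typing import List, Tuple


def group_into_blocks(positions: List[int], max_intra_gap: int = 3) -> List[Tuple[int, int]]:
    """Group conserved residue positions into blocks.

    Two consecutive conserved positions belong to the same block when at
    most `max_intra_gap` wildcard residues lie between them (an assumed
    threshold, chosen only for illustration). Larger gaps start a new block.
    Returns a list of inclusive (start, end) ranges.
    """
    blocks: List[Tuple[int, int]] = []
    for pos in sorted(set(positions)):
        # Number of wildcard residues between this position and the
        # end of the current block.
        if blocks and pos - blocks[-1][1] - 1 <= max_intra_gap:
            blocks[-1] = (blocks[-1][0], pos)  # extend the current block
        else:
            blocks.append((pos, pos))          # large gap: open a new block
    return blocks


# Hypothetical conserved positions matched by one long pattern.
conserved = [12, 13, 15, 48, 49, 50, 97, 99]
print(group_into_blocks(conserved))
```

With these example positions, the short gaps are absorbed into three blocks while the long wildcard stretches between residues 15 and 48 and between 50 and 97 separate them.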
format Text
id pubmed-1892096
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1892096 2007-06-15 Identification of hot regions in protein-protein interactions by sequential pattern mining Hsu, Chen-Ming Chen, Chien-Yu Liu, Baw-Jhiune Huang, Chih-Chang Laio, Min-Hung Lin, Chien-Chieh Wu, Tzung-Lin BMC Bioinformatics Research BioMed Central 2007-05-24 /pmc/articles/PMC1892096/ /pubmed/17570867 http://dx.doi.org/10.1186/1471-2105-8-S5-S8 Text en Copyright © 2007 Hsu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identification of hot regions in protein-protein interactions by sequential pattern mining
topic Research