
Cluster based prediction of PDZ-peptide interactions

BACKGROUND: PDZ domains are among the most promiscuous protein recognition modules; they bind short linear peptides and play an important role in cellular signaling. Recently, a few high-throughput techniques (e.g. protein microarray screens, phage display) have been applied to determine in-vitro...


Bibliographic Details
Main Authors:	Kundu, Kousik; Backofen, Rolf
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4046824/ https://www.ncbi.nlm.nih.gov/pubmed/24564547 http://dx.doi.org/10.1186/1471-2164-15-S1-S5

author	Kundu, Kousik; Backofen, Rolf
collection	PubMed
description	BACKGROUND: PDZ domains are among the most promiscuous protein recognition modules; they bind short linear peptides and play an important role in cellular signaling. Recently, a few high-throughput techniques (e.g. protein microarray screens, phage display) have been applied to determine the in-vitro binding specificity of PDZ domains. Many computational methods are currently available to predict PDZ-peptide interactions, but they often provide domain-specific models and/or have limited domain coverage. RESULTS: Here, we compiled the largest set of PDZ domains derived from the human, mouse, fly and worm proteomes and defined binding models for PDZ domain families to improve domain coverage and prediction specificity. For that purpose, we first identified a novel set of 138 PDZ families, comprising 548 PDZ domains from the aforementioned organisms, based on efficient clustering according to their sequence identity. For 43 PDZ families, covering 226 PDZ domains with available interaction data, we built specialized models using a support vector machine approach. The advantage of family-wise models is that they can also be used to determine the binding specificity of a newly characterized PDZ domain with sufficient sequence identity to the known families. Since most current experimental approaches provide only positive data, we have to cope with the class imbalance problem. Thus, to enrich the negative class, we introduced a powerful semi-supervised technique to generate high-confidence non-interaction data. We report competitive predictive performance with respect to state-of-the-art approaches. CONCLUSIONS: Our approach makes several contributions. First, we show that domain coverage can be increased by applying an accurate clustering technique. Second, we developed an approach based on a semi-supervised strategy to obtain high-confidence negative data. Third, we allowed high-order correlations between the amino acid positions in the binding peptides. Fourth, our method is general enough to be readily applicable to other peptide recognition modules such as SH2 domains. Finally, we performed a genome-wide prediction for 101 human and 102 mouse PDZ domains and uncovered novel interactions with biological relevance. We make all the predictive models and genome-wide predictions freely available to the scientific community. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-S1-S5) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4046824
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4046824 2014-06-06 Cluster based prediction of PDZ-peptide interactions Kundu, Kousik; Backofen, Rolf BMC Genomics Proceedings BioMed Central 2014-01-24 /pmc/articles/PMC4046824/ /pubmed/24564547 http://dx.doi.org/10.1186/1471-2164-15-S1-S5 Text en © Kundu and Backofen; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Cluster based prediction of PDZ-peptide interactions
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4046824/ https://www.ncbi.nlm.nih.gov/pubmed/24564547 http://dx.doi.org/10.1186/1471-2164-15-S1-S5


Similar Items