Cargando…

ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval

BACKGROUND: The need to retrieve or classify protein molecules using structure or sequence-based similarity measures underlies a wide range of biomedical applications. Traditional protein search methods rely on a pairwise dissimilarity/similarity measure for comparing a pair of proteins. This kind o...

Descripción completa

Detalles Bibliográficos
Autores principales: Wang, Jingyan, Gao, Xin, Wang, Quanquan, Li, Yongping
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2012
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3348016/
https://www.ncbi.nlm.nih.gov/pubmed/22594999
http://dx.doi.org/10.1186/1471-2105-13-S7-S2
_version_ 1782232352767868928
author Wang, Jingyan
Gao, Xin
Wang, Quanquan
Li, Yongping
author_facet Wang, Jingyan
Gao, Xin
Wang, Quanquan
Li, Yongping
author_sort Wang, Jingyan
collection PubMed
description BACKGROUND: The need to retrieve or classify protein molecules using structure or sequence-based similarity measures underlies a wide range of biomedical applications. Traditional protein search methods rely on a pairwise dissimilarity/similarity measure for comparing a pair of proteins. This kind of pairwise measures suffer from the limitation of neglecting the distribution of other proteins and thus cannot satisfy the need for high accuracy of the retrieval systems. Recent work in the machine learning community has shown that exploiting the global structure of the database and learning the contextual dissimilarity/similarity measures can improve the retrieval performance significantly. However, most existing contextual dissimilarity/similarity learning algorithms work in an unsupervised manner, which does not utilize the information of the known class labels of proteins in the database. RESULTS: In this paper, we propose a novel protein-protein dissimilarity learning algorithm, ProDis-ContSHC. ProDis-ContSHC regularizes an existing dissimilarity measure d(ij )by considering the contextual information of the proteins. The context of a protein is defined by its neighboring proteins. The basic idea is, for a pair of proteins (i, j), if their context [Formula: see text] and [Formula: see text] is similar to each other, the two proteins should also have a high similarity. We implement this idea by regularizing d(ij )by a factor learned from the context [Formula: see text] and [Formula: see text]. Moreover, we divide the context to hierarchial sub-context and get the contextual dissimilarity vector for each protein pair. Using the class label information of the proteins, we select the relevant (a pair of proteins that has the same class labels) and irrelevant (with different labels) protein pairs, and train an SVM model to distinguish between their contextual dissimilarity vectors. The SVM model is further used to learn a supervised regularizing factor. Finally, with the new Supervised learned Dissimilarity measure, we update the Protein Hierarchial Context Coherently in an iterative algorithm--ProDis-ContSHC. We test the performance of ProDis-ContSHC on two benchmark sets, i.e., the ASTRAL 1.73 database and the FSSP/DALI database. Experimental results demonstrate that plugging our supervised contextual dissimilarity measures into the retrieval systems significantly outperforms the context-free dissimilarity/similarity measures and other unsupervised contextual dissimilarity measures that do not use the class label information. CONCLUSIONS: Using the contextual proteins with their class labels in the database, we can improve the accuracy of the pairwise dissimilarity/similarity measures dramatically for the protein retrieval tasks. In this work, for the first time, we propose the idea of supervised contextual dissimilarity learning, resulting in the ProDis-ContSHC algorithm. Among different contextual dissimilarity learning approaches that can be used to compare a pair of proteins, ProDis-ContSHC provides the highest accuracy. Finally, ProDis-ContSHC compares favorably with other methods reported in the recent literature.
format Online
Article
Text
id pubmed-3348016
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-33480162012-05-09 ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval Wang, Jingyan Gao, Xin Wang, Quanquan Li, Yongping BMC Bioinformatics Proceedings BACKGROUND: The need to retrieve or classify protein molecules using structure or sequence-based similarity measures underlies a wide range of biomedical applications. Traditional protein search methods rely on a pairwise dissimilarity/similarity measure for comparing a pair of proteins. This kind of pairwise measures suffer from the limitation of neglecting the distribution of other proteins and thus cannot satisfy the need for high accuracy of the retrieval systems. Recent work in the machine learning community has shown that exploiting the global structure of the database and learning the contextual dissimilarity/similarity measures can improve the retrieval performance significantly. However, most existing contextual dissimilarity/similarity learning algorithms work in an unsupervised manner, which does not utilize the information of the known class labels of proteins in the database. RESULTS: In this paper, we propose a novel protein-protein dissimilarity learning algorithm, ProDis-ContSHC. ProDis-ContSHC regularizes an existing dissimilarity measure d(ij )by considering the contextual information of the proteins. The context of a protein is defined by its neighboring proteins. The basic idea is, for a pair of proteins (i, j), if their context [Formula: see text] and [Formula: see text] is similar to each other, the two proteins should also have a high similarity. We implement this idea by regularizing d(ij )by a factor learned from the context [Formula: see text] and [Formula: see text]. Moreover, we divide the context to hierarchial sub-context and get the contextual dissimilarity vector for each protein pair. Using the class label information of the proteins, we select the relevant (a pair of proteins that has the same class labels) and irrelevant (with different labels) protein pairs, and train an SVM model to distinguish between their contextual dissimilarity vectors. The SVM model is further used to learn a supervised regularizing factor. Finally, with the new Supervised learned Dissimilarity measure, we update the Protein Hierarchial Context Coherently in an iterative algorithm--ProDis-ContSHC. We test the performance of ProDis-ContSHC on two benchmark sets, i.e., the ASTRAL 1.73 database and the FSSP/DALI database. Experimental results demonstrate that plugging our supervised contextual dissimilarity measures into the retrieval systems significantly outperforms the context-free dissimilarity/similarity measures and other unsupervised contextual dissimilarity measures that do not use the class label information. CONCLUSIONS: Using the contextual proteins with their class labels in the database, we can improve the accuracy of the pairwise dissimilarity/similarity measures dramatically for the protein retrieval tasks. In this work, for the first time, we propose the idea of supervised contextual dissimilarity learning, resulting in the ProDis-ContSHC algorithm. Among different contextual dissimilarity learning approaches that can be used to compare a pair of proteins, ProDis-ContSHC provides the highest accuracy. Finally, ProDis-ContSHC compares favorably with other methods reported in the recent literature. BioMed Central 2012-05-08 /pmc/articles/PMC3348016/ /pubmed/22594999 http://dx.doi.org/10.1186/1471-2105-13-S7-S2 Text en Copyright ©2012 Wang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Proceedings
Wang, Jingyan
Gao, Xin
Wang, Quanquan
Li, Yongping
ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval
title ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval
title_full ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval
title_fullStr ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval
title_full_unstemmed ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval
title_short ProDis-ContSHC: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval
title_sort prodis-contshc: learning protein dissimilarity measures and hierarchical context coherently for protein-protein comparison in protein database retrieval
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3348016/
https://www.ncbi.nlm.nih.gov/pubmed/22594999
http://dx.doi.org/10.1186/1471-2105-13-S7-S2
work_keys_str_mv AT wangjingyan prodiscontshclearningproteindissimilaritymeasuresandhierarchicalcontextcoherentlyforproteinproteincomparisoninproteindatabaseretrieval
AT gaoxin prodiscontshclearningproteindissimilaritymeasuresandhierarchicalcontextcoherentlyforproteinproteincomparisoninproteindatabaseretrieval
AT wangquanquan prodiscontshclearningproteindissimilaritymeasuresandhierarchicalcontextcoherentlyforproteinproteincomparisoninproteindatabaseretrieval
AT liyongping prodiscontshclearningproteindissimilaritymeasuresandhierarchicalcontextcoherentlyforproteinproteincomparisoninproteindatabaseretrieval