Heterodimeric protein complex identification by naïve Bayes classifiers

BACKGROUND: Protein complexes are basic cellular entities that carry out the functions of their components. It can be found that in databases of protein complexes of yeast like CYC2008, the major type of known protein complexes is heterodimeric complexes. Although a number of methods for trying to p...

Bibliographic Details
Main author: Maruyama, Osamu
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4219333/
https://www.ncbi.nlm.nih.gov/pubmed/24299017
http://dx.doi.org/10.1186/1471-2105-14-347
_version_ 1782342567058210816
author Maruyama, Osamu
collection PubMed
description BACKGROUND: Protein complexes are basic cellular entities that carry out the functions of their components. In databases of yeast protein complexes such as CYC2008, heterodimeric complexes are the major type of known protein complex. Although a number of methods have been proposed for simultaneously predicting sets of proteins that form arbitrary types of protein complexes, they often fail to predict heterodimeric complexes. RESULTS: In this paper, we have designed several features characterizing heterodimeric protein complexes based on genomic data sets, and proposed a supervised-learning method for the prediction of heterodimeric protein complexes. This method learns the parameters of the features, which are embedded in a naïve Bayes classifier. The log-likelihood ratio derived from the naïve Bayes classifier, with parameter values obtained by maximum likelihood estimation, gives the score of a given pair of proteins and is used to predict whether the pair is a heterodimeric complex. Five-fold cross-validation shows good performance on yeast. The trained classifiers also show higher predictability than various existing algorithms on yeast data sets under both approximate and exact matching criteria. CONCLUSIONS: Heterodimeric protein complex prediction is a harder problem than heteromeric protein complex prediction because a heterodimeric protein complex is topologically simpler. However, designing features specialized for heterodimeric protein complexes improves their predictability. Thus, the design of more sophisticated features for heterodimeric protein complexes, as well as the accumulation of more accurate and useful genome-wide data sets, will lead to higher predictability of heterodimeric protein complexes. Our tool can be downloaded from http://imi.kyushu-u.ac.jp/~om/.
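The scoring idea in the abstract, a log-likelihood ratio computed from a naïve Bayes classifier, can be illustrated with a minimal sketch. This is not the author's tool: the function names `fit_naive_bayes` and `llr_score`, the integer encoding of features, and the toy data are all hypothetical. The abstract states that parameters are obtained by maximum likelihood estimation; the additive smoothing term `alpha` is an assumption added here only to avoid taking the logarithm of zero on small samples.

```python
import math
from collections import Counter


def fit_naive_bayes(samples, labels, n_values, alpha=1.0):
    """Estimate P(feature_i = v | class) for classes 0 and 1.

    samples  : list of tuples of ints; feature i takes values in
               range(n_values[i]) (hypothetical discrete encoding).
    labels   : list of 0/1 (1 = heterodimeric complex, 0 = not).
    alpha    : additive smoothing (an assumption of this sketch);
               alpha=0 would give plain maximum likelihood estimates
               but can produce zero probabilities.
    """
    params = {}
    for c in (0, 1):
        rows = [s for s, y in zip(samples, labels) if y == c]
        tables = []
        for i, k in enumerate(n_values):
            counts = Counter(r[i] for r in rows)
            total = len(rows) + alpha * k
            tables.append([(counts[v] + alpha) / total for v in range(k)])
        params[c] = tables
    return params


def llr_score(params, sample):
    """Sum of per-feature log-likelihood ratios, positive over negative class.

    Under the naive Bayes independence assumption the joint likelihood
    factorizes, so the log-ratio is a sum over features.
    """
    return sum(
        math.log(params[1][i][v] / params[0][i][v])
        for i, v in enumerate(sample)
    )


# Toy data: two binary features per protein pair (purely illustrative).
samples = [(1, 1), (1, 0), (1, 1), (0, 0), (0, 1), (0, 0), (0, 0)]
labels = [1, 1, 1, 0, 0, 0, 0]
params = fit_naive_bayes(samples, labels, n_values=[2, 2])

score_pos = llr_score(params, (1, 1))  # resembles the positive examples
score_neg = llr_score(params, (0, 0))  # resembles the negative examples
```

A pair would be predicted as a heterodimeric complex when its score exceeds a chosen threshold; how that threshold is chosen, and which features are used, are not specified in this record and are left open here.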
format Online
Article
Text
id pubmed-4219333
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4219333 2014-11-07 Heterodimeric protein complex identification by naïve Bayes classifiers Maruyama, Osamu BMC Bioinformatics Research Article BioMed Central 2013-12-03 /pmc/articles/PMC4219333/ /pubmed/24299017 http://dx.doi.org/10.1186/1471-2105-14-347 Text en Copyright © 2013 Maruyama; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Heterodimeric protein complex identification by naïve Bayes classifiers
topic Research Article