
Exploring Mouse Protein Function via Multiple Approaches


Bibliographic Details
Main Authors: Huang, Guohua; Chu, Chen; Huang, Tao; Kong, Xiangyin; Zhang, Yunhua; Zhang, Ning; Cai, Yu-Dong
Format: Online Article Text
Language: English
Published: Public Library of Science, 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5112993/
https://www.ncbi.nlm.nih.gov/pubmed/27846315
http://dx.doi.org/10.1371/journal.pone.0166580
collection PubMed
description Although the number of available protein sequences is growing exponentially, functional protein annotations lag far behind, so accurate identification of protein functions remains one of the major challenges in molecular biology. In this study, we present a novel approach to predicting mouse protein functions. The approach sequentially combines a similarity-based approach, an interaction-based approach and a pseudo amino acid composition-based approach. The method achieved an accuracy of about 0.8450 for the 1st-order predictions in both the leave-one-out and ten-fold cross-validations. In the leave-one-out cross-validation, the similarity-based approach alone achieved an accuracy of 0.8756, but it was unable to predict the functions of proteins with no homologues. By comparison, the pseudo amino acid composition-based approach alone reached an accuracy of 0.6786; although this is lower than the accuracy of the similarity-based approach, it could predict the functions of almost all proteins, including proteins with no homologues. The combined method therefore balances the strengths and weaknesses of both approaches to achieve efficient performance. Furthermore, the ten-fold cross-validation results indicate that the combined method remains effective and stable when no close homologues are available. However, the accuracy of the predicted functions can only be assessed against the protein functions known at present, and many protein functions remain unknown. An analysis of proteins whose 1st-order predicted functions were wrong but whose 2nd-order predicted functions were correct showed that the wrongly predicted 1st-order functions were closely associated with the genes encoding those proteins. These so-called wrong predictions could therefore prove correct upon future experimental verification, so the real accuracy of the presented method may be much higher.
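The abstract describes the method as a sequential combination of three predictors. The following is a minimal Python sketch of that idea under stated assumptions, not the authors' implementation. Each stage is assumed to be a hypothetical callable that returns a ranked list of function labels (best first), or None when it cannot predict (for example, no homologue or no interaction partner); the first stage that returns a ranking wins. "1st-order accuracy" is assumed here to mean the fraction of proteins whose top-ranked label is among their known functions. The stage functions and toy data are illustrative stand-ins, and a real leave-one-out evaluation would rebuild each stage without the query protein.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set

# Hypothetical interface: a stage maps a protein ID to a ranked list of
# function labels (best first), or None when it cannot make a prediction.
Predictor = Callable[[str], Optional[List[str]]]


def sequential_predict(protein: str, stages: Sequence[Predictor]) -> List[str]:
    """Return the ranking from the first stage that produces one."""
    for stage in stages:
        ranking = stage(protein)
        if ranking:  # None or an empty list falls through to the next stage
            return ranking
    return []  # no stage could make a prediction


def first_order_accuracy(
    proteins: Sequence[str],
    true_functions: Dict[str, Set[str]],
    stages: Sequence[Predictor],
) -> float:
    """Fraction of proteins whose top-ranked predicted function is a known one."""
    if not proteins:
        return 0.0
    hits = 0
    for protein in proteins:
        ranking = sequential_predict(protein, stages)
        if ranking and ranking[0] in true_functions[protein]:
            hits += 1
    return hits / len(proteins)


# Toy stand-ins for the three stages (hypothetical data, not from the paper).
HOMOLOGUES = {"P1": ["F_a", "F_b"]}
INTERACTIONS = {"P2": ["F_b", "F_c"]}


def similarity_stage(protein: str) -> Optional[List[str]]:
    return HOMOLOGUES.get(protein)  # None when no homologue is known


def interaction_stage(protein: str) -> Optional[List[str]]:
    return INTERACTIONS.get(protein)  # None when no partner is known


def composition_stage(protein: str) -> Optional[List[str]]:
    return ["F_c", "F_a"]  # sequence-only stage: always returns a ranking


stages = [similarity_stage, interaction_stage, composition_stage]
truth = {"P1": {"F_a"}, "P2": {"F_c"}, "P3": {"F_c"}}
print(first_order_accuracy(["P1", "P2", "P3"], truth, stages))  # 0.6666666666666666
```

With the toy data, P1 is handled by the similarity stage, P2 by the interaction stage and P3 by the composition stage, giving a 1st-order accuracy of 2/3. P2 also illustrates the case discussed in the abstract: its 1st-order prediction (F_b) is wrong, while its 2nd-order prediction (F_c) is correct.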
id pubmed-5112993
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Source: PLoS One (Research Article), Public Library of Science, 2016-11-15
License: © 2016 Huang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.