
Identification of protein functions using a machine-learning approach based on sequence-derived properties

BACKGROUND: Predicting the function of an unknown protein is an essential goal in bioinformatics. Sequence similarity-based approaches are widely used for function prediction; however, they are often inadequate in the absence of similar sequences or when the sequence similarity among known protein s...


Bibliographic Details
Main Authors:	Lee, Bum Ju; Shin, Moon Sun; Oh, Young Joon; Oh, Hae Seok; Ryu, Keun Ho
Format:	Text
Language:	English
Published:	BioMed Central 2009
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2731080/ https://www.ncbi.nlm.nih.gov/pubmed/19664241 http://dx.doi.org/10.1186/1477-5956-7-27

author	Lee, Bum Ju; Shin, Moon Sun; Oh, Young Joon; Oh, Hae Seok; Ryu, Keun Ho
author_sort	Lee, Bum Ju
collection	PubMed
description	BACKGROUND: Predicting the function of an unknown protein is an essential goal in bioinformatics. Sequence similarity-based approaches are widely used for function prediction; however, they are often inadequate in the absence of similar sequences or when the sequence similarity among known protein sequences is statistically weak. This study aimed to develop an accurate prediction method for identifying protein function, irrespective of sequence and structural similarities. RESULTS: A highly accurate prediction method capable of identifying protein function, based solely on protein sequence properties, is described. This method analyses and identifies specific features of the protein sequence that are highly correlated with certain protein functions and determines the combination of protein sequence features that best characterises protein function. Thirty-three features that represent subtle differences in local regions and full regions of the protein sequences were introduced. On the basis of 484 features extracted solely from the protein sequence, models were built to predict the functions of 11 different proteins from a broad range of cellular components, molecular functions, and biological processes. The accuracy of protein function prediction using random forests with feature selection ranged from 94.23% to 100%. The local sequence information was found to have a broad range of applicability in predicting protein function. CONCLUSION: We present an accurate prediction method using a machine-learning approach based solely on protein sequence properties. The primary contribution of this paper is to propose new PNPRD features representing global and/or local differences in sequences, based on positively and/or negatively charged residues, to assist in predicting protein function. In addition, we identified a compact and useful feature subset for predicting the function of various proteins. 
Our results indicate that sequence-based classifiers can provide good results among a broad range of proteins, that the proposed features are useful in predicting several functions, and that the combination of our and traditional features may support the creation of a discriminative feature set for specific protein functions.
format	Text
id	pubmed-2731080
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2731080 2009-08-24 Identification of protein functions using a machine-learning approach based on sequence-derived properties Lee, Bum Ju; Shin, Moon Sun; Oh, Young Joon; Oh, Hae Seok; Ryu, Keun Ho Proteome Sci Research BioMed Central 2009-08-09 /pmc/articles/PMC2731080/ /pubmed/19664241 http://dx.doi.org/10.1186/1477-5956-7-27 Text en Copyright © 2009 Lee et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Identification of protein functions using a machine-learning approach based on sequence-derived properties
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2731080/ https://www.ncbi.nlm.nih.gov/pubmed/19664241 http://dx.doi.org/10.1186/1477-5956-7-27


Similar Items