Graph pyramids for protein function prediction

BACKGROUND: Uncovering the hidden organizational characteristics and regularities among biological sequences is key to a detailed understanding of the underlying biological phenomena. Pattern recognition from nucleic acid sequences is therefore important for protein function prediction...

Bibliographic Details
Main Authors: Sandhan, Tushar, Yoo, Youngjun, Choi, Jin Young, Kim, Sun
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460595/
https://www.ncbi.nlm.nih.gov/pubmed/26044522
http://dx.doi.org/10.1186/1755-8794-8-S2-S12
_version_ 1782375396740694016
author Sandhan, Tushar
Yoo, Youngjun
Choi, Jin Young
Kim, Sun
author_facet Sandhan, Tushar
Yoo, Youngjun
Choi, Jin Young
Kim, Sun
author_sort Sandhan, Tushar
collection PubMed
description BACKGROUND: Uncovering the hidden organizational characteristics and regularities among biological sequences is key to a detailed understanding of the underlying biological phenomena. Pattern recognition from nucleic acid sequences is therefore important for protein function prediction. Because proteins from the same family exhibit similar characteristics, homology-based approaches predict protein function via protein classification. However, conventional classification approaches rely mostly on global features, considering only strong protein similarity matches, which leads to a significant loss of prediction accuracy. METHODS: Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers local as well as global features by examining the interactions among 'weakly interacting proteins' in the PPS network and by applying hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph-based features at various pyramid levels. RESULTS: Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using the graph pyramid improves both computational efficiency and protein classification accuracy. Quantitatively, among 14,086 test sequences, the proposed method misclassified only 21.1 sequences on average, whereas the baseline global feature matching method based on BLAST scores misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. It thus achieves more than 96% protein classification accuracy using only 20% of the per-class training data.
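The misclassification counts quoted in the description can be turned into error rates with simple arithmetic. The following is a minimal sketch, not part of the original record: the variable and function names are illustrative assumptions, and the abstract does not state which training fraction produced these counts, so the derived percentages should not be read as the paper's reported "more than 96%" figure.

```python
# Counts quoted in the abstract (averages over runs, hence fractional values).
TEST_SEQUENCES = 14_086
MISCLASSIFIED = {
    "graph pyramid (proposed)": 21.1,
    "BLAST baseline": 362.9,
}


def error_rate(misclassified: float, total: int) -> float:
    """Return the fraction of test sequences that were misclassified."""
    return misclassified / total


# Hypothetical helper structure: method name -> error rate.
rates = {name: error_rate(m, TEST_SEQUENCES) for name, m in MISCLASSIFIED.items()}

for name, rate in rates.items():
    print(f"{name}: error {rate:.2%}, accuracy {1 - rate:.2%}")

# How many times fewer errors the proposed method makes than the baseline.
reduction = MISCLASSIFIED["BLAST baseline"] / MISCLASSIFIED["graph pyramid (proposed)"]
print(f"error reduction factor: {reduction:.1f}x")
```

Under these counts the proposed method's error rate comes out at roughly 0.15% against roughly 2.58% for the baseline, about a 17-fold reduction in misclassified sequences.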
format Online
Article
Text
id pubmed-4460595
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4460595 2015-06-29 Graph pyramids for protein function prediction Sandhan, Tushar Yoo, Youngjun Choi, Jin Young Kim, Sun BMC Med Genomics Research BACKGROUND: Uncovering the hidden organizational characteristics and regularities among biological sequences is key to a detailed understanding of the underlying biological phenomena. Pattern recognition from nucleic acid sequences is therefore important for protein function prediction. Because proteins from the same family exhibit similar characteristics, homology-based approaches predict protein function via protein classification. However, conventional classification approaches rely mostly on global features, considering only strong protein similarity matches, which leads to a significant loss of prediction accuracy. METHODS: Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers local as well as global features by examining the interactions among 'weakly interacting proteins' in the PPS network and by applying hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph-based features at various pyramid levels. RESULTS: Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using the graph pyramid improves both computational efficiency and protein classification accuracy. Quantitatively, among 14,086 test sequences, the proposed method misclassified only 21.1 sequences on average, whereas the baseline global feature matching method based on BLAST scores misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. It thus achieves more than 96% protein classification accuracy using only 20% of the per-class training data.
BioMed Central 2015-05-29 /pmc/articles/PMC4460595/ /pubmed/26044522 http://dx.doi.org/10.1186/1755-8794-8-S2-S12 Text en Copyright © 2015 Sandhan et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Sandhan, Tushar
Yoo, Youngjun
Choi, Jin Young
Kim, Sun
Graph pyramids for protein function prediction
title Graph pyramids for protein function prediction
title_full Graph pyramids for protein function prediction
title_fullStr Graph pyramids for protein function prediction
title_full_unstemmed Graph pyramids for protein function prediction
title_short Graph pyramids for protein function prediction
title_sort graph pyramids for protein function prediction
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460595/
https://www.ncbi.nlm.nih.gov/pubmed/26044522
http://dx.doi.org/10.1186/1755-8794-8-S2-S12
work_keys_str_mv AT sandhantushar graphpyramidsforproteinfunctionprediction
AT yooyoungjun graphpyramidsforproteinfunctionprediction
AT choijinyoung graphpyramidsforproteinfunctionprediction
AT kimsun graphpyramidsforproteinfunctionprediction