Protein fold recognition using geometric kernel data fusion

Bibliographic Details
Main authors: Zakeri, Pooya; Jeuris, Ben; Vandebril, Raf; Moreau, Yves
Format: Online, Article, Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4071197/
https://www.ncbi.nlm.nih.gov/pubmed/24590441
http://dx.doi.org/10.1093/bioinformatics/btu118
collection PubMed
description Motivation: Various approaches, typically based on features extracted from protein sequences and often on machine learning methods, have been used to predict protein folds. Finding an efficient technique for integrating these different protein features has received increasing attention. In particular, kernel methods are an interesting class of techniques for integrating heterogeneous data. Various methods have been proposed to fuse multiple kernels. Most techniques for multiple kernel learning focus on learning a convex linear combination of base kernels. Besides being restricted to linear combinations, such approaches may lose potentially useful information.
Results: We design several techniques to combine kernel matrices by taking more involved, geometry-inspired means of these matrices instead of convex linear combinations. We consider various sequence-based protein features, including information extracted directly from position-specific scoring matrices and local sequence alignment. We evaluate our methods for classification on the SCOP PDB-40D benchmark dataset for protein fold recognition. The best overall accuracy obtained by our methods on the protein fold recognition test set is ∼86.7%, an improvement over the results of the best existing approach. Moreover, we further develop our computational model by incorporating the functional domain composition of proteins through a hybridization model. With this hybridization model, protein fold recognition accuracy improves further to 89.30%. Finally, we investigate the performance of our approach on the protein remote homology detection problem by fusing multiple string kernels.
Availability and implementation: The MATLAB code for our proposed geometric kernel fusion frameworks is publicly available at http://people.cs.kuleuven.be/~raf.vandebril/homepage/software/geomean.php?menu=5/
Contact: pooyapaydar@gmail.com or yves.moreau@esat.kuleuven.be
Supplementary information: Supplementary data are available at Bioinformatics online.
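The description contrasts convex linear combinations of base kernels with geometry-inspired means of kernel matrices. The NumPy sketch below illustrates that contrast under stated assumptions: every kernel must be strictly positive definite (a small ridge is added to each Gram matrix), and the two means shown, the two-matrix geometric mean A # B and the log-Euclidean mean, are standard examples from matrix geometry chosen for illustration. They are not necessarily the exact means, or the MATLAB code, that the authors released. The function names (`linear_fusion`, `geometric_mean`, `log_euclidean_mean`, `rbf_kernel`, `spd_apply`), the RBF bandwidth and the toy data are all hypothetical.

```python
import numpy as np


def spd_apply(m, f):
    """Apply a scalar function f to a symmetric positive-definite matrix via its eigendecomposition."""
    m = (m + m.T) / 2.0  # remove tiny asymmetries caused by rounding
    w, v = np.linalg.eigh(m)
    return (v * f(w)) @ v.T  # V diag(f(w)) V^T


def linear_fusion(kernels, weights):
    """Convex linear combination sum_i w_i K_i with w_i >= 0 and sum_i w_i = 1."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(w * k for w, k in zip(weights, kernels))


def geometric_mean(a, b):
    """Two-matrix geometric mean A # B = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}."""
    a_half = spd_apply(a, np.sqrt)
    a_inv_half = spd_apply(a, lambda w: 1.0 / np.sqrt(w))
    middle = spd_apply(a_inv_half @ b @ a_inv_half, np.sqrt)
    return a_half @ middle @ a_half


def log_euclidean_mean(kernels):
    """Log-Euclidean mean exp(mean_i log K_i); accepts any number of SPD kernels."""
    mean_log = sum(spd_apply(k, np.log) for k in kernels) / len(kernels)
    return spd_apply(mean_log, np.exp)


def rbf_kernel(x, gamma=0.5):
    """Gaussian (RBF) kernel matrix exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(x**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


# Toy data (hypothetical): two feature "views" of the same 6 proteins.
rng = np.random.default_rng(0)
view1 = rng.normal(size=(6, 4))
view2 = rng.normal(size=(6, 10))
ridge = 1e-3 * np.eye(6)  # small ridge so both Gram matrices are strictly positive definite
k1 = view1 @ view1.T + ridge    # linear kernel on view 1
k2 = rbf_kernel(view2) + ridge  # RBF kernel on view 2

k_lin = linear_fusion([k1, k2], [0.5, 0.5])
k_geo = geometric_mean(k1, k2)
k_le = log_euclidean_mean([k1, k2])
for name, k in [("linear", k_lin), ("geometric", k_geo), ("log-Euclidean", k_le)]:
    print(f"{name:>13}: smallest eigenvalue {np.linalg.eigvalsh(k).min():.4f}")
```

The linear combination needs a weight vector (learned, in multiple kernel learning), whereas the unweighted means above need no weights. Both means return a symmetric positive-definite matrix whenever their inputs are SPD, so the fused matrix can be passed directly to a kernel classifier such as an SVM.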
id pubmed-4071197
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4071197 2014-06-26 Bioinformatics Original Papers Oxford University Press 2014-07-01 2014-03-03 Text en
© The Author 2014. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Original Papers