
DeepSF: deep convolutional neural network for mapping protein sequences to folds

MOTIVATION: Protein fold recognition is an important problem in structural bioinformatics. Almost all traditional fold recognition methods use sequence (homology) comparison to indirectly predict the fold of a target protein based on the fold of a template protein with known structure, which cannot...


Bibliographic Details
Main Authors: Hou, Jie; Adhikari, Badri; Cheng, Jianlin
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5905591/
https://www.ncbi.nlm.nih.gov/pubmed/29228193
http://dx.doi.org/10.1093/bioinformatics/btx780
author Hou, Jie
Adhikari, Badri
Cheng, Jianlin
collection PubMed
description MOTIVATION: Protein fold recognition is an important problem in structural bioinformatics. Almost all traditional fold recognition methods use sequence (homology) comparison to indirectly predict the fold of a target protein based on the fold of a template protein with known structure, which cannot explain the relationship between sequence and fold. Only a few methods have been developed to classify protein sequences directly into folds, and methodological limitations restrict them to a small number of folds, so they are not generally useful in practice. RESULTS: We develop a deep 1D-convolution neural network (DeepSF) to directly classify any protein sequence into one of 1195 known folds, which is useful for both fold recognition and the study of the sequence–structure relationship. Unlike traditional methods based on sequence alignment (comparison), our method automatically extracts fold-related features from a protein sequence of any length and maps it to the fold space. We train and test our method on datasets curated from SCOP1.75, yielding an average classification accuracy of 75.3%. On the independent testing dataset curated from SCOP2.06, the classification accuracy is 73.0%. We compare our method with a top profile–profile alignment method, HHSearch, on hard template-based and template-free modeling targets of CASP9-12 in terms of fold recognition accuracy. The accuracy of our method is 12.63–26.32% higher than HHSearch on template-free modeling targets and 3.39–17.09% higher on hard template-based modeling targets for top 1, 5 and 10 predicted folds. The hidden features extracted from sequence by our method are robust against sequence mutation, insertion, deletion and truncation, and can be used for other protein pattern recognition problems such as protein clustering, comparison and ranking. AVAILABILITY AND IMPLEMENTATION: The DeepSF server is publicly available at: http://iris.rnet.missouri.edu/DeepSF/.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
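The description above says that a 1D-convolution network maps a sequence of any length to one of 1195 folds and is evaluated on its top 1, 5 and 10 predictions. The following is a minimal sketch of that idea in plain Python, not the DeepSF implementation: the one-hot encoding, the single convolution layer, the global max pooling step, the filter sizes and the random untrained weights are all assumptions made for illustration, so the predicted fold indices carry no biological meaning.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
NUM_FOLDS = 1195  # number of fold classes stated in the abstract


def one_hot(sequence):
    """Encode each residue as a 20-dimensional indicator vector (assumed encoding)."""
    return [[1.0 if aa == residue else 0.0 for aa in AMINO_ACIDS]
            for residue in sequence]


def conv1d_relu(features, filters, width):
    """Slide each filter along the sequence; output length depends on input length."""
    out = []
    for start in range(len(features) - width + 1):
        window = features[start:start + width]
        row = []
        for filt in filters:
            total = sum(filt[i][j] * window[i][j]
                        for i in range(width) for j in range(len(window[i])))
            row.append(max(0.0, total))  # ReLU activation
        out.append(row)
    return out


def global_max_pool(feature_map):
    """Collapse the length axis so any sequence yields a fixed-size vector."""
    return [max(column) for column in zip(*feature_map)]


def top_k(scores, k):
    """Indices of the k highest-scoring folds, best first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def predict_folds(sequence, filters, width, weights, k=5):
    """Score every fold with a linear layer on the pooled features, return top k."""
    pooled = global_max_pool(conv1d_relu(one_hot(sequence), filters, width))
    scores = [sum(w * x for w, x in zip(row, pooled)) for row in weights]
    return top_k(scores, k)


rng = random.Random(0)
WIDTH, NUM_FILTERS = 3, 8  # hypothetical sizes, chosen only to keep the demo small
filters = [[[rng.uniform(-1, 1) for _ in AMINO_ACIDS] for _ in range(WIDTH)]
           for _ in range(NUM_FILTERS)]
weights = [[rng.uniform(-1, 1) for _ in range(NUM_FILTERS)]
           for _ in range(NUM_FOLDS)]

# Two made-up sequences of different lengths go through the same weights.
for seq in ("MKTAYIAKQR", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"):
    print(len(seq), predict_folds(seq, filters, WIDTH, weights, k=5))
```

Pooling over the length axis is what lets one set of weights handle both inputs: the convolution output grows with the sequence, but the pooled vector always has one entry per filter. Top-k accuracy would then count a prediction as correct whenever the true fold index appears among the k returned indices.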
format Online
Article
Text
id pubmed-5905591
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5905591 2018-04-23 DeepSF: deep convolutional neural network for mapping protein sequences to folds Hou, Jie Adhikari, Badri Cheng, Jianlin Bioinformatics Original Papers Oxford University Press 2018-04-15 2017-12-08 /pmc/articles/PMC5905591/ /pubmed/29228193 http://dx.doi.org/10.1093/bioinformatics/btx780 Text en © The Author 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title DeepSF: deep convolutional neural network for mapping protein sequences to folds
topic Original Papers