
Learning structural motif representations for efficient protein structure search

Bibliographic Details
Main authors: Liu, Yang; Ye, Qing; Wang, Liwei; Peng, Jian
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects: ECCB 2018: European Conference on Computational Biology Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6129266/
https://www.ncbi.nlm.nih.gov/pubmed/30423083
http://dx.doi.org/10.1093/bioinformatics/bty585
collection PubMed
description MOTIVATION: Given a protein of unknown function, fast identification of similar protein structures from the Protein Data Bank (PDB) is a critical step for inferring its biological function. Such structural neighbors can provide evolutionary insights into protein conformation, interfaces and binding sites that are not detectable from sequence similarity. However, performing pairwise structural alignment against every structure in the PDB is prohibitively expensive. Alignment-free approaches have been introduced to enable fast but coarse comparisons by representing each protein as a vector of structure features or fingerprints and computing only the similarity between vectors. As a notable example, FragBag represents each protein by a ‘bag of fragments’, which is a vector of frequencies of contiguous short backbone fragments from a predetermined library. Although FragBag is efficient, its accuracy is unsatisfactory, because its backbone fragment library may not be optimally constructed and it omits long-range interacting patterns.
RESULTS: Here we present a new approach to learning effective structural motif representations using deep learning. We develop DeepFold, a deep convolutional neural network model that extracts structural motif features from a protein structure. We demonstrate that DeepFold substantially outperforms FragBag in protein structure search on a non-redundant protein structure database and on a set of newly released structures. Remarkably, DeepFold not only extracts meaningful backbone segments but also finds important long-range interacting motifs for structural comparison. We expect that DeepFold will provide new insights into the evolution and hierarchical organization of protein structural motifs.
AVAILABILITY AND IMPLEMENTATION: https://github.com/largelymfs/DeepFold
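The alignment-free comparison described in the abstract can be illustrated with a minimal Python sketch. It assumes that each protein's contiguous backbone windows have already been assigned to indices in a fragment library (that assignment step is not shown, and the four-fragment toy library and protein names are hypothetical). It builds the ‘bag of fragments’ frequency vector and ranks a small database by cosine similarity, one common choice of vector similarity assumed here. It is an illustration of the idea, not the FragBag or DeepFold implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def bag_of_fragments(fragment_ids: Sequence[int], library_size: int) -> List[float]:
    """Turn per-window fragment assignments into a frequency vector.

    Assumption: fragment_ids[i] is the index of the library fragment closest
    to the i-th contiguous backbone window (assignment step not shown).
    """
    counts = Counter(fragment_ids)
    return [float(counts[k]) for k in range(library_size)]


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def search(query: Sequence[float],
           database: Dict[str, Sequence[float]],
           top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank database entries by vector similarity only (no structural alignment)."""
    scored = [(name, cosine(query, vec)) for name, vec in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    LIBRARY_SIZE = 4  # hypothetical toy library; real fragment libraries are larger
    db = {
        "protA": bag_of_fragments([0, 0, 1, 2], LIBRARY_SIZE),
        "protB": bag_of_fragments([3, 3, 3, 2], LIBRARY_SIZE),
        "protC": bag_of_fragments([0, 1, 1, 2], LIBRARY_SIZE),
    }
    query = bag_of_fragments([0, 0, 1, 2], LIBRARY_SIZE)
    for name, score in search(query, db):
        print(f"{name}\t{score:.3f}")
```

With these toy vectors, protA (identical fragment counts) ranks first with a score of 1.000, followed by protC (0.833) and protB (0.129). In an approach like DeepFold, the hand-built fragment histogram would be replaced by features learned by a convolutional network, while the search step remains a cheap comparison between vectors.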
id pubmed-6129266
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Bioinformatics. ECCB 2018: European Conference on Computational Biology Proceedings. Oxford University Press, 2018-09-01; 2018-09-08.
© The Author(s) 2018. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com