3D deep convolutional neural networks for amino acid environment similarity analysis
Main authors: | Torng, Wen; Altman, Russ B. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2017 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5472009/ https://www.ncbi.nlm.nih.gov/pubmed/28615003 http://dx.doi.org/10.1186/s12859-017-1702-0 |
author | Torng, Wen; Altman, Russ B. |
---|---|
collection | PubMed |
description | BACKGROUND: Central to protein biology is the understanding of how structural elements give rise to observed function. The surfeit of protein structural data enables development of computational methods to systematically derive rules governing structural-functional relationships. However, performance of these methods depends critically on the choice of protein structural representation. Most current methods rely on features that are manually selected based on knowledge about protein structures. These are often general-purpose but not optimized for the specific application of interest. In this paper, we present a general framework that applies 3D convolutional neural network (3DCNN) technology to structure-based protein analysis. The framework automatically extracts task-specific features from the raw atom distribution, driven by supervised labels. As a pilot study, we use our network to analyze local protein microenvironments surrounding the 20 amino acids, and predict the amino acids most compatible with environments within a protein structure. To further validate the power of our method, we construct two amino acid substitution matrices from the prediction statistics and use them to predict effects of mutations in T4 lysozyme structures. RESULTS: Our deep 3DCNN achieves a two-fold increase in prediction accuracy compared to models that employ conventional hand-engineered features and successfully recapitulates known information about similar and different microenvironments. Models built from our predictions and substitution matrices achieve an 85% accuracy predicting outcomes of the T4 lysozyme mutation variants. Our substitution matrices contain rich information relevant to mutation analysis compared to well-established substitution matrices. Finally, we present a visualization method to inspect the individual contributions of each atom to the classification decisions. 
CONCLUSIONS: End-to-end trained deep learning networks consistently outperform methods using hand-engineered features, suggesting that the 3DCNN framework is well suited for analysis of protein microenvironments and may be useful for other protein structural analyses. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1702-0) contains supplementary material, which is available to authorized users. |
id | pubmed-5472009 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-5472009 2017-06-19 3D deep convolutional neural networks for amino acid environment similarity analysis Torng, Wen Altman, Russ B. BMC Bioinformatics Methodology Article BioMed Central 2017-06-14 /pmc/articles/PMC5472009/ /pubmed/28615003 http://dx.doi.org/10.1186/s12859-017-1702-0 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
topic | Methodology Article |
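The abstract describes a framework that feeds the raw atom distribution around each residue into a 3D convolutional network, but this record does not say how that input is encoded. The following is a minimal Python sketch of one common encoding, built on assumptions that do not come from the paper: atoms are binned into a cubic grid centered on the residue of interest (here 20 Å across with 1 Å voxels), with one channel per heavy-atom element (C, N, O, S). The names `voxelize`, `to_dense`, and `CHANNELS` are hypothetical. The atoms of the central residue itself would presumably be excluded before voxelizing, since they would reveal the label the network is asked to predict.

```python
import math
from collections import defaultdict

# Hypothetical channel assignment: one channel per heavy-atom element.
# This is an illustrative assumption, not the paper's documented encoding.
CHANNELS = {"C": 0, "N": 1, "O": 2, "S": 3}


def voxelize(atoms, center, box_size=20.0, voxel_size=1.0):
    """Bin atoms near `center` into a cubic occupancy grid.

    atoms: iterable of (element, x, y, z) tuples in angstroms.
    center: (x, y, z) of the residue whose environment is encoded.
    Returns (grid, n): grid is a sparse dict keyed by (channel, i, j, k)
    holding atom counts, and n is the number of voxels along each edge.
    """
    n = int(round(box_size / voxel_size))
    half = box_size / 2.0
    grid = defaultdict(int)
    for element, x, y, z in atoms:
        channel = CHANNELS.get(element)
        if channel is None:
            continue  # skip hydrogens and any element without a channel
        # Shift coordinates so the box spans [0, box_size) on each axis.
        ux = x - center[0] + half
        uy = y - center[1] + half
        uz = z - center[2] + half
        i, j, k = (math.floor(u / voxel_size) for u in (ux, uy, uz))
        # Atoms outside the box are dropped.
        if 0 <= i < n and 0 <= j < n and 0 <= k < n:
            grid[(channel, i, j, k)] += 1
    return dict(grid), n


def to_dense(grid, n, n_channels=len(CHANNELS)):
    """Expand the sparse grid into a channel-first nested list [c][i][j][k]."""
    dense = [[[[0] * n for _ in range(n)] for _ in range(n)]
             for _ in range(n_channels)]
    for (c, i, j, k), count in grid.items():
        dense[c][i][j][k] = count
    return dense
```

The sparse dictionary keeps memory proportional to the number of atoms; `to_dense` yields the channel-first layout that 3D convolution layers typically expect, and in practice it would be converted to an array or tensor type before training.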