EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction

BACKGROUND: Accurately predicted residue-residue contacts make it possible to compute the 3D structure of a protein. Because the solution space of native residue-residue contact pairs is very large, additional information must be used to identify its relevant regions, i.e., the correct contacts. Every additiona...

Bibliographic Details
Main Authors: Stahl, Kolja, Schneider, Michael, Brock, Oliver
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5474060/
https://www.ncbi.nlm.nih.gov/pubmed/28623886
http://dx.doi.org/10.1186/s12859-017-1713-x
author Stahl, Kolja
Schneider, Michael
Brock, Oliver
collection PubMed
description BACKGROUND: Accurately predicted residue-residue contacts make it possible to compute the 3D structure of a protein. Because the solution space of native residue-residue contact pairs is very large, additional information must be used to identify its relevant regions, i.e., the correct contacts. Every additional source of information can help narrow down the candidate regions. Recent methods have therefore combined evolutionary with sequence-based information, or evolutionary with physicochemical information. We develop a new contact predictor (EPSILON-CP) that goes beyond current methods by combining evolutionary, physicochemical, and sequence-based information. The problems that arise from the increased dimensionality and complexity of the learning problem are addressed by a careful feature analysis, which results in a drastically reduced feature set. The different information sources are combined using deep neural networks. RESULTS: On 21 hard CASP11 FM targets, EPSILON-CP achieves a mean precision of 35.7% for the top-L/10 predicted long-range contacts, which is 11% better than the CASP11 winning version of MetaPSICOV. For top-1.5L, the improvement is 17%. Furthermore, in this study we find that the amino acid composition, a commonly used feature, is rendered ineffective in the context of meta approaches. The refined feature set is 75% smaller, which enabled a significant increase in training data for machine learning and contributed significantly to the observed improvements. CONCLUSIONS: Exploiting as much and as diverse information as possible is key to accurate contact prediction. Simply merging the information introduces new challenges. Our study suggests that critical feature analysis can improve the performance of contact prediction methods that combine multiple information sources.
EPSILON-CP is available as a web service: http://compbio.robotics.tu-berlin.de/epsilon/ ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1713-x) contains supplementary material, which is available to authorized users.
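The evaluation metric named in the abstract, precision of the top-L/10 predicted long-range contacts, can be illustrated with a short sketch. This is not the authors' evaluation code. The function name, the `frac` parameter, and the long-range threshold of 24 residues of sequence separation (a common convention in contact prediction benchmarks) are assumptions made here for illustration; the record itself does not state how long-range contacts are defined.

```python
import numpy as np

def top_k_long_range_precision(scores, is_contact, seq_len, frac=0.1, min_sep=24):
    """Precision of the top (frac * L) highest-scoring long-range residue pairs.

    scores:     (L, L) array of predicted contact scores (upper triangle is used)
    is_contact: (L, L) boolean array, True where the native structure has a contact
    min_sep:    assumed minimum sequence separation for a "long-range" pair
    """
    # Upper-triangle pairs (i, j) with j - i >= min_sep
    i, j = np.triu_indices(seq_len, k=min_sep)
    # Number of predictions to evaluate, e.g. L/10 for frac=0.1
    k = max(1, int(seq_len * frac))
    # Indices of the k highest-scoring long-range pairs
    order = np.argsort(-scores[i, j], kind="stable")[:k]
    # Fraction of those pairs that are true contacts
    return float(is_contact[i[order], j[order]].mean())

# Toy example with hypothetical data: a 30-residue protein, so L/10 = 3 pairs.
L = 30
scores = np.zeros((L, L))
truth = np.zeros((L, L), dtype=bool)
scores[0, 1], truth[0, 1] = 1.0, True    # short-range pair, ignored
scores[0, 24], truth[0, 24] = 0.9, True  # long-range, correct
scores[1, 26] = 0.8                      # long-range, incorrect
scores[2, 28], truth[2, 28] = 0.7, True  # long-range, correct
print(top_k_long_range_precision(scores, truth, L))
```

Setting `frac=1.5` would evaluate the top-1.5L pairs instead. A mean precision over a benchmark set would then be the average of this value across all targets.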
format Online
Article
Text
id pubmed-5474060
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5474060 2017-06-21 EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction. Stahl, Kolja; Schneider, Michael; Brock, Oliver. BMC Bioinformatics. Research Article. BioMed Central 2017-06-17 /pmc/articles/PMC5474060/ /pubmed/28623886 http://dx.doi.org/10.1186/s12859-017-1713-x Text en © The Author(s) 2017. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title EPSILON-CP: using deep learning to combine information from multiple sources for protein contact prediction
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5474060/
https://www.ncbi.nlm.nih.gov/pubmed/28623886
http://dx.doi.org/10.1186/s12859-017-1713-x