A simplified approach to disulfide connectivity prediction from protein sequences
BACKGROUND: Prediction of disulfide bridges from protein sequences is useful for characterizing structural and functional properties of proteins. Several methods based on different machine learning algorithms have been applied to solve this problem and public domain prediction services exist. These...
Main authors: | Vincent, Marc; Passerini, Andrea; Labbé, Matthieu; Frasconi, Paolo |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2375136/ https://www.ncbi.nlm.nih.gov/pubmed/18194539 http://dx.doi.org/10.1186/1471-2105-9-20 |
_version_ | 1782154585967689728 |
---|---|
author | Vincent, Marc Passerini, Andrea Labbé, Matthieu Frasconi, Paolo |
author_facet | Vincent, Marc Passerini, Andrea Labbé, Matthieu Frasconi, Paolo |
author_sort | Vincent, Marc |
collection | PubMed |
description | BACKGROUND: Prediction of disulfide bridges from protein sequences is useful for characterizing structural and functional properties of proteins. Several methods based on different machine learning algorithms have been applied to solve this problem and public domain prediction services exist. These methods are however still potentially subject to significant improvements both in terms of prediction accuracy and overall architectural complexity. RESULTS: We introduce new methods for predicting disulfide bridges from protein sequences. The methods take advantage of two new decomposition kernels for measuring the similarity between protein sequences according to the amino acid environments around cysteines. Disulfide connectivity is predicted in two passes. First, a binary classifier is trained to predict whether a given protein chain has at least one intra-chain disulfide bridge. Second, a multiclass classifier (implemented by 1-nearest neighbor) is trained to predict connectivity patterns. The two passes can be easily cascaded to obtain connectivity prediction from sequence alone. We report an extensive experimental comparison on several data sets that have been previously employed in the literature to assess the accuracy of cysteine bonding state and disulfide connectivity predictors. CONCLUSION: We reach state-of-the-art results on bonding state prediction with a simple method that classifies chains rather than individual residues. The prediction accuracy reached by our connectivity prediction method compares favorably with respect to all but the most complex other approaches. On the other hand, our method does not need any model selection or hyperparameter tuning, a property that makes it less prone to overfitting and prediction accuracy overestimation. |
format | Text |
id | pubmed-2375136 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2375136 2008-05-09 A simplified approach to disulfide connectivity prediction from protein sequences Vincent, Marc Passerini, Andrea Labbé, Matthieu Frasconi, Paolo BMC Bioinformatics Methodology Article BACKGROUND: Prediction of disulfide bridges from protein sequences is useful for characterizing structural and functional properties of proteins. Several methods based on different machine learning algorithms have been applied to solve this problem and public domain prediction services exist. These methods are however still potentially subject to significant improvements both in terms of prediction accuracy and overall architectural complexity. RESULTS: We introduce new methods for predicting disulfide bridges from protein sequences. The methods take advantage of two new decomposition kernels for measuring the similarity between protein sequences according to the amino acid environments around cysteines. Disulfide connectivity is predicted in two passes. First, a binary classifier is trained to predict whether a given protein chain has at least one intra-chain disulfide bridge. Second, a multiclass classifier (implemented by 1-nearest neighbor) is trained to predict connectivity patterns. The two passes can be easily cascaded to obtain connectivity prediction from sequence alone. We report an extensive experimental comparison on several data sets that have been previously employed in the literature to assess the accuracy of cysteine bonding state and disulfide connectivity predictors. CONCLUSION: We reach state-of-the-art results on bonding state prediction with a simple method that classifies chains rather than individual residues. The prediction accuracy reached by our connectivity prediction method compares favorably with respect to all but the most complex other approaches. On the other hand, our method does not need any model selection or hyperparameter tuning, a property that makes it less prone to overfitting and prediction accuracy overestimation. BioMed Central 2008-01-14 /pmc/articles/PMC2375136/ /pubmed/18194539 http://dx.doi.org/10.1186/1471-2105-9-20 Text en Copyright © 2008 Vincent et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Methodology Article Vincent, Marc Passerini, Andrea Labbé, Matthieu Frasconi, Paolo A simplified approach to disulfide connectivity prediction from protein sequences |
title | A simplified approach to disulfide connectivity prediction from protein sequences |
title_sort | simplified approach to disulfide connectivity prediction from protein sequences |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2375136/ https://www.ncbi.nlm.nih.gov/pubmed/18194539 http://dx.doi.org/10.1186/1471-2105-9-20 |
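
The two-pass cascade described in the abstract can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the window-matching kernel is a simple stand-in for the paper's two decomposition kernels, the first pass uses a nearest-neighbor rule where the paper trains a binary classifier, and every cysteine in a bonded chain is assumed to take part in a bridge. All function names (`cysteine_windows`, `chain_kernel`, `predict_connectivity`, `count_patterns`) are hypothetical.

```python
from itertools import product


def cysteine_windows(seq, half_width=2):
    """Return the residue window centered on each cysteine ('-' pads the ends)."""
    padded = "-" * half_width + seq + "-" * half_width
    return [padded[i:i + 2 * half_width + 1]
            for i, aa in enumerate(seq) if aa == "C"]


def chain_kernel(seq_a, seq_b, half_width=2):
    """Stand-in set kernel: sum of position-wise match fractions over all
    pairs of cysteine windows (NOT the paper's decomposition kernels)."""
    wa = cysteine_windows(seq_a, half_width)
    wb = cysteine_windows(seq_b, half_width)
    return sum(sum(x == y for x, y in zip(u, v)) / len(u)
               for u, v in product(wa, wb))


def kernel_distance_sq(a, b):
    """Squared distance induced by the kernel: k(a,a) + k(b,b) - 2 k(a,b)."""
    return chain_kernel(a, a) + chain_kernel(b, b) - 2 * chain_kernel(a, b)


def nearest(query, examples):
    """1-nearest neighbor among (sequence, label) pairs."""
    return min(examples, key=lambda ex: kernel_distance_sq(query, ex[0]))


def predict_connectivity(query, training):
    """training: list of (sequence, pattern); pattern is None for chains
    without bridges, else a tuple of pairs of 1-based cysteine ranks."""
    # Pass 1 (assumption: 1-NN stands in for the trained binary classifier).
    if nearest(query, training)[1] is None:
        return None
    # Pass 2: 1-NN over bonded chains with the same number of cysteines,
    # so the neighbor's pattern can be transferred rank by rank.
    n_cys = query.count("C")
    bonded = [(s, p) for s, p in training
              if p is not None and s.count("C") == n_cys]
    return nearest(query, bonded)[1] if bonded else None


def count_patterns(n_bridges):
    """Number of ways to pair 2B cysteines into B bridges: (2B - 1)!!"""
    total = 1
    for k in range(1, 2 * n_bridges, 2):
        total *= k
    return total


training = [
    ("ACDCEFCGHC", ((1, 2), (3, 4))),
    ("KCLLCMMCNNC", ((1, 3), (2, 4))),
    ("AGHKLMNPQ", None),  # toy chain with no bridges
]
print(predict_connectivity("ACDCEFCGHC", training))
print(count_patterns(3))
```

The `count_patterns` helper shows why connectivity is treated as a multiclass problem: the number of candidate patterns grows as a double factorial of the bridge count, so a nearest-neighbor rule that copies a whole pattern from the most similar training chain avoids enumerating them.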