
A simplified approach to disulfide connectivity prediction from protein sequences

BACKGROUND: Prediction of disulfide bridges from protein sequences is useful for characterizing structural and functional properties of proteins. Several methods based on different machine learning algorithms have been applied to solve this problem and public domain prediction services exist. These...


Bibliographic Details
Main Authors: Vincent, Marc; Passerini, Andrea; Labbé, Matthieu; Frasconi, Paolo
Format: Text
Language: English
Published: BioMed Central 2008
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2375136/
https://www.ncbi.nlm.nih.gov/pubmed/18194539
http://dx.doi.org/10.1186/1471-2105-9-20
author Vincent, Marc
Passerini, Andrea
Labbé, Matthieu
Frasconi, Paolo
collection PubMed
description BACKGROUND: Prediction of disulfide bridges from protein sequences is useful for characterizing structural and functional properties of proteins. Several methods based on different machine learning algorithms have been applied to solve this problem and public domain prediction services exist. These methods are, however, still potentially subject to significant improvements both in terms of prediction accuracy and overall architectural complexity. RESULTS: We introduce new methods for predicting disulfide bridges from protein sequences. The methods take advantage of two new decomposition kernels for measuring the similarity between protein sequences according to the amino acid environments around cysteines. Disulfide connectivity is predicted in two passes. First, a binary classifier is trained to predict whether a given protein chain has at least one intra-chain disulfide bridge. Second, a multiclass classifier (implemented by 1-nearest neighbor) is trained to predict connectivity patterns. The two passes can be easily cascaded to obtain connectivity prediction from sequence alone. We report an extensive experimental comparison on several data sets that have been previously employed in the literature to assess the accuracy of cysteine bonding state and disulfide connectivity predictors. CONCLUSION: We reach state-of-the-art results on bonding state prediction with a simple method that classifies chains rather than individual residues. The prediction accuracy reached by our connectivity prediction method compares favorably with respect to all but the most complex other approaches. On the other hand, our method does not need any model selection or hyperparameter tuning, a property that makes it less prone to overfitting and prediction accuracy overestimation.
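The two-pass cascade described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `cysteine_windows`, `window_similarity`, `chain_kernel`, and `predict_connectivity` are hypothetical helpers, the position-matching similarity is a toy stand-in and not the decomposition kernels introduced in the article, and the first-pass classifier is passed in as an arbitrary callable. The only borrowed ideas are the ones the abstract states: compare chains through the residue environments around their cysteines, decide first whether a chain has any bridge, then copy the connectivity pattern of the nearest training chain.

```python
from itertools import product


def cysteine_windows(seq, half_width=2):
    """Return the residue window centered on each cysteine, padded with '-'."""
    padded = "-" * half_width + seq + "-" * half_width
    return [padded[i:i + 2 * half_width + 1]
            for i, residue in enumerate(seq) if residue == "C"]


def window_similarity(w1, w2):
    """Toy stand-in similarity: fraction of positions where two windows agree."""
    return sum(a == b for a, b in zip(w1, w2)) / len(w1)


def chain_kernel(seq_a, seq_b):
    """Sum the window similarity over all pairs of cysteine environments."""
    pairs = product(cysteine_windows(seq_a), cysteine_windows(seq_b))
    return sum(window_similarity(u, v) for u, v in pairs)


def kernel_distance_sq(seq_a, seq_b):
    """Squared distance induced by a kernel: k(a,a) + k(b,b) - 2 k(a,b)."""
    return (chain_kernel(seq_a, seq_a) + chain_kernel(seq_b, seq_b)
            - 2 * chain_kernel(seq_a, seq_b))


def predict_connectivity(query, training_set, has_bridge):
    """Cascade the two passes.

    training_set: list of (sequence, connectivity_pattern) pairs.
    has_bridge:   callable standing in for the first-pass binary classifier.
    Returns None when the first pass predicts no intra-chain bridge.
    """
    if not has_bridge(query):
        return None
    # Second pass: 1-nearest neighbor under the kernel-induced distance.
    _, pattern = min(training_set,
                     key=lambda item: kernel_distance_sq(query, item[0]))
    return pattern
```

A 1-nearest-neighbor second pass has no hyperparameters of its own, which is in line with the abstract's remark that the method needs no model selection; the window width used in this sketch is an illustrative choice and would have to be fixed in advance to preserve that property.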
format Text
id pubmed-2375136
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2375136 2008-05-09 A simplified approach to disulfide connectivity prediction from protein sequences Vincent, Marc; Passerini, Andrea; Labbé, Matthieu; Frasconi, Paolo BMC Bioinformatics Methodology Article BioMed Central 2008-01-14 /pmc/articles/PMC2375136/ /pubmed/18194539 http://dx.doi.org/10.1186/1471-2105-9-20 Text en Copyright © 2008 Vincent et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A simplified approach to disulfide connectivity prediction from protein sequences
topic Methodology Article