
Protein profiles: Biases and protocols

The use of evolutionary profiles to predict protein secondary structure, as well as other protein structural features, has been standard practice since the 1990s. Using profiles in the input of such predictors, in place of or in addition to the sequence itself, leads to significantly more accurate pred...


Bibliographic Details
Main authors: Urban, Gregor; Torrisi, Mirko; Magnan, Christophe N.; Pollastri, Gianluca; Baldi, Pierre
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7486441/
https://www.ncbi.nlm.nih.gov/pubmed/32994887
http://dx.doi.org/10.1016/j.csbj.2020.08.015
author Urban, Gregor
Torrisi, Mirko
Magnan, Christophe N.
Pollastri, Gianluca
Baldi, Pierre
collection PubMed
description The use of evolutionary profiles to predict protein secondary structure, as well as other protein structural features, has been standard practice since the 1990s. Using profiles in the input of such predictors, in place of or in addition to the sequence itself, leads to significantly more accurate predictions. While profiles can enhance structural signals, their role remains somewhat surprising, as proteins do not use profiles when folding in vivo. Furthermore, the same sequence-based redundancy reduction protocols initially derived to train and evaluate sequence-based predictors have been applied to train and evaluate profile-based predictors. This can lead to unfair comparisons, since profiles may facilitate the leakage of information between training and test sets. Here we use the extensively studied problem of secondary structure prediction to better evaluate the role of profiles and show that: (1) high levels of profile similarity between training and test proteins are observed when using standard sequence-based redundancy protocols; (2) the gain in accuracy of profile-based predictors over sequence-based predictors relies strongly on these high levels of profile similarity between training and test proteins; and (3) the overall accuracy of a profile-based predictor on a given protein dataset provides a biased measure when trying to estimate the actual accuracy of the predictor, or when comparing it to other predictors. We show, however, that this bias can be mitigated by implementing a new protocol (EVALpro), which evaluates the accuracy of profile-based predictors as a function of the profile similarity between training and test proteins. Such a protocol not only allows for a fair comparison of the predictors on equally hard or easy examples, but also reduces the impact of choosing a given similarity cutoff when selecting test proteins.
The EVALpro program is available in the SCRATCH suite (www.scratch.proteomics.ics.uci.edu) and can be downloaded at: www.download.igb.uci.edu/#evalpro.
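The idea of reporting accuracy as a function of profile similarity, rather than as one overall number, can be illustrated with a minimal Python sketch. This is not the EVALpro program: the record does not say how EVALpro measures profile similarity, so the sketch assumes each test protein already comes with a similarity score in [0, 1] to its closest training protein. The function name `q3_by_similarity_bin`, the `n_bins` parameter, and the toy data are hypothetical.

```python
from collections import defaultdict


def q3_by_similarity_bin(proteins, n_bins=10):
    """Per-residue accuracy (Q3) grouped by train/test profile similarity.

    proteins: iterable of (similarity, predicted, observed) tuples, where
    similarity is an assumed, precomputed score in [0, 1] and predicted /
    observed are equal-length strings over the classes H, E and C.
    Returns {bin_index: accuracy} for every non-empty bin.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for similarity, predicted, observed in proteins:
        if not 0.0 <= similarity <= 1.0:
            raise ValueError("similarity must lie in [0, 1]")
        if len(predicted) != len(observed):
            raise ValueError("predicted and observed lengths differ")
        # A similarity of exactly 1.0 is placed in the last bin.
        bin_index = min(int(similarity * n_bins), n_bins - 1)
        correct[bin_index] += sum(p == o for p, o in zip(predicted, observed))
        total[bin_index] += len(observed)
    return {b: correct[b] / total[b] for b in sorted(total)}


if __name__ == "__main__":
    # Toy data only: (similarity, predicted, observed)
    toy = [
        (0.15, "HHHC", "HHHH"),
        (0.85, "EEEC", "EEEC"),
        (0.82, "CCCH", "CCCC"),
    ]
    for b, acc in q3_by_similarity_bin(toy).items():
        print(f"similarity {b / 10:.1f}-{(b + 1) / 10:.1f}: Q3 = {acc:.3f}")
```

Comparing two predictors bin by bin, instead of through a single dataset-wide accuracy, is what lets them be judged on equally hard or easy examples.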
format Online
Article
Text
id pubmed-7486441
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-7486441 2020-09-28 Protein profiles: Biases and protocols Urban, Gregor Torrisi, Mirko Magnan, Christophe N. Pollastri, Gianluca Baldi, Pierre Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2020-08-27 /pmc/articles/PMC7486441/ /pubmed/32994887 http://dx.doi.org/10.1016/j.csbj.2020.08.015 Text en © 2020 The Author(s) http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Protein profiles: Biases and protocols
topic Research Article