
Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction

Motivation: Decoy datasets, consisting of a solved protein structure and numerous alternative native-like structures, are in common use for the evaluation of scoring functions in protein structure prediction. Several pitfalls with the use of these datasets have been identified in the literature, as well as useful guidelines for generating more effective decoy datasets. We contribute to this ongoing discussion an empirical assessment of several decoy datasets commonly used in experimental studies.

Results: We find that artefacts and sampling issues in the large majority of these data make it trivial to discriminate the native structure. This underlines that evaluation based on the rank/z-score of the native is a weak test of scoring function performance. Moreover, sampling biases present in the way decoy sets are generated or used can strongly affect other types of evaluation measures such as the correlation between score and root mean squared deviation (RMSD) to the native. We demonstrate how, depending on type of bias and evaluation context, sampling biases may lead to either over- or under-estimation of the quality of scoring terms, functions or methods.

Availability: Links to the software and data used in this study are available at http://dbkgroup.org/handl/decoy_sets.

Contact: simon.lovell@manchester.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
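As an illustration of the evaluation measures named in the abstract (the rank and z-score of the native structure, and the correlation between score and RMSD to the native), the following minimal Python sketch computes them for a small hypothetical decoy set. The score and RMSD values are invented placeholders, and the code is not taken from the study's software at http://dbkgroup.org/handl/decoy_sets.

import numpy as np
from scipy.stats import pearsonr

# Hypothetical data: index 0 is the native structure, the rest are decoys.
# Lower score is assumed to mean better (energy-like scoring function).
scores = np.array([-120.5, -98.2, -101.7, -95.0, -110.3])
rmsds = np.array([0.0, 4.2, 3.1, 6.8, 2.5])  # RMSD to the native, in angstroms

# Rank of the native among all models (1 = best-scoring).
native_rank = int(np.sum(scores <= scores[0]))

# Z-score of the native score relative to the decoy score distribution.
decoy_scores = scores[1:]
z_native = (scores[0] - decoy_scores.mean()) / decoy_scores.std(ddof=1)

# Correlation between score and RMSD, with and without the native; the
# abstract cautions that native-centred measures and score-RMSD correlations
# can each be affected by how the decoy set was sampled, so both variants
# are reported here.
r_all, _ = pearsonr(scores, rmsds)
r_decoys, _ = pearsonr(decoy_scores, rmsds[1:])

print(f"native rank: {native_rank}, native z-score: {z_native:.2f}")
print(f"score-RMSD correlation, all models: {r_all:.2f}")
print(f"score-RMSD correlation, decoys only: {r_decoys:.2f}")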


Bibliographic Details
Main Authors: Handl, Julia, Knowles, Joshua, Lovell, Simon C.
Format: Text
Language: English
Published: Oxford University Press 2009
Subjects: Original Papers
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2677743/
https://www.ncbi.nlm.nih.gov/pubmed/19297350
http://dx.doi.org/10.1093/bioinformatics/btp150
_version_ 1782166796003966976
author Handl, Julia
Knowles, Joshua
Lovell, Simon C.
author_facet Handl, Julia
Knowles, Joshua
Lovell, Simon C.
author_sort Handl, Julia
collection PubMed
description Motivation: Decoy datasets, consisting of a solved protein structure and numerous alternative native-like structures, are in common use for the evaluation of scoring functions in protein structure prediction. Several pitfalls with the use of these datasets have been identified in the literature, as well as useful guidelines for generating more effective decoy datasets. We contribute to this ongoing discussion an empirical assessment of several decoy datasets commonly used in experimental studies. Results: We find that artefacts and sampling issues in the large majority of these data make it trivial to discriminate the native structure. This underlines that evaluation based on the rank/z-score of the native is a weak test of scoring function performance. Moreover, sampling biases present in the way decoy sets are generated or used can strongly affect other types of evaluation measures such as the correlation between score and root mean squared deviation (RMSD) to the native. We demonstrate how, depending on type of bias and evaluation context, sampling biases may lead to both over- or under-estimation of the quality of scoring terms, functions or methods. Availability: Links to the software and data used in this study are available at http://dbkgroup.org/handl/decoy_sets. Contact: simon.lovell@manchester.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
format Text
id pubmed-2677743
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-26777432009-05-08 Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction Handl, Julia Knowles, Joshua Lovell, Simon C. Bioinformatics Original Papers Motivation: Decoy datasets, consisting of a solved protein structure and numerous alternative native-like structures, are in common use for the evaluation of scoring functions in protein structure prediction. Several pitfalls with the use of these datasets have been identified in the literature, as well as useful guidelines for generating more effective decoy datasets. We contribute to this ongoing discussion an empirical assessment of several decoy datasets commonly used in experimental studies. Results: We find that artefacts and sampling issues in the large majority of these data make it trivial to discriminate the native structure. This underlines that evaluation based on the rank/z-score of the native is a weak test of scoring function performance. Moreover, sampling biases present in the way decoy sets are generated or used can strongly affect other types of evaluation measures such as the correlation between score and root mean squared deviation (RMSD) to the native. We demonstrate how, depending on type of bias and evaluation context, sampling biases may lead to both over- or under-estimation of the quality of scoring terms, functions or methods. Availability: Links to the software and data used in this study are available at http://dbkgroup.org/handl/decoy_sets. Contact: simon.lovell@manchester.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. Oxford University Press 2009-05-15 2009-03-17 /pmc/articles/PMC2677743/ /pubmed/19297350 http://dx.doi.org/10.1093/bioinformatics/btp150 Text en © 2009 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Papers
Handl, Julia
Knowles, Joshua
Lovell, Simon C.
Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction
title Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction
title_full Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction
title_fullStr Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction
title_full_unstemmed Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction
title_short Artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction
title_sort artefacts and biases affecting the evaluation of scoring functions on decoy sets for protein structure prediction
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2677743/
https://www.ncbi.nlm.nih.gov/pubmed/19297350
http://dx.doi.org/10.1093/bioinformatics/btp150
work_keys_str_mv AT handljulia artefactsandbiasesaffectingtheevaluationofscoringfunctionsondecoysetsforproteinstructureprediction
AT knowlesjoshua artefactsandbiasesaffectingtheevaluationofscoringfunctionsondecoysetsforproteinstructureprediction
AT lovellsimonc artefactsandbiasesaffectingtheevaluationofscoringfunctionsondecoysetsforproteinstructureprediction