
RFQAmodel: Random Forest Quality Assessment to identify a predicted protein structure in the correct fold

While template-free protein structure prediction protocols now produce good quality models for many targets, modelling failure remains common. For these methods to be useful it is important that users can both choose the best model from the hundreds to thousands of models that are commonly generated...

Bibliographic Details
Main Authors:	West, Clare E., de Oliveira, Saulo H. P., Deane, Charlotte M.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2019
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6802825/ https://www.ncbi.nlm.nih.gov/pubmed/31634369 http://dx.doi.org/10.1371/journal.pone.0218149

author	West, Clare E. de Oliveira, Saulo H. P. Deane, Charlotte M.
collection	PubMed
description	While template-free protein structure prediction protocols now produce good quality models for many targets, modelling failure remains common. For these methods to be useful it is important that users can both choose the best model from the hundreds to thousands of models that are commonly generated for a target, and determine whether this model is likely to be correct. We have developed Random Forest Quality Assessment (RFQAmodel), which assesses whether models produced by a protein structure prediction pipeline have the correct fold. RFQAmodel uses a combination of existing quality assessment scores with two predicted contact map alignment scores. These alignment scores are able to identify correct models for targets that are not otherwise captured. Our classifier was trained on a large set of protein domains that are structurally diverse and evenly balanced in terms of protein features known to have an effect on modelling success, and then tested on a second set of 244 protein domains with a similar spread of properties. When models for each target in this second set were ranked according to the RFQAmodel score, the highest-ranking model had a high-confidence RFQAmodel score for 67 modelling targets, of which 52 had the correct fold. At the other end of the scale RFQAmodel correctly predicted that for 59 targets the highest-ranked model was incorrect. In comparisons to other methods we found that RFQAmodel is better able to identify correct models for targets where only a few of the models are correct. We found that RFQAmodel achieved a similar performance on the model sets for CASP12 and CASP13 free-modelling targets. Finally, by iteratively generating models and running RFQAmodel until a model is produced that is predicted to be correct with high confidence, we demonstrate how such a protocol can be used to focus computational efforts on difficult modelling targets. 
RFQAmodel and the accompanying data can be downloaded from http://opig.stats.ox.ac.uk/resources.
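The iterative protocol described in the abstract (generate models, score them, stop once the top-ranked model is predicted correct with high confidence) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the batch interface, the toy scores, and the 0.5 cutoff are all hypothetical, and the scoring callable merely stands in for an RFQAmodel-style quality score.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical cutoff for a "high-confidence" score; the abstract gives no value.
HIGH_CONFIDENCE = 0.5


def top_ranked(scored: List[Tuple[str, float]]) -> Tuple[str, float]:
    """Return the (model_id, score) pair with the highest score."""
    return max(scored, key=lambda pair: pair[1])


def iterate_until_confident(
    generate_batch: Callable[[int], List[str]],
    score_model: Callable[[str], float],
    max_rounds: int = 5,
    threshold: float = HIGH_CONFIDENCE,
) -> Tuple[Optional[str], float, int]:
    """Generate models in batches until the best score reaches the threshold.

    Returns (best_model_id, best_score, rounds_used). The model id is None
    only when no models were generated at all.
    """
    scored: List[Tuple[str, float]] = []
    rounds = 0
    for round_index in range(max_rounds):
        rounds = round_index + 1
        # Score every model in the new batch and pool it with earlier batches.
        scored.extend((m, score_model(m)) for m in generate_batch(round_index))
        # Stop early once the top-ranked model is confidently predicted correct.
        if scored and top_ranked(scored)[1] >= threshold:
            break
    if not scored:
        return None, 0.0, rounds
    best_id, best_score = top_ranked(scored)
    return best_id, best_score, rounds


# Toy stand-ins (hypothetical) for a structure generator and a quality scorer.
def toy_generate(round_index: int) -> List[str]:
    return [f"r{round_index}_m{i}" for i in range(3)]


TOY_SCORES = {
    "r0_m0": 0.1, "r0_m1": 0.2, "r0_m2": 0.15,
    "r1_m0": 0.3, "r1_m1": 0.7, "r1_m2": 0.4,
}


def toy_score(model_id: str) -> float:
    return TOY_SCORES.get(model_id, 0.0)


if __name__ == "__main__":
    print(iterate_until_confident(toy_generate, toy_score))
```

In this toy run no model in the first batch reaches the cutoff, so a second batch is generated. Targets that never reach the cutoff use every round, which is how such a loop would concentrate computation on difficult targets.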
format	Online Article Text
id	pubmed-6802825
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-6802825 2019-11-02 RFQAmodel: Random Forest Quality Assessment to identify a predicted protein structure in the correct fold West, Clare E. de Oliveira, Saulo H. P. Deane, Charlotte M. PLoS One Research Article Public Library of Science 2019-10-21 /pmc/articles/PMC6802825/ /pubmed/31634369 http://dx.doi.org/10.1371/journal.pone.0218149 Text en © 2019 West et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	RFQAmodel: Random Forest Quality Assessment to identify a predicted protein structure in the correct fold
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6802825/ https://www.ncbi.nlm.nih.gov/pubmed/31634369 http://dx.doi.org/10.1371/journal.pone.0218149
