Predicting protein model correctness in Coot using machine learning

Manually identifying and correcting errors in protein models can be a slow process, but improvements in validation tools and automated model-building software can contribute to reducing this burden. This article presents a new correctness score that is produced by combining multiple sources of infor...

Bibliographic Details
Main Authors:	Bond, Paul S., Wilson, Keith S., Cowtan, Kevin D.
Format:	Online Article Text
Language:	English
Published:	International Union of Crystallography 2020
Subjects:	Ccp4
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7397494/ https://www.ncbi.nlm.nih.gov/pubmed/32744253 http://dx.doi.org/10.1107/S2059798320009080

_version_	1783565786795737088
author	Bond, Paul S. Wilson, Keith S. Cowtan, Kevin D.
author_facet	Bond, Paul S. Wilson, Keith S. Cowtan, Kevin D.
author_sort	Bond, Paul S.
collection	PubMed
description	Manually identifying and correcting errors in protein models can be a slow process, but improvements in validation tools and automated model-building software can contribute to reducing this burden. This article presents a new correctness score that is produced by combining multiple sources of information using a neural network. The residues in 639 automatically built models were marked as correct or incorrect by comparing them with the coordinates deposited in the PDB. A number of features were also calculated for each residue using Coot, including map-to-model correlation, density values, B factors, clashes, Ramachandran scores, rotamer scores and resolution. Two neural networks were created using these features as inputs: one to predict the correctness of main-chain atoms and the other for side chains. The 639 structures were split into 511 that were used to train the neural networks and 128 that were used to test performance. The predicted correctness scores could correctly categorize 92.3% of the main-chain atoms and 87.6% of the side chains. A Coot ML Correctness script was written to display the scores in a graphical user interface as well as for the automatic pruning of chains, residues and side chains with low scores. The automatic pruning function was added to the CCP4i2 Buccaneer automated model-building pipeline, leading to significant improvements, especially for high-resolution structures.
format	Online Article Text
id	pubmed-7397494
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	International Union of Crystallography
record_format	MEDLINE/PubMed
spelling	pubmed-7397494 2020-08-11 Predicting protein model correctness in Coot using machine learning Bond, Paul S. Wilson, Keith S. Cowtan, Kevin D. Acta Crystallogr D Struct Biol Ccp4 Manually identifying and correcting errors in protein models can be a slow process, but improvements in validation tools and automated model-building software can contribute to reducing this burden. This article presents a new correctness score that is produced by combining multiple sources of information using a neural network. The residues in 639 automatically built models were marked as correct or incorrect by comparing them with the coordinates deposited in the PDB. A number of features were also calculated for each residue using Coot, including map-to-model correlation, density values, B factors, clashes, Ramachandran scores, rotamer scores and resolution. Two neural networks were created using these features as inputs: one to predict the correctness of main-chain atoms and the other for side chains. The 639 structures were split into 511 that were used to train the neural networks and 128 that were used to test performance. The predicted correctness scores could correctly categorize 92.3% of the main-chain atoms and 87.6% of the side chains. A Coot ML Correctness script was written to display the scores in a graphical user interface as well as for the automatic pruning of chains, residues and side chains with low scores. The automatic pruning function was added to the CCP4i2 Buccaneer automated model-building pipeline, leading to significant improvements, especially for high-resolution structures. International Union of Crystallography 2020-07-27 /pmc/articles/PMC7397494/ /pubmed/32744253 http://dx.doi.org/10.1107/S2059798320009080 Text en © Bond et al. 2020 http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
spellingShingle	Ccp4 Bond, Paul S. Wilson, Keith S. Cowtan, Kevin D. Predicting protein model correctness in Coot using machine learning
title	Predicting protein model correctness in Coot using machine learning
title_full	Predicting protein model correctness in Coot using machine learning
title_fullStr	Predicting protein model correctness in Coot using machine learning
title_full_unstemmed	Predicting protein model correctness in Coot using machine learning
title_short	Predicting protein model correctness in Coot using machine learning
title_sort	predicting protein model correctness in coot using machine learning
topic	Ccp4
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7397494/ https://www.ncbi.nlm.nih.gov/pubmed/32744253 http://dx.doi.org/10.1107/S2059798320009080
work_keys_str_mv	AT bondpauls predictingproteinmodelcorrectnessincootusingmachinelearning AT wilsonkeiths predictingproteinmodelcorrectnessincootusingmachinelearning AT cowtankevind predictingproteinmodelcorrectnessincootusingmachinelearning
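
The description field in this record mentions two steps that are easy to picture in code: splitting the 639 structures into 511 for training and 128 for testing, and pruning residues whose predicted correctness score is low. The following is a minimal Python sketch of those two steps only, not the authors' implementation. The function names, the placeholder structure IDs, the example scores and the 0.5 threshold are all assumptions made for illustration; the record does not state the threshold used or how the split was drawn.

```python
import random


def split_structures(ids, n_train, seed=0):
    """Shuffle structure IDs reproducibly and split them into train and test lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def prune_residues(scores, threshold=0.5):
    """Keep only residues whose predicted score is at least the (assumed) threshold."""
    return {residue: s for residue, s in scores.items() if s >= threshold}


def accuracy(scores, labels, threshold=0.5):
    """Fraction of residues whose thresholded score agrees with the correct/incorrect label."""
    hits = sum((scores[r] >= threshold) == labels[r] for r in labels)
    return hits / len(labels)


# Placeholder IDs standing in for the 639 automatically built models.
structure_ids = [f"model_{i:03d}" for i in range(639)]
train_ids, test_ids = split_structures(structure_ids, n_train=511)
print(len(train_ids), len(test_ids))  # 511 128

# Made-up scores keyed by (chain, residue number); True means "correct".
example_scores = {("A", 10): 0.93, ("A", 11): 0.12, ("A", 12): 0.58}
example_labels = {("A", 10): True, ("A", 11): False, ("A", 12): False}
kept = prune_residues(example_scores)
example_accuracy = accuracy(example_scores, example_labels)
```

The split is done per structure rather than per residue, which matches the wording of the description and keeps residues from the same model out of both sets at once.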
