The predictive power of data-processing statistics

This study describes a method to estimate the likelihood of success in determining a macromolecular structure by X-ray crystallography and experimental single-wavelength anomalous dispersion (SAD) or multiple-wavelength anomalous dispersion (MAD) phasing based on initial data-processing statistics a...

Bibliographic Details
Main authors: Vollmar, Melanie; Parkhurst, James M.; Jaques, Dominic; Baslé, Arnaud; Murshudov, Garib N.; Waterman, David G.; Evans, Gwyndaf
Format: Online Article Text
Language: English
Published: International Union of Crystallography 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7055369/
https://www.ncbi.nlm.nih.gov/pubmed/32148861
http://dx.doi.org/10.1107/S2052252520000895
author Vollmar, Melanie
Parkhurst, James M.
Jaques, Dominic
Baslé, Arnaud
Murshudov, Garib N.
Waterman, David G.
Evans, Gwyndaf
author_facet Vollmar, Melanie
Parkhurst, James M.
Jaques, Dominic
Baslé, Arnaud
Murshudov, Garib N.
Waterman, David G.
Evans, Gwyndaf
author_sort Vollmar, Melanie
collection PubMed
description This study describes a method to estimate the likelihood of success in determining a macromolecular structure by X-ray crystallography and experimental single-wavelength anomalous dispersion (SAD) or multiple-wavelength anomalous dispersion (MAD) phasing based on initial data-processing statistics and sample crystal properties. Such a predictive tool can rapidly assess the usefulness of data and guide the collection of an optimal data set. The increase in data rates from modern macromolecular crystallography beamlines, together with a demand from users for real-time feedback, has led to pressure on computational resources and a need for smarter data handling. Statistical and machine-learning methods have been applied to construct a classifier that displays 95% accuracy for training and testing data sets compiled from 440 solved structures. Applying this classifier to new data achieved 79% accuracy. These scores already provide clear guidance as to the effective use of computing resources and offer a starting point for a personalized data-collection assistant.
format Online
Article
Text
id pubmed-7055369
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher International Union of Crystallography
record_format MEDLINE/PubMed
spelling pubmed-7055369 2020-03-06 The predictive power of data-processing statistics Vollmar, Melanie Parkhurst, James M. Jaques, Dominic Baslé, Arnaud Murshudov, Garib N. Waterman, David G. Evans, Gwyndaf IUCrJ Research Papers This study describes a method to estimate the likelihood of success in determining a macromolecular structure by X-ray crystallography and experimental single-wavelength anomalous dispersion (SAD) or multiple-wavelength anomalous dispersion (MAD) phasing based on initial data-processing statistics and sample crystal properties. Such a predictive tool can rapidly assess the usefulness of data and guide the collection of an optimal data set. The increase in data rates from modern macromolecular crystallography beamlines, together with a demand from users for real-time feedback, has led to pressure on computational resources and a need for smarter data handling. Statistical and machine-learning methods have been applied to construct a classifier that displays 95% accuracy for training and testing data sets compiled from 440 solved structures. Applying this classifier to new data achieved 79% accuracy. These scores already provide clear guidance as to the effective use of computing resources and offer a starting point for a personalized data-collection assistant. International Union of Crystallography 2020-02-27 /pmc/articles/PMC7055369/ /pubmed/32148861 http://dx.doi.org/10.1107/S2052252520000895 Text en © Melanie Vollmar et al. 2020 http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
spellingShingle Research Papers
Vollmar, Melanie
Parkhurst, James M.
Jaques, Dominic
Baslé, Arnaud
Murshudov, Garib N.
Waterman, David G.
Evans, Gwyndaf
The predictive power of data-processing statistics
title The predictive power of data-processing statistics
title_full The predictive power of data-processing statistics
title_fullStr The predictive power of data-processing statistics
title_full_unstemmed The predictive power of data-processing statistics
title_short The predictive power of data-processing statistics
title_sort predictive power of data-processing statistics
topic Research Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7055369/
https://www.ncbi.nlm.nih.gov/pubmed/32148861
http://dx.doi.org/10.1107/S2052252520000895