A Python package based on robust statistical analysis for serial crystallography data processing

The term robustness in statistics refers to methods that are generally insensitive to deviations from model assumptions. In other words, robust methods are able to preserve their accuracy even when the data do not perfectly fit the statistical models. Robust statistical analyses are particularly eff...

Bibliographic Details
Main authors: Hadian-Jazi, Marjan, Sadri, Alireza
Format: Online Article Text
Language: English
Published: International Union of Crystallography 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10478633/
https://www.ncbi.nlm.nih.gov/pubmed/37584428
http://dx.doi.org/10.1107/S2059798323005855
_version_ 1785101396705542144
author Hadian-Jazi, Marjan
Sadri, Alireza
author_facet Hadian-Jazi, Marjan
Sadri, Alireza
author_sort Hadian-Jazi, Marjan
collection PubMed
description The term robustness in statistics refers to methods that are generally insensitive to deviations from model assumptions. In other words, robust methods are able to preserve their accuracy even when the data do not perfectly fit the statistical models. Robust statistical analyses are particularly effective when analysing mixtures of probability distributions. Therefore, these methods enable the discretization of X-ray serial crystallography data into two probability distributions: a group comprising true data points (for example the background intensities) and another group comprising outliers (for example Bragg peaks or bad pixels on an X-ray detector). These characteristics of robust statistical analysis are beneficial for the ever-increasing volume of serial crystallography (SX) data sets produced at synchrotron and X-ray free-electron laser (XFEL) sources. The key advantage of the use of robust statistics for some applications in SX data analysis is that it requires minimal parameter tuning because of its insensitivity to the input parameters. In this paper, a software package called Robust Gaussian Fitting library (RGFlib) is introduced that is based on the concept of robust statistics. Two methods are presented based on the concept of robust statistics and RGFlib for two SX data-analysis tasks: (i) a robust peak-finding algorithm and (ii) an automated robust method to detect bad pixels on X-ray pixel detectors.
format Online
Article
Text
id pubmed-10478633
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher International Union of Crystallography
record_format MEDLINE/PubMed
spelling pubmed-10478633 2023-09-06 A Python package based on robust statistical analysis for serial crystallography data processing Hadian-Jazi, Marjan Sadri, Alireza Acta Crystallogr D Struct Biol Ccp4 The term robustness in statistics refers to methods that are generally insensitive to deviations from model assumptions. In other words, robust methods are able to preserve their accuracy even when the data do not perfectly fit the statistical models. Robust statistical analyses are particularly effective when analysing mixtures of probability distributions. Therefore, these methods enable the discretization of X-ray serial crystallography data into two probability distributions: a group comprising true data points (for example the background intensities) and another group comprising outliers (for example Bragg peaks or bad pixels on an X-ray detector). These characteristics of robust statistical analysis are beneficial for the ever-increasing volume of serial crystallography (SX) data sets produced at synchrotron and X-ray free-electron laser (XFEL) sources. The key advantage of the use of robust statistics for some applications in SX data analysis is that it requires minimal parameter tuning because of its insensitivity to the input parameters. In this paper, a software package called Robust Gaussian Fitting library (RGFlib) is introduced that is based on the concept of robust statistics. Two methods are presented based on the concept of robust statistics and RGFlib for two SX data-analysis tasks: (i) a robust peak-finding algorithm and (ii) an automated robust method to detect bad pixels on X-ray pixel detectors.
International Union of Crystallography 2023-08-16 /pmc/articles/PMC10478633/ /pubmed/37584428 http://dx.doi.org/10.1107/S2059798323005855 Text en © Hadian-Jazi and Sadri 2023 https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
spellingShingle Ccp4
Hadian-Jazi, Marjan
Sadri, Alireza
A Python package based on robust statistical analysis for serial crystallography data processing
title A Python package based on robust statistical analysis for serial crystallography data processing
title_sort python package based on robust statistical analysis for serial crystallography data processing
topic Ccp4
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10478633/
https://www.ncbi.nlm.nih.gov/pubmed/37584428
http://dx.doi.org/10.1107/S2059798323005855
work_keys_str_mv AT hadianjazimarjan apythonpackagebasedonrobuststatisticalanalysisforserialcrystallographydataprocessing
AT sadrialireza apythonpackagebasedonrobuststatisticalanalysisforserialcrystallographydataprocessing
AT hadianjazimarjan pythonpackagebasedonrobuststatisticalanalysisforserialcrystallographydataprocessing
AT sadrialireza pythonpackagebasedonrobuststatisticalanalysisforserialcrystallographydataprocessing
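
The description in this record says that robust statistics can split detector data into a background population and a group of outliers such as Bragg peaks or bad pixels. The sketch below illustrates that general idea with a median and median-absolute-deviation (MAD) estimate of a Gaussian background. It is not RGFlib's API or algorithm: the function names `robust_gaussian_fit` and `flag_outliers`, the threshold of 6 standard deviations, and the simulated pixel values are all assumptions made for illustration.

```python
import random
import statistics

# Scale factor that converts the MAD of Gaussian data to a standard deviation.
MAD_TO_SIGMA = 1.4826


def robust_gaussian_fit(values):
    """Estimate the centre and spread of the dominant (background) population.

    Uses the median and the MAD, which are barely affected by a small
    fraction of outliers. Hypothetical helper, not part of RGFlib.
    """
    centre = statistics.median(values)
    mad = statistics.median([abs(v - centre) for v in values])
    return centre, MAD_TO_SIGMA * mad


def flag_outliers(values, threshold=6.0):
    """Return one boolean per value: True if it lies far from the background.

    The threshold is expressed in robust standard deviations and is an
    illustrative choice, not a recommended setting.
    """
    centre, sigma = robust_gaussian_fit(values)
    return [abs(v - centre) > threshold * sigma for v in values]


# Simulated data: Gaussian background plus three bright "peak" pixels.
rng = random.Random(0)
background = [rng.gauss(100.0, 5.0) for _ in range(1000)]
peaks = [400.0, 520.0, 650.0]
pixels = background + peaks

centre, sigma = robust_gaussian_fit(pixels)
flags = flag_outliers(pixels)
print(f"background centre ~ {centre:.1f}, sigma ~ {sigma:.1f}")
print(f"flagged {sum(flags)} of {len(pixels)} pixels as outliers")
```

Because the median and MAD depend on the bulk of the data, the three bright pixels have almost no influence on the estimated background, whereas an ordinary mean and standard deviation would be pulled towards them.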