
Serial crystallography with multi-stage merging of thousands of images

KAMO and BLEND provide particularly effective tools to automatically manage the merging of large numbers of data sets from serial crystallography. The requirement for manual intervention in the process can be reduced by extending BLEND to support additional clustering options such as the use of more...


Bibliographic Details
Main authors: Soares, Alexei S., Yamada, Yusuke, Jakoncic, Jean, McSweeney, Sean, Sweet, Robert M., Skinner, John, Foadi, James, Fuchs, Martin R., Schneider, Dieter K., Shi, Wuxian, Andi, Babak, Andrews, Lawrence C., Bernstein, Herbert J.
Format: Online Article Text
Language: English
Published: International Union of Crystallography 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9254899/
https://www.ncbi.nlm.nih.gov/pubmed/35787556
http://dx.doi.org/10.1107/S2053230X22006422
_version_ 1784740811400806400
author Soares, Alexei S.
Yamada, Yusuke
Jakoncic, Jean
McSweeney, Sean
Sweet, Robert M.
Skinner, John
Foadi, James
Fuchs, Martin R.
Schneider, Dieter K.
Shi, Wuxian
Andi, Babak
Andrews, Lawrence C.
Bernstein, Herbert J.
author_sort Soares, Alexei S.
collection PubMed
description KAMO and BLEND provide particularly effective tools to automatically manage the merging of large numbers of data sets from serial crystallography. The requirement for manual intervention in the process can be reduced by extending BLEND to support additional clustering options such as the use of more accurate cell distance metrics and the use of reflection-intensity correlation coefficients to infer ‘distances’ among sets of reflections. This increases the sensitivity to differences in unit-cell parameters and allows clustering to assemble nearly complete data sets on the basis of intensity or amplitude differences. If the data sets are already sufficiently complete to permit it, one applies KAMO once and clusters the data using intensities only. When starting from incomplete data sets, one applies KAMO twice, first using unit-cell parameters. In this step, either the simple cell vector distance of the original BLEND or the more sensitive NCDist is used. This step tends to find clusters of sufficient size such that, when merged, each cluster is sufficiently complete to allow reflection intensities or amplitudes to be compared. One then uses KAMO again using the correlation between reflections with a common hkl to merge clusters in a way that is sensitive to structural differences that may not have perturbed the unit-cell parameters sufficiently to make meaningful clusters. Many groups have developed effective clustering algorithms that use a measurable physical parameter from each diffraction still or wedge to cluster the data into categories which then can be merged, one hopes, to yield the electron density from a single protein form. Since these physical parameters are often largely independent of one another, it should be possible to greatly improve the efficacy of data-clustering software by using a multi-stage partitioning strategy. Here, one possible approach to multi-stage data clustering is demonstrated. 
The strategy is to use unit-cell clustering until the merged data are sufficiently complete and then to use intensity-based clustering. Using this strategy, it is demonstrated that it is possible to accurately cluster data sets from crystals that have subtle differences.
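The two-stage strategy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the KAMO or BLEND implementation: the function names, the thresholds, the plain Euclidean cell distance (a stand-in for the BLEND cell vector distance or NCDist), the `1 - CC` intensity distance, the threshold-based single-linkage grouping, and the toy data are all hypothetical choices made for this example.

```python
import math
from itertools import combinations


def cell_distance(cell_a, cell_b):
    """Euclidean distance between cells (a, b, c, alpha, beta, gamma).
    Crude stand-in (mixes angstroms and degrees); this is NOT NCDist."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(cell_a, cell_b)))


def pearson_cc(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def intensity_distance(refl_a, refl_b, min_common=3):
    """Assumed distance 1 - CC over reflections with a common hkl.
    Returns infinity when too few reflections overlap to compare."""
    common = sorted(set(refl_a) & set(refl_b))
    if len(common) < min_common:
        return float("inf")
    cc = pearson_cc([refl_a[h] for h in common], [refl_b[h] for h in common])
    return 1.0 - cc


def single_linkage(items, dist, threshold):
    """Group item indices linked by any pairwise distance <= threshold."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(items)), 2):
        if dist(items[i], items[j]) <= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def merge(datasets):
    """Average intensities per hkl (no scaling; illustration only)."""
    sums = {}
    for data in datasets:
        for hkl, value in data.items():
            total, count = sums.get(hkl, (0.0, 0))
            sums[hkl] = (total + value, count + 1)
    return {hkl: total / count for hkl, (total, count) in sums.items()}


def two_stage(cells, reflections, cell_cut, cc_dist_cut):
    """Stage 1: cluster by unit cell. Stage 2: merge each cluster, then
    cluster the merged sets by intensity correlation."""
    stage1 = single_linkage(cells, cell_distance, cell_cut)
    merged = [merge([reflections[i] for i in group]) for group in stage1]
    stage2 = single_linkage(merged, intensity_distance, cc_dist_cut)
    return [sorted(i for k in group for i in stage1[k]) for group in stage2]


# Toy data (invented): six partial data sets, each holding two reflections.
cells = [
    (78.0, 78.0, 37.0, 90.0, 90.0, 90.0),
    (78.1, 78.0, 37.0, 90.0, 90.0, 90.0),
    (79.0, 79.0, 37.5, 90.0, 90.0, 90.0),
    (79.1, 79.0, 37.5, 90.0, 90.0, 90.0),
    (77.0, 77.0, 38.0, 90.0, 90.0, 90.0),
    (77.0, 77.1, 38.0, 90.0, 90.0, 90.0),
]
reflections = [
    {(1, 0, 0): 100.0, (0, 1, 0): 50.0},   # form A
    {(0, 0, 1): 10.0, (1, 1, 0): 80.0},    # form A
    {(1, 0, 0): 102.0, (0, 0, 1): 11.0},   # form A
    {(0, 1, 0): 49.0, (1, 1, 0): 79.0},    # form A
    {(1, 0, 0): 20.0, (0, 1, 0): 90.0},    # form B
    {(0, 0, 1): 70.0, (1, 1, 0): 15.0},    # form B
]
clusters = two_stage(cells, reflections, cell_cut=0.5, cc_dist_cut=0.2)
print(clusters)
```

In this toy case no single data set shares enough reflections with another to be compared by intensity, so the unit-cell stage is needed first; once each cell cluster is merged, the intensity stage can join the clusters whose merged intensities correlate and keep the uncorrelated one apart.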
format Online
Article
Text
id pubmed-9254899
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher International Union of Crystallography
record_format MEDLINE/PubMed
spelling pubmed-9254899 2022-07-14 Serial crystallography with multi-stage merging of thousands of images Acta Crystallogr F Struct Biol Commun Method Communications International Union of Crystallography 2022-07-04 /pmc/articles/PMC9254899/ /pubmed/35787556 http://dx.doi.org/10.1107/S2053230X22006422 Text en © Alexei S. Soares et al. 2022 https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
title Serial crystallography with multi-stage merging of thousands of images
topic Method Communications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9254899/
https://www.ncbi.nlm.nih.gov/pubmed/35787556
http://dx.doi.org/10.1107/S2053230X22006422