
On bias, variance, overfitting, gold standard and consensus in single-particle analysis by cryo-electron microscopy

Cryo-electron microscopy (cryoEM) has become a well established technique to elucidate the 3D structures of biological macromolecules. Projection images from thousands of macromolecules that are assumed to be structurally identical are combined into a single 3D map representing the Coulomb potential...


Bibliographic Details
Main Authors: Sorzano, C. O. S., Jiménez-Moreno, A., Maluenda, D., Martínez, M., Ramírez-Aportela, E., Krieger, J., Melero, R., Cuervo, A., Conesa, J., Filipovic, J., Conesa, P., del Caño, L., Fonseca, Y. C., Jiménez-de la Morena, J., Losana, P., Sánchez-García, R., Strelak, D., Fernández-Giménez, E., de Isidro-Gómez, F. P., Herreros, D., Vilas, J. L., Marabini, R., Carazo, J. M.
Format: Online Article Text
Language: English
Published: International Union of Crystallography 2022
Subjects: Ccp-EM
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8972802/
https://www.ncbi.nlm.nih.gov/pubmed/35362465
http://dx.doi.org/10.1107/S2059798322001978
author Sorzano, C. O. S.
Jiménez-Moreno, A.
Maluenda, D.
Martínez, M.
Ramírez-Aportela, E.
Krieger, J.
Melero, R.
Cuervo, A.
Conesa, J.
Filipovic, J.
Conesa, P.
del Caño, L.
Fonseca, Y. C.
Jiménez-de la Morena, J.
Losana, P.
Sánchez-García, R.
Strelak, D.
Fernández-Giménez, E.
de Isidro-Gómez, F. P.
Herreros, D.
Vilas, J. L.
Marabini, R.
Carazo, J. M.
author_sort Sorzano, C. O. S.
collection PubMed
description Cryo-electron microscopy (cryoEM) has become a well-established technique to elucidate the 3D structures of biological macromolecules. Projection images from thousands of macromolecules that are assumed to be structurally identical are combined into a single 3D map representing the Coulomb potential of the macromolecule under study. This article discusses possible caveats along the image-processing path and how to avoid them to obtain a reliable 3D structure. Some of these problems are very well known in the community and may be referred to as sample-related (such as specimen denaturation at interfaces or non-uniform projection geometry leading to underrepresented projection directions). The rest are related to the algorithms used. While some have been discussed in depth in the literature, such as the use of an incorrect initial volume, others have received much less attention, although they are fundamental in any data-analysis approach. Chief among them are instabilities in estimating many of the key parameters required for a correct 3D reconstruction; these occur all along the processing workflow and may significantly affect the reliability of the whole process. In the field, the term overfitting has been coined to refer to some particular kinds of artifacts. It is argued that overfitting is a statistical bias in key parameter-estimation steps in the 3D reconstruction process, including intrinsic algorithmic bias. It is also shown that common tools (Fourier shell correlation) and strategies (gold standard) that are normally used to detect or prevent overfitting do not fully protect against it. Alternatively, it is proposed that detecting the bias that leads to overfitting is much easier when addressed at the level of parameter estimation, rather than detecting it once the particle images have been combined into a 3D map. Comparing the results from multiple algorithms (or, at least, independent executions of the same algorithm) can detect parameter bias. These multiple executions could then be averaged to give a lower-variance estimate of the underlying parameters.
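The consensus idea in the last two sentences of the description can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy angles, and the 5 degree threshold are all assumptions made for illustration. It takes one in-plane angle per particle from several independent runs, flags particles whose estimates disagree, and averages the rest.

```python
import math


def angular_difference(a_deg, b_deg):
    """Smallest absolute difference between two angles, in degrees (0..180)."""
    d = (a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def circular_mean(angles_deg):
    """Mean direction of a sequence of angles in degrees, mapped to [0, 360)."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


def consensus(estimates_per_run, max_spread_deg=5.0):
    """Combine per-particle angle estimates from several independent runs.

    estimates_per_run: list of runs, each a list with one angle per particle.
    max_spread_deg: hypothetical tolerance; runs that deviate more than this
        from their circular mean are treated as unstable for that particle.
    Returns one (consensus_angle_or_None, spread) tuple per particle.
    """
    results = []
    for particle_estimates in zip(*estimates_per_run):
        mean = circular_mean(particle_estimates)
        spread = max(angular_difference(a, mean) for a in particle_estimates)
        results.append((mean if spread <= max_spread_deg else None, spread))
    return results


# Toy data (assumed): three runs, three particles, angles in degrees.
runs = [
    [10.0, 359.0, 120.0],
    [12.0, 1.0, 300.0],
    [11.0, 0.0, 45.0],
]
results = consensus(runs)
for index, (angle, spread) in enumerate(results):
    status = "unstable" if angle is None else f"consensus {angle:.1f} deg"
    print(f"particle {index}: {status} (spread {spread:.1f} deg)")
```

In this toy input the first two particles agree across runs and receive an averaged angle, while the third is flagged because its estimates are scattered. A circular mean is used so that values near 0 and 360 degrees are averaged correctly.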
format Online
Article
Text
id pubmed-8972802
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher International Union of Crystallography
record_format MEDLINE/PubMed
spelling pubmed-8972802 2022-04-28 On bias, variance, overfitting, gold standard and consensus in single-particle analysis by cryo-electron microscopy. Acta Crystallogr D Struct Biol. Ccp-EM. International Union of Crystallography 2022-03-16 /pmc/articles/PMC8972802/ /pubmed/35362465 http://dx.doi.org/10.1107/S2059798322001978 Text en © C. O. S. Sorzano et al. 2022 https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution (CC-BY) Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are cited.
title On bias, variance, overfitting, gold standard and consensus in single-particle analysis by cryo-electron microscopy
topic Ccp-EM
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8972802/
https://www.ncbi.nlm.nih.gov/pubmed/35362465
http://dx.doi.org/10.1107/S2059798322001978