Statistical distortion of supervised learning predictions in optical microscopy induced by image compression


Bibliographic Details
Main authors: Pomarico, Enrico; Schmidt, Cédric; Chays, Florian; Nguyen, David; Planchette, Arielle; Tissot, Audrey; Roux, Adrien; Pagès, Stéphane; Batti, Laura; Clausen, Christoph; Lasser, Theo; Radenovic, Aleksandra; Sanguinetti, Bruno; Extermann, Jérôme
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8891276/
https://www.ncbi.nlm.nih.gov/pubmed/35236913
http://dx.doi.org/10.1038/s41598-022-07445-4
Description
Summary: The growth of data throughput in optical microscopy has triggered the extensive use of supervised learning (SL) models on compressed datasets for automated analysis. Investigating the effects of image compression on SL predictions is therefore pivotal to assessing their reliability, especially for clinical use. We quantify the statistical distortions induced by compression by comparing predictions on compressed data to the raw predictive uncertainty, which is estimated numerically from the raw noise statistics measured via sensor calibration. Predictions of cell segmentation parameters are altered by up to 15% and by more than 10 standard deviations after 16-to-8-bit pixel depth reduction and 10:1 JPEG compression. JPEG formats with higher compression ratios show significantly larger distortions. Interestingly, a recent metrologically accurate algorithm, offering compression ratios of up to 10:1, provides a prediction spread equivalent to that stemming from raw noise. The method described here makes it possible to set a lower bound on the predictive uncertainty of an SL task and can be generalized to determine the statistical distortions originating from a variety of processing pipelines in AI-assisted fields.
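
The comparison described in the summary can be illustrated with a minimal numerical sketch. It is not the authors' implementation. Every name in it is hypothetical, the noise model (Poisson shot noise plus Gaussian read noise, with the gain and read noise taken as known from calibration) is an assumption, and a simple thresholded-area count stands in for a trained SL segmentation model. The sketch draws noisy replicas of a raw image, takes the spread of the predictions as the raw predictive uncertainty, and expresses the shift caused by a compression step both as a relative change and in units of that spread.

```python
import numpy as np


def simulate_raw_replicas(signal_e, gain, read_noise_e, n, rng):
    """Draw n noisy realizations of an image (assumed noise model).

    signal_e: expected photoelectrons per pixel; gain: digital numbers (DN)
    per electron; read_noise_e: read-noise standard deviation in electrons.
    """
    electrons = rng.poisson(signal_e, size=(n,) + signal_e.shape).astype(float)
    electrons += rng.normal(0.0, read_noise_e, size=electrons.shape)
    return electrons * gain


def reduce_bit_depth(img):
    """16-to-8 bit reduction: keep the top 8 bits, then rescale to 16-bit range."""
    img16 = np.clip(np.round(img), 0, 65535).astype(np.uint16)
    return (img16 >> 8).astype(float) * 256.0


def segmented_area(img, threshold):
    """Toy stand-in for an SL prediction: number of pixels above a threshold."""
    return float(np.count_nonzero(img > threshold))


def distortion(signal_e, gain, read_noise_e, predictor, compress, n=200, seed=0):
    """Compare predictions on compressed replicas with the raw prediction spread."""
    rng = np.random.default_rng(seed)
    replicas = simulate_raw_replicas(signal_e, gain, read_noise_e, n, rng)
    raw = np.array([predictor(r) for r in replicas])
    comp = np.array([predictor(compress(r)) for r in replicas])
    raw_mean, raw_std = raw.mean(), raw.std(ddof=1)
    shift = comp.mean() - raw_mean
    return {
        "raw_mean": raw_mean,
        "raw_std": raw_std,  # raw predictive uncertainty
        "relative_shift": shift / raw_mean,
        "shift_in_std": shift / raw_std if raw_std > 0 else float("nan"),
    }


# Hypothetical test object: a Gaussian blob on a uniform background.
yy, xx = np.mgrid[0:64, 0:64]
blob_e = 20.0 + 400.0 * np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 10.0**2))
GAIN = 100.0  # DN per electron (assumed calibration value)


def predictor(img):
    return segmented_area(img, threshold=220.0 * GAIN)


result = distortion(blob_e, GAIN, 2.0, predictor, reduce_bit_depth)
print(result)
```

Swapping `reduce_bit_depth` for another function, for example a JPEG encode and decode round trip, would give the corresponding figures for that processing step. With a real model, `predictor` would wrap the model's inference call.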