Machine learning-based protein crystal detection for monitoring of crystallization processes enabled with large-scale synthetic data sets of photorealistic images

Since preparative chromatography is a sustainability challenge due to large amounts of consumables used in downstream processing of biomolecules, protein crystallization offers a promising alternative as a purification method. While the limited crystallizability of proteins often restricts a broad a...

Bibliographic Details
Main Authors: Bischoff, Daniel, Walla, Brigitte, Weuster-Botz, Dirk
Format: Online Article Text
Language: English
Published: Springer Berlin Heidelberg 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9372129/
https://www.ncbi.nlm.nih.gov/pubmed/35661232
http://dx.doi.org/10.1007/s00216-022-04101-8
author Bischoff, Daniel
Walla, Brigitte
Weuster-Botz, Dirk
collection PubMed
description Since preparative chromatography is a sustainability challenge due to large amounts of consumables used in downstream processing of biomolecules, protein crystallization offers a promising alternative as a purification method. While the limited crystallizability of proteins often restricts a broad application of crystallization as a purification method, advances in molecular biology, as well as computational methods are pushing the applicability towards integration in biotechnological downstream processes. However, in industrial and academic settings, monitoring protein crystallization processes non-invasively by microscopic photography and automated image evaluation remains a challenging problem. Recently, the identification of single crystal objects using deep learning has been the subject of increased attention for various model systems. However, the advancement of crystal detection using deep learning for biotechnological applications is limited: robust models obtained through supervised machine learning tasks require large-scale and high-quality data sets usually obtained in large projects through extensive manual labeling, an approach that is highly error-prone for dense systems of transparent crystals. For the first time, recent trends involving the use of synthetic data sets for supervised learning are transferred, thus generating photorealistic images of virtual protein crystals in suspension (PCS) through the use of ray tracing algorithms, accompanied by specialized data augmentations modelling experimental noise. Further, it is demonstrated that state-of-the-art models trained with the large-scale synthetic PCS data set outperform similar fine-tuned models based on the average precision metric on a validation data set, followed by experimental validation using high-resolution photomicrographs from stirred tank protein crystallization processes. 
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s00216-022-04101-8.
format Online
Article
Text
id pubmed-9372129
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer Berlin Heidelberg
record_format MEDLINE/PubMed
spelling pubmed-9372129 2022-08-13 Anal Bioanal Chem Research Paper Springer Berlin Heidelberg 2022-06-04 2022 /pmc/articles/PMC9372129/ /pubmed/35661232 http://dx.doi.org/10.1007/s00216-022-04101-8 Text en
© The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Machine learning-based protein crystal detection for monitoring of crystallization processes enabled with large-scale synthetic data sets of photorealistic images
topic Research Paper