Unleashing high content screening in hit detection – Benchmarking AI workflows including novelty detection

Complex mixtures containing natural products remain an interesting source of novel drug candidates. High content screening (HCS) is a popular tool for screening such mixtures. In particular, multiplexed HCS assays promise comprehensive bioactivity profiles but also generate large amounts of data. Yet, onl...

Bibliographic Details
Main authors:	Kupczyk, Erwin; Schorpp, Kenji; Hadian, Kamyar; Lin, Sean; Tziotis, Dimitrios; Schmitt-Kopplin, Philippe; Mueller, Constanze
Format:	Online Article Text
Language:	English
Published:	Research Network of Computational and Structural Biotechnology 2022
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9530837/ https://www.ncbi.nlm.nih.gov/pubmed/36212538 http://dx.doi.org/10.1016/j.csbj.2022.09.023

author	Kupczyk, Erwin; Schorpp, Kenji; Hadian, Kamyar; Lin, Sean; Tziotis, Dimitrios; Schmitt-Kopplin, Philippe; Mueller, Constanze
collection	PubMed
description	Complex mixtures containing natural products remain an interesting source of novel drug candidates. High content screening (HCS) is a popular tool for screening such mixtures. In particular, multiplexed HCS assays promise comprehensive bioactivity profiles but also generate large amounts of data. Yet only a few machine learning (ML) applications for data analysis are available, and these usually require profound knowledge of the underlying cell biology. Unfortunately, there are no applications that simply predict whether samples are biologically active or not (any kind of bioactivity). In this work, we benchmark ML algorithms for binary classification, starting with classical ML models, which are the standard classifiers of the scikit-learn library or ensemble models of these classifiers (92 models tested in total), followed by partial least squares regression (PLSR)-based classification (44 models tested in total) and simple artificial neural networks (ANNs) with dense layers (72 models tested in total). In addition, novelty detection (ND), which is intended to handle unknown patterns, was examined. For the final analysis, the models, with and without upstream ND, were tested on two independent data sets. In our analysis, a stacking model, an ensemble model of classical ML algorithms, performed best at predicting new and unknown data. ND improved the predictions of the models and was useful for handling unknown patterns. Importantly, the classifier presented here can easily be rebuilt and adapted to the data and demands of other groups. The hit detector (ND + stacking model) is universal and suitable for broader application to support the search for new drug candidates.
format	Online Article Text
id	pubmed-9530837
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Research Network of Computational and Structural Biotechnology
record_format	MEDLINE/PubMed
spelling	pubmed-9530837 2022-10-06 Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2022-09-27 /pmc/articles/PMC9530837/ /pubmed/36212538 http://dx.doi.org/10.1016/j.csbj.2022.09.023 Text en © 2022 The Author(s) This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title	Unleashing high content screening in hit detection – Benchmarking AI workflows including novelty detection
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9530837/ https://www.ncbi.nlm.nih.gov/pubmed/36212538 http://dx.doi.org/10.1016/j.csbj.2022.09.023
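
The description in this record outlines a hit detector that places novelty detection (ND) upstream of a stacking model built from scikit-learn classifiers. The following is a minimal sketch of that idea, not the authors' implementation. Everything specific in it is an assumption made for illustration: the choice of LocalOutlierFactor as the novelty detector, the base learners and meta-learner in the stack, the synthetic five-feature data, the function names build_hit_detector and detect_hits, and above all the rule that samples flagged as novel are reported as hits (the record does not say how the paper resolves them).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier, LocalOutlierFactor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def build_hit_detector(X_train, y_train, random_state=0):
    """Fit a novelty detector and a stacking classifier on the same data."""
    # Assumed novelty detector: LOF in novelty mode learns which region of
    # feature space the training profiles cover.
    novelty = make_pipeline(StandardScaler(), LocalOutlierFactor(novelty=True))
    novelty.fit(X_train)

    # Assumed stack: two standard classifiers combined by a logistic
    # regression meta-learner trained on cross-validated predictions.
    stack = StackingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=50,
                                          random_state=random_state)),
            ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier())),
        ],
        final_estimator=LogisticRegression(),
        cv=5,
    )
    stack.fit(X_train, y_train)
    return novelty, stack


def detect_hits(novelty, stack, X, novel_label=1):
    """Return (labels, is_novel); novel samples bypass the classifier."""
    is_novel = novelty.predict(X) == -1  # LOF marks novelties with -1
    labels = stack.predict(X)
    # Assumption: an unknown pattern is reported as a hit (novel_label=1).
    labels[is_novel] = novel_label
    return labels, is_novel


# Synthetic stand-in for HCS feature profiles: inactive (0) vs. active (1).
rng = np.random.default_rng(0)
X_inactive = rng.normal(loc=0.0, scale=1.0, size=(100, 5))
X_active = rng.normal(loc=3.0, scale=1.0, size=(100, 5))
X_train = np.vstack([X_inactive, X_active])
y_train = np.array([0] * 100 + [1] * 100)

novelty, stack = build_hit_detector(X_train, y_train)

# One typical inactive profile, one typical active profile, and one profile
# far outside anything seen during training.
X_new = np.array([[0.0] * 5, [3.0] * 5, [50.0] * 5])
labels, is_novel = detect_hits(novelty, stack, X_new)
print(labels, is_novel)
```

Returning the novelty mask alongside the labels keeps the two decisions separable, so a group adapting the sketch can route novel samples to manual review instead of counting them as hits.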