
Unsupervised selection of optimal single-molecule time series idealization criterion


Bibliographic Details
Main authors: Bandyopadhyay, Argha, Goldschen-Ohm, Marcel P.
Format: Online Article Text
Language: English
Published: The Biophysical Society 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8553667/
https://www.ncbi.nlm.nih.gov/pubmed/34487708
http://dx.doi.org/10.1016/j.bpj.2021.08.045
author Bandyopadhyay, Argha
Goldschen-Ohm, Marcel P.
collection PubMed
description Single-molecule (SM) approaches have provided valuable mechanistic information on many biophysical systems. As technological advances lead to ever-larger data sets, tools for rapid analysis and identification of molecules exhibiting the behavior of interest are increasingly important. In many cases the underlying mechanism is unknown, making unsupervised techniques desirable. The divisive segmentation and clustering (DISC) algorithm is one such unsupervised method that idealizes noisy SM time series much faster than computationally intensive approaches without sacrificing accuracy. However, DISC relies on a user-selected objective criterion (OC) to guide its estimation of the ideal time series. Here, we explore how different OCs affect DISC’s performance for data typical of SM fluorescence imaging experiments. We find that OCs differing in their penalty for model complexity each optimize DISC’s performance for time series with different properties such as signal/noise and number of sample points. Using a machine learning approach, we generate a decision boundary that allows unsupervised selection of OCs based on the input time series to maximize performance for different types of data. This is particularly relevant for SM fluorescence data sets, which often have signal/noise near the derived decision boundary and include time series of nonuniform length because of stochastic bleaching. Our approach, AutoDISC, allows unsupervised per-molecule optimization of DISC, which will substantially assist in the rapid analysis of high-throughput SM data sets with noisy samples and nonuniform time windows.
format Online
Article
Text
id pubmed-8553667
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher The Biophysical Society
record_format MEDLINE/PubMed
spelling pubmed-8553667 2022-10-19 Unsupervised selection of optimal single-molecule time series idealization criterion Bandyopadhyay, Argha Goldschen-Ohm, Marcel P. Biophys J Articles The Biophysical Society 2021-10-19 2021-09-04 /pmc/articles/PMC8553667/ /pubmed/34487708 http://dx.doi.org/10.1016/j.bpj.2021.08.045 Text en © 2021 Biophysical Society.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Unsupervised selection of optimal single-molecule time series idealization criterion
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8553667/
https://www.ncbi.nlm.nih.gov/pubmed/34487708
http://dx.doi.org/10.1016/j.bpj.2021.08.045