Cargando…

Unsupervised selection of optimal single-molecule time series idealization criterion

Single-molecule (SM) approaches have provided valuable mechanistic information on many biophysical systems. As technological advances lead to ever-larger data sets, tools for rapid analysis and identification of molecules exhibiting the behavior of interest are increasingly important. In many cases...

Descripción completa

Detalles Bibliográficos
Autores principales: Bandyopadhyay, Argha, Goldschen-Ohm, Marcel P.
Formato: Online Artículo Texto
Lenguaje:English
Publicado: The Biophysical Society 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8553667/
https://www.ncbi.nlm.nih.gov/pubmed/34487708
http://dx.doi.org/10.1016/j.bpj.2021.08.045
Descripción
Sumario:Single-molecule (SM) approaches have provided valuable mechanistic information on many biophysical systems. As technological advances lead to ever-larger data sets, tools for rapid analysis and identification of molecules exhibiting the behavior of interest are increasingly important. In many cases the underlying mechanism is unknown, making unsupervised techniques desirable. The divisive segmentation and clustering (DISC) algorithm is one such unsupervised method that idealizes noisy SM time series much faster than computationally intensive approaches without sacrificing accuracy. However, DISC relies on a user-selected objective criterion (OC) to guide its estimation of the ideal time series. Here, we explore how different OCs affect DISC’s performance for data typical of SM fluorescence imaging experiments. We find that OCs differing in their penalty for model complexity each optimize DISC’s performance for time series with different properties such as signal/noise and number of sample points. Using a machine learning approach, we generate a decision boundary that allows unsupervised selection of OCs based on the input time series to maximize performance for different types of data. This is particularly relevant for SM fluorescence data sets, which often have signal/noise near the derived decision boundary and include time series of nonuniform length because of stochastic bleaching. Our approach, AutoDISC, allows unsupervised per-molecule optimization of DISC, which will substantially assist in the rapid analysis of high-throughput SM data sets with noisy samples and nonuniform time windows.