Cargando…
Unsupervised selection of optimal single-molecule time series idealization criterion
Single-molecule (SM) approaches have provided valuable mechanistic information on many biophysical systems. As technological advances lead to ever-larger data sets, tools for rapid analysis and identification of molecules exhibiting the behavior of interest are increasingly important. In many cases...
Autores principales: | , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
The Biophysical Society
2021
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8553667/ https://www.ncbi.nlm.nih.gov/pubmed/34487708 http://dx.doi.org/10.1016/j.bpj.2021.08.045 |
Sumario: | Single-molecule (SM) approaches have provided valuable mechanistic information on many biophysical systems. As technological advances lead to ever-larger data sets, tools for rapid analysis and identification of molecules exhibiting the behavior of interest are increasingly important. In many cases the underlying mechanism is unknown, making unsupervised techniques desirable. The divisive segmentation and clustering (DISC) algorithm is one such unsupervised method that idealizes noisy SM time series much faster than computationally intensive approaches without sacrificing accuracy. However, DISC relies on a user-selected objective criterion (OC) to guide its estimation of the ideal time series. Here, we explore how different OCs affect DISC’s performance for data typical of SM fluorescence imaging experiments. We find that OCs differing in their penalty for model complexity each optimize DISC’s performance for time series with different properties such as signal/noise and number of sample points. Using a machine learning approach, we generate a decision boundary that allows unsupervised selection of OCs based on the input time series to maximize performance for different types of data. This is particularly relevant for SM fluorescence data sets, which often have signal/noise near the derived decision boundary and include time series of nonuniform length because of stochastic bleaching. Our approach, AutoDISC, allows unsupervised per-molecule optimization of DISC, which will substantially assist in the rapid analysis of high-throughput SM data sets with noisy samples and nonuniform time windows. |
---|