Estimating error rates for single molecule protein sequencing experiments

The practical application of new single molecule protein sequencing (SMPS) technologies requires accurate estimates of their associated sequencing error rates. Here, we describe the development and application of two distinct parameter estimation methods for analyzing SMPS reads produced by fluorose...

Bibliographic Details
Main authors:	Smith, Matthew Beauregard; VanderVelden, Kent; Blom, Thomas; Stout, Heather D.; Mapes, James H.; Folsom, Tucker M.; Martin, Christopher; Bardo, Angela M.; Marcotte, Edward M.
Format:	Online Article Text
Language:	English
Published:	Cold Spring Harbor Laboratory 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10370102/ https://www.ncbi.nlm.nih.gov/pubmed/37502879 http://dx.doi.org/10.1101/2023.07.18.549591

author	Smith, Matthew Beauregard; VanderVelden, Kent; Blom, Thomas; Stout, Heather D.; Mapes, James H.; Folsom, Tucker M.; Martin, Christopher; Bardo, Angela M.; Marcotte, Edward M.
author_sort	Smith, Matthew Beauregard
collection	PubMed
description	The practical application of new single molecule protein sequencing (SMPS) technologies requires accurate estimates of their associated sequencing error rates. Here, we describe the development and application of two distinct parameter estimation methods for analyzing SMPS reads produced by fluorosequencing. The first, a Hidden Markov Model (HMM) based approach, extends whatprot, in which we previously used HMMs for SMPS peptide-read matching. This extension offers a principled approach for estimating key parameters for fluorosequencing experiments, including missed amino acid cleavages, dye loss, and peptide detachment. Specifically, we adapted the Baum-Welch algorithm, a standard technique for estimating the transition probabilities of an HMM by expectation maximization, modifying it to estimate a small number of parameter values directly rather than estimating every transition probability independently, which should help prevent overfitting. We demonstrate a high degree of accuracy on simulated data; on experimental datasets, however, we observed that the model needed to be augmented with an additional error type, N-terminal blocking. This, in combination with data pre-processing, results in reasonable parameterizations of experimental datasets that agree with controlled experimental perturbations. We also developed a second, independent implementation that uses a hybrid of DIRECT and Powell’s method to reduce the root mean squared error (RMSE) between simulations and the real dataset. We compare these methods on both simulated and real data, finding that our Baum-Welch based approach outperforms DIRECT and Powell’s method by most, but not all, criteria. Although some discrepancies between the results exist, we also find that both approaches provide similar error rate estimates from experimental single molecule fluorosequencing datasets.
format	Online Article Text
id	pubmed-10370102
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Cold Spring Harbor Laboratory
record_format	MEDLINE/PubMed
spelling	pubmed-10370102 2023-07-27 bioRxiv Article Cold Spring Harbor Laboratory 2023-07-19 /pmc/articles/PMC10370102/ /pubmed/37502879 http://dx.doi.org/10.1101/2023.07.18.549591 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title	Estimating error rates for single molecule protein sequencing experiments
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10370102/ https://www.ncbi.nlm.nih.gov/pubmed/37502879 http://dx.doi.org/10.1101/2023.07.18.549591