A machine learning strategy that leverages large datasets to boost statistical power in small-scale experiments
Main Authors: | Fondrie, William E.; Noble, William S. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8455073/ https://www.ncbi.nlm.nih.gov/pubmed/32009418 http://dx.doi.org/10.1021/acs.jproteome.9b00780 |
author | Fondrie, William E. Noble, William S. |
author_sort | Fondrie, William E. |
collection | PubMed |
description | Machine learning methods have proven invaluable for increasing the sensitivity of peptide detection in proteomics experiments. Most modern tools, such as Percolator and PeptideProphet, use semi-supervised algorithms to learn models directly from the datasets that they analyze. Although these methods are effective for many proteomics experiments, we suspected that they may be suboptimal for experiments of smaller scale. In this work, we found that the power and consistency of Percolator results were reduced as the size of the experiment was decreased. As an alternative, we propose a different operating mode for Percolator: learn a model with Percolator from a large dataset and use the learned model to evaluate the small-scale experiment. We call this a “static modeling” approach, in contrast to Percolator’s usual “dynamic model” that is trained anew for each dataset. We applied this static modeling approach to two settings: small, gel-based experiments and single-cell proteomics. In both cases, static models increased the yield of detected peptides and eliminated the model-induced variability of the standard dynamic approach. These results suggest that static models are a powerful tool for bringing the full benefits of Percolator and other semi-supervised algorithms to small-scale experiments. |
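The static-modeling workflow described in the abstract can be sketched with a toy target–decoy example. Everything below is illustrative and assumed, not taken from the paper: synthetic Gaussian scores stand in for PSM features, and a single score cutoff chosen at 1% estimated FDR stands in for Percolator's learned model. The point is only the operating mode it describes: fit once on a large run, then apply the frozen model to a small run instead of retraining on it.

```python
import random

random.seed(1)

def simulate_psms(n, target_mean=3.0):
    """Synthetic PSMs as (score, is_target) pairs; targets score higher on average."""
    psms = []
    for _ in range(n):
        is_target = random.random() < 0.5
        mean = target_mean if is_target else 0.0
        psms.append((random.gauss(mean, 1.0), is_target))
    return psms

def fit_static_cutoff(psms, fdr=0.01):
    """A one-parameter 'model': the lowest score cutoff whose target-decoy
    estimated FDR (decoys / targets above the cutoff) is still <= fdr."""
    best_cut, targets, decoys = None, 0, 0
    for score, is_target in sorted(psms, key=lambda p: -p[0]):
        if is_target:
            targets += 1
        else:
            decoys += 1
        if targets and decoys / targets <= fdr:
            best_cut = score  # a lower cutoff accepts more PSMs at the same FDR
    return best_cut

# "Static" mode: learn the cutoff once from a large experiment ...
large_run = simulate_psms(10_000)
static_cut = fit_static_cutoff(large_run)

# ... then score a small experiment with the frozen cutoff. Refitting on only
# 50 PSMs ("dynamic" mode) would make the cutoff noisy from run to run.
small_run = simulate_psms(50)
detected = [s for s, t in small_run if t and s >= static_cut]
print(f"static cutoff {static_cut:.2f} -> {len(detected)} target PSMs detected")
```

The design choice mirrors the paper's argument at a cartoon level: the cutoff estimated from 10,000 PSMs is stable, whereas the same estimator applied to 50 PSMs would vary substantially between runs.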
format | Online Article Text |
id | pubmed-8455073 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-8455073-2021-09-21 A machine learning strategy that leverages large datasets to boost statistical power in small-scale experiments Fondrie, William E. Noble, William S. J Proteome Res Article 2020-02-17 2020-03-06 /pmc/articles/PMC8455073/ /pubmed/32009418 http://dx.doi.org/10.1021/acs.jproteome.9b00780 Text en It is made available under a CC-BY 4.0 International license (https://creativecommons.org/licenses/by/4.0/). |
title | A machine learning strategy that leverages large datasets to boost statistical power in small-scale experiments |
title_short | A machine learning strategy that leverages large datasets to boost statistical power in small-scale experiments |
title_sort | machine learning strategy that leverages large datasets to boost statistical power in small-scale experiments |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8455073/ https://www.ncbi.nlm.nih.gov/pubmed/32009418 http://dx.doi.org/10.1021/acs.jproteome.9b00780 |