
A machine learning strategy that leverages large datasets to boost statistical power in small-scale experiments

Machine learning methods have proven invaluable for increasing the sensitivity of peptide detection in proteomics experiments. Most modern tools, such as Percolator and PeptideProphet, use semi-supervised algorithms to learn models directly from the datasets that they analyze. Although these methods are effective for many proteomics experiments, we suspected that they may be suboptimal for experiments of smaller scale. In this work, we found that the power and consistency of Percolator results was reduced as the size of the experiment was decreased. As an alternative, we propose a different operating mode for Percolator: learn a model with Percolator from a large dataset and use the learned model to evaluate the small-scale experiment. We call this a “static modeling” approach, in contrast to Percolator’s usual “dynamic model” that is trained anew for each dataset. We applied this static modeling approach to two settings: small, gel-based experiments and single-cell proteomics. In both cases, static models increased the yield of detected peptides and eliminated the model-induced variability of the standard dynamic approach. These results suggest that static models are a powerful tool for bringing the full benefits of Percolator and other semi-supervised algorithms to small-scale experiments.

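The static-modeling idea described in the abstract can be sketched in a few lines: fit a classifier once on a large target-decoy dataset, then reuse the frozen model to score a small-scale experiment instead of retraining on it. The sketch below is an illustrative toy with synthetic features and scikit-learn's LogisticRegression; it is not Percolator's actual semi-supervised SVM, and all feature names and data are made up.

```python
# Toy sketch of "static modeling": train once on a large target-decoy PSM
# dataset, then score a small experiment with the frozen model.
# Synthetic data only -- not Percolator's implementation.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

def simulate_psms(n):
    """Simulate PSM features (hypothetical search-engine score, mass error)."""
    labels = rng.integers(0, 2, n)               # 1 = target, 0 = decoy
    score = rng.normal(labels * 2.0, 1.0, n)     # targets score higher on average
    mass_err = rng.normal(0.0, 1.0 + (1 - labels), n)
    return np.column_stack([score, mass_err]), labels

# 1) Learn a static model from a large dataset.
X_large, y_large = simulate_psms(10_000)
static_model = make_pipeline(StandardScaler(), LogisticRegression())
static_model.fit(X_large, y_large)

# 2) Apply the frozen model to a small experiment: no retraining, so the
#    result is deterministic and unaffected by the small sample size.
X_small, y_small = simulate_psms(100)
small_scores = static_model.decision_function(X_small)  # one score per PSM
```

The key design point the paper argues for is step 2: because the small dataset never influences the model, run-to-run model variability is eliminated and the statistical power learned from the large dataset carries over.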
Bibliographic Details
Main Authors: Fondrie, William E.; Noble, William S.
Format: Online Article (Text)
Language: English
Published: 2020
Subjects: Article
Online Access:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8455073/
https://www.ncbi.nlm.nih.gov/pubmed/32009418
http://dx.doi.org/10.1021/acs.jproteome.9b00780
Collection: PubMed (record pubmed-8455073; MEDLINE/PubMed format)
Journal: J Proteome Res (Article); dates: 2020-02-17, 2020-03-06
License: CC BY 4.0 International (https://creativecommons.org/licenses/by/4.0/)