A machine learning strategy that leverages large datasets to boost statistical power in small-scale experiments
Main Authors: | Fondrie, William E.; Noble, William S. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8455073/ https://www.ncbi.nlm.nih.gov/pubmed/32009418 http://dx.doi.org/10.1021/acs.jproteome.9b00780 |
author | Fondrie, William E. Noble, William S. |
author_sort | Fondrie, William E. |
collection | PubMed |
description | Machine learning methods have proven invaluable for increasing the sensitivity of peptide detection in proteomics experiments. Most modern tools, such as Percolator and PeptideProphet, use semi-supervised algorithms to learn models directly from the datasets that they analyze. Although these methods are effective for many proteomics experiments, we suspected that they may be suboptimal for experiments of smaller scale. In this work, we found that the power and consistency of Percolator results were reduced as the size of the experiment was decreased. As an alternative, we propose a different operating mode for Percolator: learn a model with Percolator from a large dataset and use the learned model to evaluate the small-scale experiment. We call this a “static modeling” approach, in contrast to Percolator’s usual “dynamic model” that is trained anew for each dataset. We applied this static modeling approach to two settings: small, gel-based experiments and single-cell proteomics. In both cases, static models increased the yield of detected peptides and eliminated the model-induced variability of the standard dynamic approach. These results suggest that static models are a powerful tool for bringing the full benefits of Percolator and other semi-supervised algorithms to small-scale experiments. |
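The static-modeling workflow described in the abstract can be sketched with a toy target–decoy example. Everything below is illustrative and assumed, not taken from the paper: synthetic Gaussian scores stand in for PSM features, and a single score cutoff chosen at 1% estimated FDR stands in for Percolator's learned model. The point is only the operating mode it describes: fit once on a large run, then apply the frozen model to a small run instead of retraining on it.

```python
import random

random.seed(1)

def simulate_psms(n, target_mean=3.0):
    """Synthetic PSMs as (score, is_target) pairs; targets score higher on average."""
    psms = []
    for _ in range(n):
        is_target = random.random() < 0.5
        mean = target_mean if is_target else 0.0
        psms.append((random.gauss(mean, 1.0), is_target))
    return psms

def fit_static_cutoff(psms, fdr=0.01):
    """A one-parameter 'model': the lowest score cutoff whose target-decoy
    estimated FDR (decoys / targets above the cutoff) is still <= fdr."""
    best_cut, targets, decoys = None, 0, 0
    for score, is_target in sorted(psms, key=lambda p: -p[0]):
        if is_target:
            targets += 1
        else:
            decoys += 1
        if targets and decoys / targets <= fdr:
            best_cut = score  # a lower cutoff accepts more PSMs at the same FDR
    return best_cut

# "Static" mode: learn the cutoff once from a large experiment ...
large_run = simulate_psms(10_000)
static_cut = fit_static_cutoff(large_run)

# ... then score a small experiment with the frozen cutoff. Refitting on only
# 50 PSMs ("dynamic" mode) would make the cutoff noisy from run to run.
small_run = simulate_psms(50)
detected = [s for s, t in small_run if t and s >= static_cut]
print(f"static cutoff {static_cut:.2f} -> {len(detected)} target PSMs detected")
```

The design choice mirrors the paper's argument at a cartoon level: the cutoff estimated from 10,000 PSMs is stable, whereas the same estimator applied to 50 PSMs would vary substantially between runs.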
format | Online Article Text |
id | pubmed-8455073 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-8455073-2021-09-21 A machine learning strategy that leverages large datasets to boost statistical power in small-scale experiments Fondrie, William E. Noble, William S. J Proteome Res Article 2020-02-17 2020-03-06 /pmc/articles/PMC8455073/ /pubmed/32009418 http://dx.doi.org/10.1021/acs.jproteome.9b00780 Text en It is made available under a CC-BY 4.0 International license (https://creativecommons.org/licenses/by/4.0/). |
title | A machine learning strategy that leverages large datasets to boost statistical power in small-scale experiments |
title_short | A machine learning strategy that leverages large datasets to boost statistical power in small-scale experiments |
title_sort | machine learning strategy that leverages large datasets to boost statistical power in small-scale experiments |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8455073/ https://www.ncbi.nlm.nih.gov/pubmed/32009418 http://dx.doi.org/10.1021/acs.jproteome.9b00780 |