Variability analysis of LC-MS experimental factors and their impact on machine learning

BACKGROUND: Machine learning (ML) technologies, especially deep learning (DL), have gained increasing attention in predictive mass spectrometry (MS) for enhancing the data-processing pipeline from raw data analysis to end-user predictions and rescoring. ML models need large-scale datasets for traini...

Bibliographic Details
Main authors: Rehfeldt, Tobias Greisager, Krawczyk, Konrad, Echers, Simon Gregersen, Marcatili, Paolo, Palczynski, Pawel, Röttger, Richard, Schwämmle, Veit
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10659119/
https://www.ncbi.nlm.nih.gov/pubmed/37983748
http://dx.doi.org/10.1093/gigascience/giad096
_version_ 1785148283801305088
author Rehfeldt, Tobias Greisager
Krawczyk, Konrad
Echers, Simon Gregersen
Marcatili, Paolo
Palczynski, Pawel
Röttger, Richard
Schwämmle, Veit
author_sort Rehfeldt, Tobias Greisager
collection PubMed
description BACKGROUND: Machine learning (ML) technologies, especially deep learning (DL), have gained increasing attention in predictive mass spectrometry (MS) for enhancing the data-processing pipeline from raw data analysis to end-user predictions and rescoring. ML models need large-scale datasets for training and repurposing, which can be obtained from a range of public data repositories. However, applying ML to public MS datasets on larger scales is challenging, as they vary widely in terms of data acquisition methods, biological systems, and experimental designs. RESULTS: We aim to facilitate ML efforts in MS data by conducting a systematic analysis of the potential sources of variability in public MS repositories. We also examine how these factors affect ML performance and perform a comprehensive transfer learning evaluation to assess the benefits of current best-practice methods in the field. CONCLUSIONS: Our findings show significantly higher levels of homogeneity within a project than between projects, which indicates that it is important to construct datasets most closely resembling future test cases, as transferability is severely limited for unseen datasets. We also found that transfer learning, although it did improve model performance, did not outperform a non-pretrained model.
format Online
Article
Text
id pubmed-10659119
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10659119 2023-11-20 Variability analysis of LC-MS experimental factors and their impact on machine learning Gigascience Research Oxford University Press 2023-11-20 /pmc/articles/PMC10659119/ /pubmed/37983748 http://dx.doi.org/10.1093/gigascience/giad096 Text en © The Author(s) 2023. Published by Oxford University Press GigaScience.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Variability analysis of LC-MS experimental factors and their impact on machine learning
topic Research