The Local-Balanced Model for Improved Machine Learning Outcomes on Mass Spectrometry Data Sets and Other Instrumental Data

One unifying challenge when classifying biological samples with mass spectrometry data is overcoming the obstacle of sample-to-sample variability so that differences between groups, such as between a healthy set and a disease set, can be identified. Similarly, when the same sample is re-analyzed und...

Bibliographic Details
Main authors: Desaire, Heather; Patabandige, Milani Wijeweera; Hua, David
Format: Online Article Text
Language: English
Published: 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8516084/
https://www.ncbi.nlm.nih.gov/pubmed/33580828
http://dx.doi.org/10.1007/s00216-020-03117-2
author Desaire, Heather
Patabandige, Milani Wijeweera
Hua, David
collection PubMed
description One unifying challenge when classifying biological samples with mass spectrometry data is overcoming sample-to-sample variability so that differences between groups, such as between a healthy set and a disease set, can be identified. Even when the same sample is re-analyzed under identical conditions, instrument signals can fluctuate by more than 10%. This signal inconsistency makes it difficult to identify subtle differences across a set of samples, and it weakens the mass spectrometrist’s ability to effectively leverage data in domains as diverse as proteomics, metabolomics, glycomics, and imaging. We selected challenging data sets in the fields of glycomics, mass spectrometry imaging, and bacterial typing to study the problem of within-group signal variability and adapted a 30-year-old statistical approach to address the problem. The solution, the “local-balanced model,” relies on using balanced subsets of training data to classify test samples. This analysis strategy was assessed on ESI-MS data of IgG-based glycopeptides, MALDI-MS imaging data of endogenous lipids, and MALDI-MS data of bacterial proteins. Two preliminary examples on non-mass spectrometry data sets are also included to show the potential generality of the method outside the field of MS analysis. We demonstrate that this approach is superior to simple normalization methods, generalizable to multiple mass spectrometry domains, and potentially appropriate in fields as diverse as physics and satellite imaging. In some cases, improvements in classification can be dramatic, with accuracy rising from 60% with normalization alone to over 90% with the additional development described herein.
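The abstract says only that the local-balanced model uses balanced subsets of training data to classify test samples; it does not give the algorithm. The Python sketch below is one possible reading of that sentence, not the authors' implementation. The function name `local_balanced_predict`, the neighbour count `k`, the Euclidean distance, and the nearest-centroid decision rule are all assumptions chosen for illustration, and the toy data are made up.

```python
import math
from collections import defaultdict


def local_balanced_predict(train_X, train_y, x, k=3):
    """Classify x from a local, class-balanced subset of the training data.

    Hypothetical sketch: for each class, keep the k training samples
    closest to x, then assign x to the class whose local centroid is nearest.
    """
    by_class = defaultdict(list)
    for features, label in zip(train_X, train_y):
        by_class[label].append(features)

    # Use the same number of neighbours for every class so the subset is balanced.
    k_eff = min(k, min(len(rows) for rows in by_class.values()))

    best_label, best_dist = None, math.inf
    for label, rows in by_class.items():
        # Local subset: the k_eff samples of this class nearest to x.
        nearest = sorted(rows, key=lambda r: math.dist(r, x))[:k_eff]
        # Centroid of that local subset, computed feature by feature.
        centroid = [sum(col) / len(nearest) for col in zip(*nearest)]
        d = math.dist(centroid, x)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label


# Made-up two-feature data: three "healthy" and four "disease" samples.
train_X = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1],
           [5.0, 5.0], [5.1, 4.9], [4.8, 5.2], [9.0, 9.0]]
train_y = ["healthy", "healthy", "healthy",
           "disease", "disease", "disease", "disease"]

print(local_balanced_predict(train_X, train_y, [1.1, 1.0], k=2))
print(local_balanced_predict(train_X, train_y, [5.0, 5.1], k=2))
```

Because every class contributes the same number of nearby samples, a class that is over-represented in the full training set cannot dominate the decision, and distant samples such as `[9.0, 9.0]` are ignored for test points far from them. Any classifier could replace the nearest-centroid rule in this sketch.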
format Online
Article
Text
id pubmed-8516084
institution National Center for Biotechnology Information
language English
publishDate 2021
record_format MEDLINE/PubMed
spelling pubmed-8516084 2022-03-01 Anal Bioanal Chem Article 2021-02-13 2021-03 /pmc/articles/PMC8516084/ /pubmed/33580828 http://dx.doi.org/10.1007/s00216-020-03117-2 Text en https://creativecommons.org/licenses/by/4.0/ Terms of use and reuse: academic research for non-commercial purposes, see here for full terms. http://www.springer.com/gb/open-access/authors-rights/aam-terms-v1
title The Local-Balanced Model for Improved Machine Learning Outcomes on Mass Spectrometry Data Sets and Other Instrumental Data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8516084/
https://www.ncbi.nlm.nih.gov/pubmed/33580828
http://dx.doi.org/10.1007/s00216-020-03117-2