
On the feasibility of deep learning applications using raw mass spectrometry data

SUMMARY: In recent years, SWATH-MS has become the proteomic method of choice for data-independent acquisition, as it enables high proteome coverage, accuracy and reproducibility. However, data analysis is convoluted and requires prior information and expert curation. Furthermore, as quantification i...


Bibliographic Details
Main authors: Cadow, Joris, Manica, Matteo, Mathis, Roland, Reddel, Roger R, Robinson, Phillip J, Wild, Peter J, Hains, Peter G, Lucas, Natasha, Zhong, Qing, Guo, Tiannan, Aebersold, Ruedi, Rodríguez Martínez, María
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8275322/
https://www.ncbi.nlm.nih.gov/pubmed/34252933
http://dx.doi.org/10.1093/bioinformatics/btab311
author Cadow, Joris
Manica, Matteo
Mathis, Roland
Reddel, Roger R
Robinson, Phillip J
Wild, Peter J
Hains, Peter G
Lucas, Natasha
Zhong, Qing
Guo, Tiannan
Aebersold, Ruedi
Rodríguez Martínez, María
author_sort Cadow, Joris
collection PubMed
description SUMMARY: In recent years, SWATH-MS has become the proteomic method of choice for data-independent acquisition, as it enables high proteome coverage, accuracy and reproducibility. However, data analysis is convoluted and requires prior information and expert curation. Furthermore, as quantification is limited to a small set of peptides, potentially important biological information may be discarded. Here we demonstrate that deep learning can be used to learn discriminative features directly from raw mass spectrometry (MS) data, thereby eliminating the need for elaborate data processing pipelines. Using transfer learning to overcome sample sparsity, we exploit a collection of publicly available deep learning models already trained for natural image classification. These models are used to produce a feature vector from each raw MS image, and the feature vectors are then used as input for a classifier trained to distinguish tumor from normal prostate biopsies. Although the deep learning models were originally trained for a completely different classification task and no additional fine-tuning is performed on them, we achieve a classification performance of 0.876 AUC. We investigate different types of image preprocessing and encoding, and whether including the secondary MS2 spectra improves classification performance. Throughout all tested models, we use standard protein expression vectors as gold standards. Even with our naïve implementation, our results suggest that the application of deep learning and transfer learning techniques might pave the way to the broader usage of raw mass spectrometry data in real-time diagnosis.
AVAILABILITY AND IMPLEMENTATION: The open source code used to generate the results from MS images is available on GitHub: https://ibm.biz/mstransc. The data, including the MS images, their encodings, classification labels and results, can be accessed at the following link: https://ibm.ent.box.com/v/mstc-supplementary
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
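The workflow described in the summary (a frozen image encoder produces feature vectors, a separate classifier is trained on them, and performance is reported as AUC) can be illustrated with a toy sketch. This is not the authors' code, which is linked above. Everything in it is an assumption made for illustration: the encoder is a fixed random projection standing in for a pretrained image model, the images are small synthetic grids, the classifier is a plain logistic regression, and the names `frozen_encoder`, `train_logistic`, `roc_auc` and `run_demo` are hypothetical. It uses only the Python standard library.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def frozen_encoder(image, weights):
    """Map a 2-D intensity grid to a feature vector using fixed weights.

    Stand-in (assumption) for a pretrained image model: the weights are
    never updated, mirroring the "no fine-tuning" setting in the summary.
    """
    flat = [v for row in image for v in row]
    return [math.tanh(sum(w * v for w, v in zip(ws, flat))) for ws in weights]


def train_logistic(features, labels, epochs=200, lr=0.5):
    """Fit a logistic regression on the fixed features by stochastic gradient descent."""
    w, b = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def run_demo(seed=0):
    """Encode synthetic 4x4 'images', train on one split, return AUC on the other."""
    rng = random.Random(seed)
    # Fixed encoder weights: 8 output features over 16 input pixels.
    weights = [[rng.gauss(0.0, 0.5) for _ in range(16)] for _ in range(8)]

    def make_image(label):
        # Synthetic data: label-1 images get a brighter top-left 2x2 patch.
        return [[rng.random() * 0.5 + (0.3 if label and r < 2 and c < 2 else 0.0)
                 for c in range(4)] for r in range(4)]

    train_y = [i % 2 for i in range(40)]
    test_y = [i % 2 for i in range(20)]
    train_x = [frozen_encoder(make_image(y), weights) for y in train_y]
    test_x = [frozen_encoder(make_image(y), weights) for y in test_y]

    w, b = train_logistic(train_x, train_y)
    scores = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in test_x]
    return roc_auc(scores, test_y)
```

Calling `run_demo()` returns the held-out AUC for the synthetic data. Only the classifier weights are learned; the encoder stays fixed throughout, which is the property the summary highlights.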
format Online
Article
Text
id pubmed-8275322
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8275322 2021-07-13 On the feasibility of deep learning applications using raw mass spectrometry data. Bioinformatics. Macromolecular Sequence, Structure, and Function. Oxford University Press 2021-07-12 /pmc/articles/PMC8275322/ /pubmed/34252933 http://dx.doi.org/10.1093/bioinformatics/btab311 Text en © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title On the feasibility of deep learning applications using raw mass spectrometry data
topic Macromolecular Sequence, Structure, and Function
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8275322/
https://www.ncbi.nlm.nih.gov/pubmed/34252933
http://dx.doi.org/10.1093/bioinformatics/btab311