
Informed baseline subtraction of proteomic mass spectrometry data aided by a novel sliding window algorithm



Bibliographic Details
Main authors: Stanford, Tyman E., Bagley, Christopher J., Solomon, Patty J.
Format: Online, Article, Text
Language: English
Published: BioMed Central 2016
Subjects: Methodology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5142289/
https://www.ncbi.nlm.nih.gov/pubmed/27980460
http://dx.doi.org/10.1186/s12953-016-0107-8
Collection: PubMed

Abstract
BACKGROUND: Proteomic matrix-assisted laser desorption/ionisation (MALDI) linear time-of-flight (TOF) mass spectrometry (MS) may be used to produce protein profiles from biological samples with the aim of discovering biomarkers for disease. However, the raw protein profiles suffer from several sources of bias or systematic variation which need to be removed via pre-processing before meaningful downstream analysis of the data can be undertaken. Baseline subtraction, an early pre-processing step that removes the non-peptide signal from the spectra, is complicated by the following: (i) each spectrum has, on average, wider peaks for peptides with higher mass-to-charge ratios (m/z), and (ii) the time-consuming and error-prone trial-and-error process for optimising the baseline subtraction input arguments. With reference to the aforementioned complications, we present an automated pipeline that includes (i) a novel ‘continuous’ line segment algorithm that efficiently operates over data with a transformed m/z-axis to remove the relationship between peptide mass and peak width, and (ii) an input-free algorithm to estimate peak widths on the transformed m/z scale.
RESULTS: The automated baseline subtraction method was deployed on six publicly available proteomic MS datasets using six different m/z-axis transformations. Optimality of the automated baseline subtraction pipeline was assessed quantitatively using the mean absolute scaled error (MASE) when compared to a gold-standard baseline subtracted signal. Several of the transformations investigated were able to reduce, if not entirely remove, the peak width and peak location relationship resulting in near-optimal baseline subtraction using the automated pipeline. The proposed novel ‘continuous’ line segment algorithm is shown to far outperform naive sliding window algorithms with regard to the computational time required. The improvement in computational time was at least four-fold on real MALDI TOF-MS data and at least an order of magnitude on many simulated datasets.
CONCLUSIONS: The advantages of the proposed pipeline include informed and data specific input arguments for baseline subtraction methods, the avoidance of time-intensive and subjective piecewise baseline subtraction, and the ability to automate baseline subtraction completely. Moreover, individual steps can be adopted as stand-alone routines.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12953-016-0107-8) contains supplementary material, which is available to authorized users.
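The abstract contrasts the authors' ‘continuous’ line segment algorithm with naive sliding window algorithms but does not describe either in detail. As a rough illustration of why a sliding window over a transformed m/z axis can be made cheap, the sketch below computes a running minimum or maximum over windows of fixed half-width in transformed units using a monotonic deque, so each point enters and leaves the deque at most once, and compares it with a naive version that rescans for every window. It then forms a simple morphological opening (erosion followed by dilation) as a baseline estimate. This is not the authors' algorithm: the log and square-root transforms, the opening baseline, and all function names (transform_axis, sliding_extreme, sliding_extreme_naive, opening_baseline) are hypothetical choices made for illustration.

```python
import math
import operator
from collections import deque


def transform_axis(mz, kind="log"):
    """Hypothetical m/z-axis transforms, for illustration only (not the paper's list)."""
    if kind == "log":
        return [math.log(m) for m in mz]
    if kind == "sqrt":
        return [math.sqrt(m) for m in mz]
    raise ValueError(f"unknown transform: {kind}")


def sliding_extreme(x, y, half_width, better):
    """Running min (better=operator.lt) or max (better=operator.gt) of y.

    The window for point i holds every j with abs(x[j] - x[i]) <= half_width.
    x must be sorted ascending; on a transformed axis, windows hold varying
    numbers of points. Each index is pushed and popped at most once: O(n) total.
    """
    if half_width < 0:
        raise ValueError("half_width must be non-negative")
    n = len(x)
    out = [0.0] * n
    dq = deque()  # candidate indices, best y value at the front
    right = 0  # next index not yet admitted to any window
    for i in range(n):
        # Admit points up to the right edge, dropping candidates they dominate.
        while right < n and x[right] - x[i] <= half_width:
            while dq and not better(y[dq[-1]], y[right]):
                dq.pop()
            dq.append(right)
            right += 1
        # Evict candidates that have slid past the left edge.
        while x[i] - x[dq[0]] > half_width:
            dq.popleft()
        out[i] = y[dq[0]]
    return out


def sliding_extreme_naive(x, y, half_width, pick):
    """Reference version that rescans all points for every window: O(n^2) as written."""
    return [
        pick(yj for xj, yj in zip(x, y) if abs(xj - xi) <= half_width)
        for xi in x
    ]


def opening_baseline(x, y, half_width):
    """Morphological opening (erosion, then dilation); never exceeds y pointwise."""
    eroded = sliding_extreme(x, y, half_width, operator.lt)
    return sliding_extreme(x, eroded, half_width, operator.gt)
```

In this sketch, a baseline-subtracted spectrum would be obtained as x = transform_axis(mz, "log"), baseline = opening_baseline(x, intensity, half_width), and corrected = [v - b for v, b in zip(intensity, baseline)]. Here half_width is an assumed input that would need to exceed typical peak half-widths on the transformed scale; the window test uses the symmetric difference abs(x[j] - x[i]) so that the opening stays on or below the signal.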
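The abstract reports accuracy as the mean absolute scaled error (MASE) against a gold-standard baseline-subtracted signal but does not give the exact formula used. The sketch below assumes the standard non-seasonal definition of Hyndman and Koehler (2006) and treats the gold-standard signal as the reference series: the mean absolute difference between the two signals, divided by the mean absolute first difference of the reference. The function name mase and its error handling are assumptions for illustration.

```python
def mase(reference, estimate):
    """Mean absolute scaled error of `estimate` against `reference`.

    Assumes the non-seasonal scaling of Hyndman and Koehler (2006): the mean
    absolute error is divided by the mean absolute first difference of the
    reference series (the error of a one-step "previous value" predictor).
    """
    n = len(reference)
    if n != len(estimate) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mae = sum(abs(r - e) for r, e in zip(reference, estimate)) / n
    scale = sum(abs(reference[t] - reference[t - 1]) for t in range(1, n)) / (n - 1)
    if scale == 0:
        raise ValueError("reference is constant; MASE is undefined")
    return mae / scale
```

Under this definition, a value below 1 means the estimate's average error is smaller than the average absolute change between consecutive points of the reference signal; how the paper interprets particular MASE values is not stated in the abstract.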

ID: pubmed-5142289
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Proteome Sci (Methodology), BioMed Central, published 2016-12-07
License: © The Author(s) 2016. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.