Cargando…

Matching isotopic distributions from metabolically labeled samples

Motivation: In recent years stable isotopic labeling has become a standard approach for quantitative proteomic analyses. Among the many available isotopic labeling strategies, metabolic labeling is attractive for the excellent internal control it provides. However, analysis of data from metabolic la...

Descripción completa

Detalles Bibliográficos
Autores principales: McIlwain, Sean, Page, David, Huttlin, Edward L., Sussman, Michael R.
Formato: Texto
Lenguaje:English
Publicado: Oxford University Press 2008
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2718665/
https://www.ncbi.nlm.nih.gov/pubmed/18586733
http://dx.doi.org/10.1093/bioinformatics/btn190
_version_ 1782170011475902464
author McIlwain, Sean
Page, David
Huttlin, Edward L.
Sussman, Michael R.
author_facet McIlwain, Sean
Page, David
Huttlin, Edward L.
Sussman, Michael R.
author_sort McIlwain, Sean
collection PubMed
description Motivation: In recent years stable isotopic labeling has become a standard approach for quantitative proteomic analyses. Among the many available isotopic labeling strategies, metabolic labeling is attractive for the excellent internal control it provides. However, analysis of data from metabolic labeling experiments can be complicated because the spacing between labeled and unlabeled forms of each peptide depends on its sequence, and is thus variable from analyte to analyte. As a result, one generally needs to know the sequence of a peptide to identify its matching isotopic distributions in an automated fashion. In some experimental situations it would be necessary or desirable to match pairs of labeled and unlabeled peaks from peptides of unknown sequence. This article addresses this largely overlooked problem in the analysis of quantitative mass spectrometry data by presenting an algorithm that not only identifies isotopic distributions within a mass spectrum, but also annotates matches between natural abundance light isotopic distributions and their metabolically labeled counterparts. This algorithm is designed in two stages: first we annotate the isotopic peaks using a modified version of the IDM algorithm described last year; then we use a probabilistic classifier that is supplemented by dynamic programming to find the metabolically labeled matched isotopic pairs. Such a method is needed for high-throughput quantitative proteomic metabolomic experiments measured via mass spectrometry. Results: The primary result of this article is that the dynamic programming approach performs well given perfect isotopic distribution annotations. Our algorithm achieves a true positive rate of 99% and a false positive rate of 1% using perfect isotopic distribution annotations. When the isotopic distributions are annotated given ‘expert’ selected peaks, the same algorithm gets a true positive rate of 77% and a false positive rate of 1%. Finally, when annotating using machine selected peaks, which may contain noise, the dynamic programming algorithm gives a true positive rate of 36% and a false positive rate of 1%. It is important to mention that these rates arise from the requirement of exact annotations of both the light and heavy isotopic distributions. In our evaluations, a match is considered ‘entirely incorrect’ if it is missing even one peak or containing an extraneous peak. If we only require that the ‘monoisotopic’ peaks exist within the two matched distributions, our algorithm obtains a positive rate of 45% and a false positive rate of 1% on the ‘machine’ selected data. Changes to the algorithm's scoring function and training example generation improves our ‘monoisotopic’ peak score true positive rate to 65% while obtaining a false positive rate of 2%. All results were obtained within 10-fold cross-validation of 41 mass spectra with a mass-to-charge range of 800–4000m/z. There are a total of 713 isotopic distributions and 255 matched isotopic pairs that are hand-annotated for this study. Availability: Programs are available via http://www.cs.wisc.edu/~mcilwain/IDM/ Contact:mcilwain@cs.wisc.edu
format Text
id pubmed-2718665
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-27186652009-07-31 Matching isotopic distributions from metabolically labeled samples McIlwain, Sean Page, David Huttlin, Edward L. Sussman, Michael R. Bioinformatics Ismb 2008 Conference Proceedings 19–23 July 2008, Toronto Motivation: In recent years stable isotopic labeling has become a standard approach for quantitative proteomic analyses. Among the many available isotopic labeling strategies, metabolic labeling is attractive for the excellent internal control it provides. However, analysis of data from metabolic labeling experiments can be complicated because the spacing between labeled and unlabeled forms of each peptide depends on its sequence, and is thus variable from analyte to analyte. As a result, one generally needs to know the sequence of a peptide to identify its matching isotopic distributions in an automated fashion. In some experimental situations it would be necessary or desirable to match pairs of labeled and unlabeled peaks from peptides of unknown sequence. This article addresses this largely overlooked problem in the analysis of quantitative mass spectrometry data by presenting an algorithm that not only identifies isotopic distributions within a mass spectrum, but also annotates matches between natural abundance light isotopic distributions and their metabolically labeled counterparts. This algorithm is designed in two stages: first we annotate the isotopic peaks using a modified version of the IDM algorithm described last year; then we use a probabilistic classifier that is supplemented by dynamic programming to find the metabolically labeled matched isotopic pairs. Such a method is needed for high-throughput quantitative proteomic metabolomic experiments measured via mass spectrometry. Results: The primary result of this article is that the dynamic programming approach performs well given perfect isotopic distribution annotations. Our algorithm achieves a true positive rate of 99% and a false positive rate of 1% using perfect isotopic distribution annotations. When the isotopic distributions are annotated given ‘expert’ selected peaks, the same algorithm gets a true positive rate of 77% and a false positive rate of 1%. Finally, when annotating using machine selected peaks, which may contain noise, the dynamic programming algorithm gives a true positive rate of 36% and a false positive rate of 1%. It is important to mention that these rates arise from the requirement of exact annotations of both the light and heavy isotopic distributions. In our evaluations, a match is considered ‘entirely incorrect’ if it is missing even one peak or containing an extraneous peak. If we only require that the ‘monoisotopic’ peaks exist within the two matched distributions, our algorithm obtains a positive rate of 45% and a false positive rate of 1% on the ‘machine’ selected data. Changes to the algorithm's scoring function and training example generation improves our ‘monoisotopic’ peak score true positive rate to 65% while obtaining a false positive rate of 2%. All results were obtained within 10-fold cross-validation of 41 mass spectra with a mass-to-charge range of 800–4000m/z. There are a total of 713 isotopic distributions and 255 matched isotopic pairs that are hand-annotated for this study. Availability: Programs are available via http://www.cs.wisc.edu/~mcilwain/IDM/ Contact:mcilwain@cs.wisc.edu Oxford University Press 2008-07-01 /pmc/articles/PMC2718665/ /pubmed/18586733 http://dx.doi.org/10.1093/bioinformatics/btn190 Text en © 2008 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Ismb 2008 Conference Proceedings 19–23 July 2008, Toronto
McIlwain, Sean
Page, David
Huttlin, Edward L.
Sussman, Michael R.
Matching isotopic distributions from metabolically labeled samples
title Matching isotopic distributions from metabolically labeled samples
title_full Matching isotopic distributions from metabolically labeled samples
title_fullStr Matching isotopic distributions from metabolically labeled samples
title_full_unstemmed Matching isotopic distributions from metabolically labeled samples
title_short Matching isotopic distributions from metabolically labeled samples
title_sort matching isotopic distributions from metabolically labeled samples
topic Ismb 2008 Conference Proceedings 19–23 July 2008, Toronto
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2718665/
https://www.ncbi.nlm.nih.gov/pubmed/18586733
http://dx.doi.org/10.1093/bioinformatics/btn190
work_keys_str_mv AT mcilwainsean matchingisotopicdistributionsfrommetabolicallylabeledsamples
AT pagedavid matchingisotopicdistributionsfrommetabolicallylabeledsamples
AT huttlinedwardl matchingisotopicdistributionsfrommetabolicallylabeledsamples
AT sussmanmichaelr matchingisotopicdistributionsfrommetabolicallylabeledsamples