
A better scoring model for de novo peptide sequencing: the symmetric difference between explained and measured masses

BACKGROUND: Given a peptide as a string of amino acids, the masses of all its prefixes and suffixes can be found by a trivial linear scan through the amino acid masses. The inverse problem is the ideal de novo peptide sequencing problem: Given all prefix and suffix masses, determine the string of am...


Bibliographic Details
Main authors:	Tschager, Thomas; Rösch, Simon; Gillet, Ludovic; Widmayer, Peter
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5464308/ https://www.ncbi.nlm.nih.gov/pubmed/28603547 http://dx.doi.org/10.1186/s13015-017-0104-1

author	Tschager, Thomas; Rösch, Simon; Gillet, Ludovic; Widmayer, Peter
collection	PubMed
description	BACKGROUND: Given a peptide as a string of amino acids, the masses of all its prefixes and suffixes can be found by a trivial linear scan through the amino acid masses. The inverse problem is the ideal de novo peptide sequencing problem: Given all prefix and suffix masses, determine the string of amino acids. In biological reality, the given masses are measured in a lab experiment, and measurements by necessity are noisy. The (real, noisy) de novo peptide sequencing problem therefore has a noisy input: a few of the prefix and suffix masses of the peptide are missing and a few other masses are given in addition. For this setting, we ask for an amino acid string that explains the given masses as accurately as possible. RESULTS: Past approaches interpreted accuracy by searching for a string that explains as many masses as possible. We feel, however, that it is not only bad to not explain a mass that appears, but also to explain a mass that does not appear. We propose to minimize the symmetric difference between the set of given masses and the set of masses that the string explains. For this new optimization problem, we propose an efficient algorithm that computes both the best and the k best solutions. Proof-of-concept experiments on measurements of synthesized peptides show that our approach leads to better results compared to finding a string that explains as many given masses as possible. CONCLUSIONS: We conclude that considering the symmetric difference as optimization goal can improve the identification rates for de novo peptide sequencing. A preliminary version of this work has been presented at WABI 2016. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13015-017-0104-1) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5464308
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5464308 2017-06-09 Algorithms Mol Biol Research BioMed Central 2017-05-11 /pmc/articles/PMC5464308/ /pubmed/28603547 http://dx.doi.org/10.1186/s13015-017-0104-1 Text en © The Author(s) 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	A better scoring model for de novo peptide sequencing: the symmetric difference between explained and measured masses
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5464308/ https://www.ncbi.nlm.nih.gov/pubmed/28603547 http://dx.doi.org/10.1186/s13015-017-0104-1
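
The scoring idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which searches for the best and the k best strings; it only scores one given candidate string. It assumes integer nominal residue masses, exact mass matching with no measurement tolerance, and no ion-type offsets. The names `RESIDUE_MASS`, `explained_masses`, `explained_count`, and `symmetric_difference_score` are hypothetical and chosen for illustration.

```python
from itertools import accumulate

# Nominal integer residue masses for a small, illustrative subset of amino acids.
RESIDUE_MASS = {"G": 57, "A": 71, "V": 99, "Q": 128}


def explained_masses(peptide):
    """Masses of all proper, non-empty prefixes and suffixes (one linear scan)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    prefixes = list(accumulate(masses))[:-1]  # drop the full peptide mass
    suffixes = [total - p for p in prefixes]  # complement of each prefix
    return set(prefixes) | set(suffixes)


def explained_count(measured, peptide):
    """Earlier criterion: number of measured masses the string explains."""
    return len(set(measured) & explained_masses(peptide))


def symmetric_difference_score(measured, peptide):
    """Masses measured but not explained, plus masses explained but not measured."""
    return len(set(measured) ^ explained_masses(peptide))


if __name__ == "__main__":
    measured = {99, 128}
    for candidate in ("GAV", "QV"):  # both candidates have total mass 227
        print(candidate,
              explained_count(measured, candidate),
              symmetric_difference_score(measured, candidate))
```

In this toy case both candidates explain the two measured masses, so counting explained masses cannot separate them. The symmetric difference prefers `QV` (score 0) over `GAV` (score 2), because `GAV` also explains the masses 57 and 170, which were not measured.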
