DISMS2: A flexible algorithm for direct proteome-wide distance calculation of LC-MS/MS runs

BACKGROUND: The classification of samples on a molecular level has manifold applications, from patient classification regarding cancer treatment to phylogenetics for identifying evolutionary relationships between species. Modern methods employ the alignment of DNA or amino acid sequences, mostly not...

Bibliographic Details
Main authors: Rieder, Vera; Blank-Landeshammer, Bernhard; Stuhr, Marleen; Schell, Tilman; Biß, Karsten; Kollipara, Laxmikanth; Meyer, Achim; Pfenninger, Markus; Westphal, Hildegard; Sickmann, Albert; Rahnenführer, Jörg
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5335755/
https://www.ncbi.nlm.nih.gov/pubmed/28253837
http://dx.doi.org/10.1186/s12859-017-1514-2
author Rieder, Vera
Blank-Landeshammer, Bernhard
Stuhr, Marleen
Schell, Tilman
Biß, Karsten
Kollipara, Laxmikanth
Meyer, Achim
Pfenninger, Markus
Westphal, Hildegard
Sickmann, Albert
Rahnenführer, Jörg
collection PubMed
description BACKGROUND: The classification of samples on a molecular level has manifold applications, from patient classification regarding cancer treatment to phylogenetics for identifying evolutionary relationships between species. Modern methods employ the alignment of DNA or amino acid sequences, mostly not genome-wide but only on selected parts of the genome. Recently, proteomics-based approaches have become popular. An established method for the identification of peptides and proteins is liquid chromatography-tandem mass spectrometry (LC-MS/MS). First, protein sequences from MS/MS spectra are identified by means of database searches, given samples with known genome-wide sequence information; then sequence-based methods are applied. Alternatively, de novo peptide sequencing algorithms annotate MS/MS spectra and deduce peptide/protein information without a database. A newer approach independent of additional information is to directly compare unidentified tandem mass spectra. The challenge then is to compute the distance between pairwise MS/MS runs consisting of thousands of spectra. METHODS: We present DISMS2, a new algorithm to calculate proteome-wide distances directly from MS/MS data, extending the algorithm compareMS2, an approach that also uses a spectral comparison pipeline. RESULTS: Our new, more flexible algorithm, DISMS2, allows for the choice of the spectrum distance measure and includes different spectra preprocessing and filtering steps that can be tailored to specific situations by parameter optimization. CONCLUSIONS: DISMS2 performs well for samples from species with and without database annotation and thus has clear advantages over methods that are purely based on database search. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1514-2) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5335755
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5335755 2017-03-07 DISMS2: A flexible algorithm for direct proteome-wide distance calculation of LC-MS/MS runs BMC Bioinformatics Research Article BioMed Central 2017-03-03 /pmc/articles/PMC5335755/ /pubmed/28253837 http://dx.doi.org/10.1186/s12859-017-1514-2 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title DISMS2: A flexible algorithm for direct proteome-wide distance calculation of LC-MS/MS runs
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5335755/
https://www.ncbi.nlm.nih.gov/pubmed/28253837
http://dx.doi.org/10.1186/s12859-017-1514-2