
Combining peak- and chromatogram-based retention time alignment algorithms for multiple chromatography-mass spectrometry datasets

BACKGROUND: Modern analytical methods in biology and chemistry use separation techniques coupled to sensitive detectors, such as gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-mass spectrometry (LC-MS). These hyphenated methods provide high-dimensional data. Comparing such da...


Bibliographic Details
Main Authors:	Hoffmann, Nils; Keck, Matthias; Neuweger, Heiko; Wilhelm, Mathias; Högy, Petra; Niehaus, Karsten; Stoye, Jens
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3546004/ https://www.ncbi.nlm.nih.gov/pubmed/22920415 http://dx.doi.org/10.1186/1471-2105-13-214

author	Hoffmann, Nils; Keck, Matthias; Neuweger, Heiko; Wilhelm, Mathias; Högy, Petra; Niehaus, Karsten; Stoye, Jens
author_sort	Hoffmann, Nils
collection	PubMed
description	BACKGROUND: Modern analytical methods in biology and chemistry use separation techniques coupled to sensitive detectors, such as gas chromatography-mass spectrometry (GC-MS) and liquid chromatography-mass spectrometry (LC-MS). These hyphenated methods provide high-dimensional data. Comparing such data manually to find corresponding signals is a laborious task, as each experiment usually consists of thousands of individual scans, each containing hundreds or even thousands of distinct signals. In order to allow for successful identification of metabolites or proteins within such data, especially in the context of metabolomics and proteomics, an accurate alignment and matching of corresponding features between two or more experiments is required. Such a matching algorithm should capture fluctuations in the chromatographic system which lead to non-linear distortions on the time axis, as well as systematic changes in recorded intensities. Many different algorithms for the retention time alignment of GC-MS and LC-MS data have been proposed and published, but all of them focus either on aligning previously extracted peak features or on aligning and comparing the complete raw data containing all available features.
RESULTS: In this paper we introduce two algorithms for retention time alignment of multiple GC-MS datasets: multiple alignment by bidirectional best hits peak assignment and cluster extension (BIPACE) and center-star multiple alignment by pairwise partitioned dynamic time warping (CeMAPP-DTW). We show how the similarity-based peak group matching method BIPACE may be used for multiple alignment calculation individually and how it can be used as a preprocessing step for the pairwise alignments performed by CeMAPP-DTW. We evaluate the algorithms individually and in combination on a previously published small GC-MS dataset studying the Leishmania parasite and on a larger GC-MS dataset studying grains of wheat (Triticum aestivum).
CONCLUSIONS: We have shown that BIPACE achieves very high precision and recall and a very low number of false positive peak assignments on both evaluation datasets. CeMAPP-DTW finds a high number of true positives when executed on its own, but achieves even better results when BIPACE is used to constrain its search space. The source code of both algorithms is included in the OpenSource software framework Maltcms, which is available from http://maltcms.sf.net. The evaluation scripts of the present study are available from the same source.
format	Online Article Text
id	pubmed-3546004
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3546004 2013-01-17 Combining peak- and chromatogram-based retention time alignment algorithms for multiple chromatography-mass spectrometry datasets Hoffmann, Nils; Keck, Matthias; Neuweger, Heiko; Wilhelm, Mathias; Högy, Petra; Niehaus, Karsten; Stoye, Jens BMC Bioinformatics Methodology Article BioMed Central 2012-08-27 /pmc/articles/PMC3546004/ /pubmed/22920415 http://dx.doi.org/10.1186/1471-2105-13-214 Text en Copyright ©2012 Hoffmann et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Combining peak- and chromatogram-based retention time alignment algorithms for multiple chromatography-mass spectrometry datasets
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3546004/ https://www.ncbi.nlm.nih.gov/pubmed/22920415 http://dx.doi.org/10.1186/1471-2105-13-214
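
The two ideas named in the abstract can be illustrated with a small Python sketch. It is not the Maltcms implementation: this record does not give the similarity function or the partitioning rule, so cosine similarity between peak mass spectra, an absolute-difference cost on a one-dimensional signal, and all function names (`bidirectional_best_hits`, `dtw_path`, `partitioned_dtw`) are assumptions made for illustration. Cluster extension and the center-star step for more than two chromatograms are left out.

```python
import math


def cosine(u, v):
    """Cosine similarity of two intensity vectors (assumed peak similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def bidirectional_best_hits(peaks_a, peaks_b):
    """Return index pairs (i, j) where peak i in A and peak j in B
    are each other's most similar peak."""
    sim = [[cosine(p, q) for q in peaks_b] for p in peaks_a]
    best_b = [max(range(len(peaks_b)), key=row.__getitem__) for row in sim]
    best_a = [max(range(len(peaks_a)), key=lambda i: sim[i][j])
              for j in range(len(peaks_b))]
    return [(i, j) for i, j in enumerate(best_b) if best_a[j] == i]


def dtw_path(x, y):
    """Plain dynamic time warping; returns the warping path as index pairs."""
    n, m = len(x), len(y)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
            cost[i][j] = abs(x[i - 1] - y[j - 1]) + step
    # Backtrack from the end, always moving to the cheapest predecessor.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return path


def partitioned_dtw(x, y, anchors):
    """Run DTW separately between consecutive anchors so the path is forced
    through every anchor. Anchors are assumed to increase in both indices."""
    bounds = sorted({(0, 0), *anchors, (len(x) - 1, len(y) - 1)})
    path = [bounds[0]]
    for (ai, aj), (bi, bj) in zip(bounds, bounds[1:]):
        segment = dtw_path(x[ai:bi + 1], y[aj:bj + 1])
        path.extend((ai + i, aj + j) for i, j in segment[1:])
    return path


# Toy data: three peaks in A, two in B; the third A peak has no mutual partner.
spectra_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.9, 0.1, 0.0]]
spectra_b = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
matches = bidirectional_best_hits(spectra_a, spectra_b)

# Toy signals: the apex at x[2] / y[3] serves as the single anchor.
signal_x = [0, 1, 5, 1, 0]
signal_y = [0, 0, 1, 5, 1, 0]
warp = partitioned_dtw(signal_x, signal_y, [(2, 3)])
```

Splitting at anchors shrinks each dynamic programming table to the region between two matched peaks, which is one simple way to read the abstract's statement that peak matches constrain the search space of the pairwise alignment.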
