CHICKN: extraction of peptide chromatographic elution profiles from large scale mass spectrometry data by means of Wasserstein compressive hierarchical cluster analysis

BACKGROUND: The clustering of data produced by liquid chromatography coupled to mass spectrometry analyses (LC-MS data) has recently gained interest as a way to extract meaningful chemical or biological patterns. However, recent instrumental pipelines deliver data whose size, dimensionality and expected numb...

Bibliographic Details
Main authors: Permiakova, Olga, Guibert, Romain, Kraut, Alexandra, Fortin, Thomas, Hesse, Anne-Marie, Burger, Thomas
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7881590/
https://www.ncbi.nlm.nih.gov/pubmed/33579189
http://dx.doi.org/10.1186/s12859-021-03969-0
author Permiakova, Olga
Guibert, Romain
Kraut, Alexandra
Fortin, Thomas
Hesse, Anne-Marie
Burger, Thomas
collection PubMed
description BACKGROUND: The clustering of data produced by liquid chromatography coupled to mass spectrometry analyses (LC-MS data) has recently gained interest as a way to extract meaningful chemical or biological patterns. However, recent instrumental pipelines deliver data whose size, dimensionality and expected number of clusters are too large to be processed by classical machine learning algorithms, so most state-of-the-art methods rely on single-pass linkage-based algorithms. RESULTS: We propose a clustering algorithm that solves the powerful but computationally demanding kernel k-means objective function in a scalable way. As a result, it can process LC-MS data in an acceptable time on a multicore machine. To do so, we combine three essential features: a compressive data representation, Nyström approximation and a hierarchical strategy. In addition, we propose new kernels based on optimal transport, which can be interpreted as intuitive similarity measures between chromatographic elution profiles. CONCLUSIONS: Our method, referred to as CHICKN, is evaluated on proteomics data produced in our lab, as well as on benchmark data from the literature. From a computational viewpoint, it is particularly efficient on raw LC-MS data. From a data analysis viewpoint, it provides clusters that differ from those resulting from state-of-the-art methods, while achieving similar performance. This highlights the complementarity of algorithms built on different principles for extracting the most from complex LC-MS data.
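The two ingredients the abstract names, an optimal-transport kernel between elution profiles and a Nyström approximation of the kernel matrix, can be illustrated with a small sketch. This is not the CHICKN implementation: it assumes profiles are non-negative intensity vectors sampled on a shared retention-time grid, uses a Laplacian-type kernel exp(-gamma * W1) as one plausible optimal-transport kernel, and omits the compressive and hierarchical steps. All function names and the `gamma` parameter are hypothetical.

```python
import numpy as np

def wasserstein1(p, q, dx=1.0):
    """1-Wasserstein distance between two non-negative 1D profiles
    sampled on the same regular grid (spacing dx)."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalise each profile to unit mass
    q = q / q.sum()
    # In 1D, W1 is the L1 distance between the cumulative distributions.
    return np.abs(np.cumsum(p) - np.cumsum(q)).sum() * dx

def wasserstein_kernel(p, q, gamma=1.0):
    """Assumed kernel form: exp(-gamma * W1). Illustrative only."""
    return np.exp(-gamma * wasserstein1(p, q))

def nystrom_features(profiles, landmark_idx, gamma=1.0):
    """Feature map Z such that Z @ Z.T approximates the full kernel matrix,
    computed from kernel values against a few landmark profiles only."""
    landmarks = profiles[landmark_idx]
    # C: kernel between every profile and every landmark (n x m).
    C = np.array([[wasserstein_kernel(x, l, gamma) for l in landmarks]
                  for x in profiles])
    W = C[landmark_idx]  # kernel among the landmarks (m x m)
    vals, vecs = np.linalg.eigh(W)
    keep = vals > 1e-10  # drop numerically null directions
    # Z = C @ W^(-1/2), restricted to the retained eigen-directions.
    return C @ vecs[:, keep] / np.sqrt(vals[keep])
```

Ordinary k-means run on the rows of `Z` then approximates kernel k-means at a cost that grows with the number of landmarks rather than with the square of the number of profiles.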
format Online
Article
Text
id pubmed-7881590
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7881590 2021-02-17 BMC Bioinformatics Methodology Article
BioMed Central 2021-02-12 /pmc/articles/PMC7881590/ /pubmed/33579189 http://dx.doi.org/10.1186/s12859-021-03969-0 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title CHICKN: extraction of peptide chromatographic elution profiles from large scale mass spectrometry data by means of Wasserstein compressive hierarchical cluster analysis
topic Methodology Article