
A detailed comparison of analysis processes for MCC-IMS data in disease classification—Automated methods can replace manual peak annotations



Bibliographic Details
Main authors: Horsch, Salome; Kopczynski, Dominik; Kuthe, Elias; Baumbach, Jörg Ingo; Rahmann, Sven; Rahnenführer, Jörg
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2017
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5598980/
https://www.ncbi.nlm.nih.gov/pubmed/28910313
http://dx.doi.org/10.1371/journal.pone.0184321
Collection: PubMed
Abstract:
MOTIVATION: Disease classification from molecular measurements typically requires an analysis pipeline from raw noisy measurements to final classification results. Multi capillary column—ion mobility spectrometry (MCC-IMS) is a promising technology for the detection of volatile organic compounds in the air of exhaled breath. From raw measurements, the peak regions representing the compounds have to be identified, quantified, and clustered across different experiments. Currently, several steps of this analysis process require manual intervention of human experts. Our goal is to identify a fully automatic pipeline that yields competitive disease classification results compared to an established but subjective and tedious semi-manual process.
METHOD: We combine a large number of modern methods for peak detection, peak clustering, and multivariate classification into analysis pipelines for raw MCC-IMS data. We evaluate all combinations on three different real datasets in an unbiased cross-validation setting. We determine which specific algorithmic combinations lead to high AUC values in disease classifications across the different medical application scenarios.
RESULTS: The best fully automated analysis process achieves even better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace-operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM clustering (Expectation Maximization) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process to enable an unbiased high throughput use of the technology.
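The abstract names LM (Local Maxima) as one of the best-performing methods for automated peak identification. As a rough illustration only, the following Python sketch marks a cell of an MCC-IMS intensity matrix as a peak when it exceeds a noise threshold and is strictly greater than all of its up to eight neighbours. It is not the authors' implementation. The matrix layout (rows as retention-time index, columns as drift-time index), the `threshold` parameter, and the names `local_maxima` and `_neighbours` are assumptions made for this example, and the published LM method may involve preprocessing steps not shown here.

```python
from typing import Iterator, List, Tuple

# Hypothetical layout: rows = retention-time index, columns = drift-time index.
Matrix = List[List[float]]
Peak = Tuple[int, int, float]


def _neighbours(matrix: Matrix, r: int, c: int) -> Iterator[float]:
    """Yield the intensities of the (up to 8) cells adjacent to (r, c)."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue  # skip the cell itself
            rr, cc = r + dr, c + dc
            # Stay inside the matrix (rows may have different lengths).
            if 0 <= rr < len(matrix) and 0 <= cc < len(matrix[rr]):
                yield matrix[rr][cc]


def local_maxima(matrix: Matrix, threshold: float) -> List[Peak]:
    """Return (row, col, intensity) for each cell that is above `threshold`
    and strictly greater than every neighbour (ties are not peaks)."""
    peaks: List[Peak] = []
    for r, row in enumerate(matrix):
        for c, value in enumerate(row):
            if value > threshold and all(n < value for n in _neighbours(matrix, r, c)):
                peaks.append((r, c, value))
    return peaks


if __name__ == "__main__":
    # Toy 4x4 intensity matrix (made-up values, not data from the study).
    heat_map = [
        [0.0, 0.1, 0.0, 0.0],
        [0.1, 0.9, 0.2, 0.0],
        [0.0, 0.2, 0.1, 0.6],
        [0.0, 0.0, 0.3, 0.5],
    ]
    print(local_maxima(heat_map, threshold=0.3))  # → [(1, 1, 0.9), (2, 3, 0.6)]
```

In this sketch the cell at (3, 3) is above the threshold but is not reported, because its neighbour at (2, 3) is more intense; a flat plateau likewise yields no peak. In a full pipeline of the kind the abstract describes, the detected peaks from all measurements would then be passed to a clustering step (EM or DBSCAN) before classification.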
Institution: National Center for Biotechnology Information
Journal: PLoS One, Research Article, published 2017-09-14
License: © 2017 Horsch et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.