A detailed comparison of analysis processes for MCC-IMS data in disease classification—Automated methods can replace manual peak annotations
Main authors: | Horsch, Salome; Kopczynski, Dominik; Kuthe, Elias; Baumbach, Jörg Ingo; Rahmann, Sven; Rahnenführer, Jörg |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2017 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5598980/ https://www.ncbi.nlm.nih.gov/pubmed/28910313 http://dx.doi.org/10.1371/journal.pone.0184321 |
author | Horsch, Salome; Kopczynski, Dominik; Kuthe, Elias; Baumbach, Jörg Ingo; Rahmann, Sven; Rahnenführer, Jörg |
collection | PubMed |
description | MOTIVATION: Disease classification from molecular measurements typically requires an analysis pipeline from raw noisy measurements to final classification results. Multi-capillary column ion mobility spectrometry (MCC-IMS) is a promising technology for the detection of volatile organic compounds in the air of exhaled breath. From raw measurements, the peak regions representing the compounds have to be identified, quantified, and clustered across different experiments. Currently, several steps of this analysis process require manual intervention of human experts. Our goal is to identify a fully automatic pipeline that yields competitive disease classification results compared to an established but subjective and tedious semi-manual process. METHOD: We combine a large number of modern methods for peak detection, peak clustering, and multivariate classification into analysis pipelines for raw MCC-IMS data. We evaluate all combinations on three different real datasets in an unbiased cross-validation setting. We determine which specific algorithmic combinations lead to high AUC values in disease classification across the different medical application scenarios. RESULTS: The best fully automated analysis process achieves even better classification results than the established manual process. The best algorithms for the three analysis steps are (i) SGLTR (Savitzky-Golay Laplace-operator filter thresholding regions) and LM (Local Maxima) for automated peak identification, (ii) EM clustering (Expectation Maximization) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) for the clustering step, and (iii) RF (Random Forest) for multivariate classification. Thus, automated methods can replace the manual steps in the analysis process and enable unbiased, high-throughput use of the technology. |
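The peak-identification step can be illustrated with a generic local-maxima search. The following is a minimal Python sketch, assuming each MCC-IMS measurement is stored as a rectangular intensity matrix with retention time along the rows and drift time along the columns. The names `local_maxima`, `threshold`, and `radius` are hypothetical, and this is not the LM implementation benchmarked in the paper.

```python
from typing import List, Tuple

# Hypothetical sketch: a generic 2D local-maxima peak picker, not the
# LM implementation evaluated in the paper. Assumes a rectangular matrix
# with rows = retention time and columns = drift time.
Matrix = List[List[float]]


def local_maxima(matrix: Matrix, threshold: float,
                 radius: int = 1) -> List[Tuple[int, int, float]]:
    """Return (row, col, intensity) for each cell whose intensity is at least
    `threshold` and strictly greater than every other cell in the
    (2*radius+1) x (2*radius+1) window centered on it."""
    rows = len(matrix)
    cols = len(matrix[0]) if rows else 0
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = matrix[r][c]
            if value < threshold:
                continue  # too weak to count as a peak
            is_max = True
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    if (dr, dc) == (0, 0):
                        continue  # skip the center cell itself
                    rr, cc = r + dr, c + dc
                    # Neighbors outside the matrix are ignored.
                    if 0 <= rr < rows and 0 <= cc < cols and matrix[rr][cc] >= value:
                        is_max = False
                        break
                if not is_max:
                    break
            if is_max:
                peaks.append((r, c, value))
    return peaks


# Toy 4 x 4 intensity matrix with two clear maxima.
toy = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0],
    [0.0, 0.2, 0.1, 0.7],
    [0.0, 0.0, 0.3, 0.2],
]
print(local_maxima(toy, threshold=0.5))  # [(1, 1, 0.9), (2, 3, 0.7)]
```

Because a cell must be strictly higher than all of its neighbors, flat plateaus yield no peak in this sketch. On real, noisy spectra one would likely need additional handling (for example smoothing beforehand and merging nearby maxima into peak regions) before the detected peaks are clustered across measurements.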
id | pubmed-5598980 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | PLoS One, published 2017-09-14 |
rights | © 2017 Horsch et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
topic | Research Article |