
Grouping of complex substances using analytical chemistry data: A framework for quantitative evaluation and visualization

A detailed characterization of the chemical composition of complex substances, such as products of petroleum refining and environmental mixtures, is greatly needed in exposure assessment and manufacturing. The inherent complexity and variability in the composition of complex substances obfuscate the...


Bibliographic Details
Main Authors: Onel, Melis, Beykal, Burcu, Ferguson, Kyle, Chiu, Weihsueh A., McDonald, Thomas J., Zhou, Lan, House, John S., Wright, Fred A., Sheen, David A., Rusyn, Ivan, Pistikopoulos, Efstratios N.
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6786635/
https://www.ncbi.nlm.nih.gov/pubmed/31600275
http://dx.doi.org/10.1371/journal.pone.0223517
author Onel, Melis
Beykal, Burcu
Ferguson, Kyle
Chiu, Weihsueh A.
McDonald, Thomas J.
Zhou, Lan
House, John S.
Wright, Fred A.
Sheen, David A.
Rusyn, Ivan
Pistikopoulos, Efstratios N.
author_sort Onel, Melis
collection PubMed
description A detailed characterization of the chemical composition of complex substances, such as products of petroleum refining and environmental mixtures, is greatly needed in exposure assessment and manufacturing. The inherent complexity and variability in the composition of complex substances obfuscate the choices for their detailed analytical characterization. Yet, in lieu of exact chemical composition of complex substances, evaluation of the degree of similarity is a sensible path toward decision-making in environmental health regulations. Grouping of similar complex substances is a challenge that can be addressed via advanced analytical methods and streamlined data analysis and visualization techniques. Here, we propose a framework with unsupervised and supervised analyses to optimally group complex substances based on their analytical features. We test two data sets of complex oil-derived substances. The first data set is from gas chromatography-mass spectrometry (GC-MS) analysis of 20 Standard Reference Materials representing crude oils and oil refining products. The second data set consists of 15 samples of various gas oils analyzed using three analytical techniques: GC-MS, GC×GC-flame ionization detection (FID), and ion mobility spectrometry-mass spectrometry (IM-MS). We use hierarchical clustering using Pearson correlation as a similarity metric for the unsupervised analysis and build classification models using the Random Forest algorithm for the supervised analysis. We present a quantitative comparative assessment of clustering results via Fowlkes–Mallows index, and classification results via model accuracies in predicting the group of an unknown complex substance. We demonstrate the effect of (i) different grouping methodologies, (ii) data set size, and (iii) dimensionality reduction on the grouping quality, and (iv) different analytical techniques on the characterization of the complex substances. While the complexity and variability in chemical composition are an inherent feature of complex substances, we demonstrate how the choices of the data analysis and visualization methods can impact the communication of their characteristics to delineate sufficient similarity.
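The unsupervised step named in the description (hierarchical clustering on a Pearson-correlation similarity, scored with the Fowlkes–Mallows index) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the average-linkage rule, the function names `pearson`, `hierarchical_clusters`, and `fowlkes_mallows`, and the six toy feature vectors are all assumptions made for illustration only.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def hierarchical_clusters(samples, n_clusters):
    """Agglomerative clustering with distance 1 - r.

    Average linkage is an assumption; the record does not state the linkage.
    """
    dist = {
        (i, j): 1.0 - pearson(samples[i], samples[j])
        for i, j in combinations(range(len(samples)), 2)
    }
    clusters = [[i] for i in range(len(samples))]

    def linkage(a, b):
        # Mean pairwise distance between the members of clusters a and b.
        total = sum(dist[min(i, j), max(i, j)] for i in a for j in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Merge the two closest clusters (a < b, so deleting b is safe).
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(samples)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def fowlkes_mallows(true_labels, pred_labels):
    """FM = TP / sqrt((TP + FP) * (TP + FN)), counted over sample pairs."""
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_pred = pred_labels[i] == pred_labels[j]
        tp += same_true and same_pred
        fp += same_pred and not same_true
        fn += same_true and not same_pred
    if tp == 0:
        return 0.0
    return tp / math.sqrt((tp + fp) * (tp + fn))


# Hypothetical feature vectors (not data from the article): three rising
# profiles and three falling profiles, with made-up reference groups.
toy_samples = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.1, 6.2, 7.9],
    [1.5, 2.4, 3.6, 4.4],
    [4.0, 3.0, 2.0, 1.0],
    [8.1, 6.0, 3.9, 2.0],
    [4.2, 3.1, 2.2, 0.9],
]
toy_groups = [0, 0, 0, 1, 1, 1]

toy_labels = hierarchical_clusters(toy_samples, n_clusters=2)
print(toy_labels, fowlkes_mallows(toy_groups, toy_labels))
```

An index of 1 means every pair of samples is grouped the same way in the clustering as in the reference grouping, and values near 0 mean almost no agreement. The supervised step (Random Forest classification) is not sketched here because it would require a third-party library.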
format Online
Article
Text
id pubmed-6786635
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6786635 2019-10-19 Grouping of complex substances using analytical chemistry data: A framework for quantitative evaluation and visualization PLoS One Research Article Public Library of Science 2019-10-10 /pmc/articles/PMC6786635/ /pubmed/31600275 http://dx.doi.org/10.1371/journal.pone.0223517 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title Grouping of complex substances using analytical chemistry data: A framework for quantitative evaluation and visualization
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6786635/
https://www.ncbi.nlm.nih.gov/pubmed/31600275
http://dx.doi.org/10.1371/journal.pone.0223517