Insights into the Effects of Violating Statistical Assumptions for Dimensionality Reduction for Chemical “-omics” Data with Multiple Explanatory Variables

Bibliographic Details
Main Authors: Brown, Amber O., Green, Peter J., Frankham, Greta J., Stuart, Barbara H., Ueland, Maiken
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10286096/
https://www.ncbi.nlm.nih.gov/pubmed/37360494
http://dx.doi.org/10.1021/acsomega.3c01613
author Brown, Amber O.
Green, Peter J.
Frankham, Greta J.
Stuart, Barbara H.
Ueland, Maiken
collection PubMed
description Biological volatilome analysis is inherently complex due to the considerable number of compounds (i.e., dimensions) and differences in peak areas by orders of magnitude, between and within compounds found within datasets. Traditional volatilome analysis relies on dimensionality reduction techniques which aid in the selection of compounds that are considered relevant to respective research questions prior to further analysis. Currently, compounds of interest are identified using either supervised or unsupervised statistical methods which assume the data residuals are normally distributed and exhibit linearity. However, biological data often violate the statistical assumptions of these models related to normality and the presence of multiple explanatory variables which are innate to biological samples. In an attempt to address deviations from normality, volatilome data can be log transformed. However, whether the effects of each assessed variable are additive or multiplicative should be considered prior to transformation, as this will impact the effect of each variable on the data. If assumptions of normality and variable effects are not investigated prior to dimensionality reduction, ineffective or erroneous compound dimensionality reduction can impact downstream analyses. It is the aim of this manuscript to assess the impact of single and multivariable statistical models with and without the log transformation to volatilome dimensionality reduction prior to any supervised or unsupervised classification analysis. As a proof of concept, Shingleback lizard (Tiliqua rugosa) volatilomes were collected across their species distribution and from captivity and were assessed. Shingleback volatilomes are suspected to be influenced by multiple explanatory variables related to habitat (Bioregion), sex, parasite presence, total body volume, and captive status. 
This work determined that the exclusion of relevant multiple explanatory variables from analysis overestimates the effect of Bioregion and the identification of significant compounds. The log transformation increased the number of compounds that were identified as significant, as did analyses that assumed that residuals were normally distributed. Among the methods considered in this work, the most conservative form of dimensionality reduction was achieved through analyzing untransformed data using Monte Carlo tests with multiple explanatory variables.
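The point in the description about checking whether variable effects are additive or multiplicative before log transforming can be illustrated with a minimal sketch. It assumes a hypothetical compound whose noise-free peak area is a baseline scaled by a Bioregion factor and a sex factor. The factor values, the two-level design, and the names `peak_area`, `log_peak_area`, and `interaction_contrast` are illustrative assumptions and are not taken from the study.

```python
import math

# Hypothetical multiplicative effects on one compound's peak area.
# These numbers are made up for illustration; they are not study data.
BASELINE = 1000.0
REGION_EFFECT = {"region_A": 1.0, "region_B": 3.0}
SEX_EFFECT = {"female": 1.0, "male": 2.0}


def peak_area(region, sex):
    """Noise-free peak area when the two effects multiply."""
    return BASELINE * REGION_EFFECT[region] * SEX_EFFECT[sex]


def log_peak_area(region, sex):
    """The same peak area on the log10 scale."""
    return math.log10(peak_area(region, sex))


def interaction_contrast(value):
    """Difference of differences across the 2x2 design.

    A contrast of zero means the region and sex effects are purely
    additive on the scale that `value` returns.
    """
    region_shift_in_males = value("region_B", "male") - value("region_A", "male")
    region_shift_in_females = value("region_B", "female") - value("region_A", "female")
    return region_shift_in_males - region_shift_in_females


raw_contrast = interaction_contrast(peak_area)
log_contrast = interaction_contrast(log_peak_area)

print("contrast on the raw scale:", raw_contrast)
print("contrast on the log10 scale is near zero:", abs(log_contrast) < 1e-9)
```

Under these assumptions the region shift differs between sexes on the raw scale, so an additive model fitted to untransformed peak areas would need an interaction term, while on the log scale the same effects add. If the true effects were additive on the raw scale instead, the log transformation would introduce the mismatch rather than remove it, which is why the choice should be examined before transforming.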
format Online
Article
Text
id pubmed-10286096
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10286096 2023-06-23 ACS Omega American Chemical Society 2023-06-09 /pmc/articles/PMC10286096/ /pubmed/37360494 http://dx.doi.org/10.1021/acsomega.3c01613 Text en © 2023 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title Insights into the Effects of Violating Statistical Assumptions for Dimensionality Reduction for Chemical “-omics” Data with Multiple Explanatory Variables