Analyzing metabolomics data for association with genotypes using two-component Gaussian mixture distributions

Standard approaches to evaluate the impact of single nucleotide polymorphisms (SNP) on quantitative phenotypes use linear models. However, these normal-based approaches may not optimally model phenotypes which are better represented by Gaussian mixture distributions (e.g., some metabolomics data). W...
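The full abstract in this record goes on to describe a likelihood ratio test on the mixing proportions of a two-component Gaussian mixture. The idea can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes that the two components share their means and standard deviations across genotypes, that only the mixing proportion may differ by genotype, that the fit is done by a plain EM loop, and that the statistic is compared with an asymptotic chi-square reference. The function names (`fit_mixture_loglik`, `mixing_proportion_lrt`) and the simulated data are hypothetical.

```python
import math
import numpy as np


def normal_pdf(y, mu, sd):
    """Density of N(mu, sd^2) evaluated at y."""
    return np.exp(-0.5 * ((y - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def fit_mixture_loglik(y, groups, n_iter=500, tol=1e-8):
    """EM fit of a two-component Gaussian mixture (assumed model, see lead-in).

    Component means and standard deviations are shared by all observations;
    each distinct label in `groups` gets its own mixing proportion.
    Returns the log-likelihood reached by the EM iterations.
    """
    y = np.asarray(y, dtype=float)
    _, idx = np.unique(np.asarray(groups), return_inverse=True)
    n_groups = idx.max() + 1
    mu = np.percentile(y, [25.0, 75.0])   # crude starting means
    sd = np.full(2, y.std())              # crude starting spreads
    p = np.full(n_groups, 0.5)            # P(high component | group)
    prev = -np.inf
    for _ in range(n_iter):
        # E-step: weighted component densities and current log-likelihood
        w = p[idx]
        d_low = (1.0 - w) * normal_pdf(y, mu[0], sd[0])
        d_high = w * normal_pdf(y, mu[1], sd[1])
        total = d_low + d_high
        loglik = float(np.log(total).sum())
        if loglik - prev < tol:
            break
        prev = loglik
        r = d_high / total                # responsibility of the high component
        # M-step: per-group mixing proportions, shared means and spreads
        p = np.array([r[idx == k].mean() for k in range(n_groups)])
        p = np.clip(p, 1e-9, 1.0 - 1e-9)
        for j, resp in enumerate((1.0 - r, r)):
            mu[j] = np.sum(resp * y) / np.sum(resp)
            var_j = np.sum(resp * (y - mu[j]) ** 2) / np.sum(resp)
            sd[j] = max(math.sqrt(var_j), 1e-6)  # floor guards against collapse
    return loglik


def mixing_proportion_lrt(y, genotype):
    """LRT of one common mixing proportion vs. one proportion per genotype."""
    genotype = np.asarray(genotype)
    ll_null = fit_mixture_loglik(y, np.zeros(len(genotype), dtype=int))
    ll_alt = fit_mixture_loglik(y, genotype)
    stat = max(0.0, 2.0 * (ll_alt - ll_null))
    df = len(np.unique(genotype)) - 1
    return stat, df


# Simulated example (hypothetical data): the chance of falling in the
# high component rises with the genotype code 0, 1, 2.
rng = np.random.default_rng(0)
genotype = rng.integers(0, 3, size=600)
p_high = np.array([0.2, 0.5, 0.8])[genotype]
in_high = rng.random(600) < p_high
y = np.where(in_high, rng.normal(3.0, 1.0, 600), rng.normal(0.0, 1.0, 600))

stat, df = mixing_proportion_lrt(y, genotype)
# For df == 2 the chi-square upper tail has the closed form exp(-x / 2).
p_value = math.exp(-stat / 2.0)
print(f"LRT statistic = {stat:.2f}, df = {df}, p = {p_value:.3g}")
```

Both fits start from the same initial values so that the genotype-specific model begins at the null fit's starting point; the `max(0.0, ...)` guard covers the case where EM nonetheless stops at a slightly worse local optimum. The restricted tests mentioned in the abstract would add constraints to the alternative model and are not sketched here.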

Bibliographic Details
Main Authors: Westra, Jason; Hartman, Nicholas; Lake, Bethany; Shearer, Gregory; Tintle, Nathan
Format: Online Article Text
Language: English
Published: 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5757879/
https://www.ncbi.nlm.nih.gov/pubmed/29218908
author Westra, Jason
Hartman, Nicholas
Lake, Bethany
Shearer, Gregory
Tintle, Nathan
collection PubMed
description Standard approaches to evaluate the impact of single nucleotide polymorphisms (SNP) on quantitative phenotypes use linear models. However, these normal-based approaches may not optimally model phenotypes which are better represented by Gaussian mixture distributions (e.g., some metabolomics data). We develop a likelihood ratio test on the mixing proportions of two-component Gaussian mixture distributions and consider more restrictive models to increase power in light of a priori biological knowledge. Data were simulated to validate the improved power of the likelihood ratio test and the restricted likelihood ratio test over a linear model and a log transformed linear model. Then, using real data from the Framingham Heart Study, we analyzed 20,315 SNPs on chromosome 11, demonstrating that the proposed likelihood ratio test identifies SNPs well known to participate in the desaturation of certain fatty acids. Our study both validates the approach of increasing power by using the likelihood ratio test that leverages Gaussian mixture models, and creates a model with improved sensitivity and interpretability.
format Online
Article
Text
id pubmed-5757879
institution National Center for Biotechnology Information
language English
publishDate 2018
record_format MEDLINE/PubMed
spelling pubmed-5757879
2018-01-08
Analyzing metabolomics data for association with genotypes using two-component Gaussian mixture distributions
Westra, Jason
Hartman, Nicholas
Lake, Bethany
Shearer, Gregory
Tintle, Nathan
Pac Symp Biocomput
Article
Standard approaches to evaluate the impact of single nucleotide polymorphisms (SNP) on quantitative phenotypes use linear models. However, these normal-based approaches may not optimally model phenotypes which are better represented by Gaussian mixture distributions (e.g., some metabolomics data). We develop a likelihood ratio test on the mixing proportions of two-component Gaussian mixture distributions and consider more restrictive models to increase power in light of a priori biological knowledge. Data were simulated to validate the improved power of the likelihood ratio test and the restricted likelihood ratio test over a linear model and a log transformed linear model. Then, using real data from the Framingham Heart Study, we analyzed 20,315 SNPs on chromosome 11, demonstrating that the proposed likelihood ratio test identifies SNPs well known to participate in the desaturation of certain fatty acids. Our study both validates the approach of increasing power by using the likelihood ratio test that leverages Gaussian mixture models, and creates a model with improved sensitivity and interpretability.
2018
/pmc/articles/PMC5757879/
/pubmed/29218908
Text
en
http://creativecommons.org/licenses/by-nc/4.0/
Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License.
title Analyzing metabolomics data for association with genotypes using two-component Gaussian mixture distributions
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5757879/
https://www.ncbi.nlm.nih.gov/pubmed/29218908