
A Gaussian process model and Bayesian variable selection for mapping function-valued quantitative traits with incomplete phenotypic data

MOTIVATION: Recent advances in high dimensional phenotyping bring time as an extra dimension into the phenotypes. This promotes the quantitative trait locus (QTL) studies of function-valued traits such as those related to growth and development. Existing approaches for analyzing functional traits ut...


Bibliographic Details
Main authors: Vanhatalo, Jarno; Li, Zitong; Sillanpää, Mikko J
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6761969/
https://www.ncbi.nlm.nih.gov/pubmed/30850830
http://dx.doi.org/10.1093/bioinformatics/btz164
author Vanhatalo, Jarno
Li, Zitong
Sillanpää, Mikko J
collection PubMed
description MOTIVATION: Recent advances in high-dimensional phenotyping bring time as an extra dimension into the phenotypes. This promotes quantitative trait locus (QTL) studies of function-valued traits such as those related to growth and development. Existing approaches for analyzing functional traits utilize either parametric methods or semi-parametric approaches based on splines and wavelets. However, very few software tools are currently available for practical implementation of functional QTL mapping and variable selection. RESULTS: We propose a Bayesian Gaussian process (GP) approach for functional QTL mapping. We use GPs to model the continuously varying coefficients that describe how the effects of molecular markers on the quantitative trait change over time. We use an efficient gradient-based algorithm to estimate the tuning parameters of the GPs. Notably, the GP approach is directly applicable to incomplete datasets, even those with a missing data rate above 50% (among phenotypes). We further develop a stepwise algorithm to search through the model space in terms of genetic variants, and use a minimal increase of Bayesian posterior probability as a stopping rule to focus on only a small set of putative QTL. We also discuss the connection between GPs and penalized B-splines and wavelets. On two simulated and three real datasets, our GP approach demonstrates great flexibility for modeling different types of phenotypic trajectories with low computational cost. The proposed model selection approach finds the most likely QTL reliably in the tested datasets. AVAILABILITY AND IMPLEMENTATION: Software and simulated data are available as a MATLAB package ‘GPQTLmapping’, which can be downloaded from GitHub (https://github.com/jpvanhat/GPQTLmapping). Real datasets used in the case studies are publicly available at QTL Archive. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-6761969
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6761969 2019-10-02 Bioinformatics Original Papers Oxford University Press 2019-10-01 2019-03-08 /pmc/articles/PMC6761969/ /pubmed/30850830 http://dx.doi.org/10.1093/bioinformatics/btz164 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title A Gaussian process model and Bayesian variable selection for mapping function-valued quantitative traits with incomplete phenotypic data
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6761969/
https://www.ncbi.nlm.nih.gov/pubmed/30850830
http://dx.doi.org/10.1093/bioinformatics/btz164