
Data-driven encoding for quantitative genetic trait prediction

MOTIVATION: Given a set of biallelic molecular markers, such as SNPs, with genotype values on a collection of plant, animal or human samples, the goal of quantitative genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Quantitative gene...


Bibliographic Details
Main Authors:	He, Dan, Wang, Zhanyong, Parida, Laxmi
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4571493/ https://www.ncbi.nlm.nih.gov/pubmed/25707435 http://dx.doi.org/10.1186/1471-2105-16-S1-S10

_version_	1782390342190891008
author	He, Dan Wang, Zhanyong Parida, Laxmi
author_facet	He, Dan Wang, Zhanyong Parida, Laxmi
author_sort	He, Dan
collection	PubMed
description	MOTIVATION: Given a set of biallelic molecular markers, such as SNPs, with genotype values on a collection of plant, animal or human samples, the goal of quantitative genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Quantitative genetic trait prediction is usually represented as linear regression models, which require quantitative encodings for the genotypes: the three distinct genotype values, corresponding to one heterozygous and two homozygous genotypes, are usually coded as integers and manipulated algebraically in the model. Further, epistasis between multiple markers is modeled as multiplication between the markers; it is unclear whether the regression model remains effective under this treatment. In this work we investigate the effects of encodings on the quantitative genetic trait prediction problem. RESULTS: We first showed that different encodings lead to different prediction accuracies in many test cases. We then proposed a data-driven encoding strategy, in which we encode the genotypes according to their distribution in the phenotypes and allow each marker to have a different encoding. Our experiments show that this encoding strategy improves the performance of the genetic trait prediction method and that it is more helpful for oligogenic traits, whose values rely on a relatively small set of markers. To the best of our knowledge, this is the first paper that discusses the effects of encodings on the genetic trait prediction problem.
format	Online Article Text
id	pubmed-4571493
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4571493 2015-09-22 Data-driven encoding for quantitative genetic trait prediction He, Dan Wang, Zhanyong Parida, Laxmi BMC Bioinformatics Proceedings BioMed Central 2015-02-18 /pmc/articles/PMC4571493/ /pubmed/25707435 http://dx.doi.org/10.1186/1471-2105-16-S1-S10 Text en Copyright © 2015 He et al.; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Proceedings He, Dan Wang, Zhanyong Parida, Laxmi Data-driven encoding for quantitative genetic trait prediction
title	Data-driven encoding for quantitative genetic trait prediction
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4571493/ https://www.ncbi.nlm.nih.gov/pubmed/25707435 http://dx.doi.org/10.1186/1471-2105-16-S1-S10
work_keys_str_mv	AT hedan datadrivenencodingforquantitativegenetictraitprediction AT wangzhanyong datadrivenencodingforquantitativegenetictraitprediction AT paridalaxmi datadrivenencodingforquantitativegenetictraitprediction
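
The description above says that genotypes are encoded "according to their distribution in the phenotypes", with a separate encoding per marker, but this record does not give the exact formula. The following is a minimal sketch under a stated assumption: each genotype value at a marker is replaced by the mean trait value of the training samples that carry it. That mean-based rule, the function names, and the toy data are illustrative only and are not taken from the paper.

```python
from collections import defaultdict


def fit_marker_encoding(genotypes, phenotypes):
    """Encoding for ONE marker: genotype value -> mean phenotype.

    ASSUMPTION: the mean-phenotype rule is a stand-in for the paper's
    data-driven encoding, which is not specified in this record.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for g, y in zip(genotypes, phenotypes):
        sums[g] += y
        counts[g] += 1
    return {g: sums[g] / counts[g] for g in sums}


def fit_encodings(genotype_matrix, phenotypes):
    """Fit one encoding per marker (column), so markers may differ."""
    n_markers = len(genotype_matrix[0])
    return [
        fit_marker_encoding([row[j] for row in genotype_matrix], phenotypes)
        for j in range(n_markers)
    ]


def apply_encodings(genotype_matrix, encodings, default):
    """Replace each genotype by its learned value.

    Genotypes never seen during fitting fall back to `default`.
    """
    return [
        [enc.get(g, default) for g, enc in zip(row, encodings)]
        for row in genotype_matrix
    ]


# Toy data: 5 samples x 2 markers, genotypes given in the usual 0/1/2
# integer coding, and one quantitative trait value per sample.
X_train = [[0, 2], [1, 2], [2, 0], [0, 1], [2, 1]]
y_train = [1.0, 3.0, 2.0, 1.0, 4.0]

encodings = fit_encodings(X_train, y_train)
fallback = sum(y_train) / len(y_train)
X_encoded = apply_encodings(X_train, encodings, fallback)
```

The encoded matrix can then be passed to any linear regression model in place of the integer-coded genotypes. In practice the encodings should be fitted on training samples only and then applied to held-out samples, otherwise the trait values leak into the features.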
