Does encoding matter? A novel view on the quantitative genetic trait prediction problem
BACKGROUND: Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic tr...
Main authors: | He, Dan, Parida, Laxmi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4959353/ https://www.ncbi.nlm.nih.gov/pubmed/27454886 http://dx.doi.org/10.1186/s12859-016-1127-1 |
_version_ | 1782444388751769600 |
---|---|
author | He, Dan Parida, Laxmi |
author_facet | He, Dan Parida, Laxmi |
author_sort | He, Dan |
collection | PubMed |
description | BACKGROUND: Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually formulated as a linear regression model, which requires quantitative encodings for the genotypes. There is a large body of work on prediction algorithms, but no existing work has investigated the effects of the encodings on the genetic trait prediction problem. METHODS: In this work, we view the genetic trait prediction problem from a novel angle: as a multiple regression problem on categorical data, which requires encoding the categorical data as numerical data. We further propose two novel encoding methods and show that they generate numerical features with higher predictive power. RESULTS AND DISCUSSION: Our experiments show that our methods are superior to the other encoding methods for both the single-marker model and the epistasis model. We show that the quantitative genetic trait prediction problem depends heavily on the encoding of genotypes for both models. CONCLUSIONS: We conducted a detailed analysis of the performance of the hybrid encodings. To our knowledge, this is the first work that discusses the effects of encodings on the genetic trait prediction problem. |
format | Online Article Text |
id | pubmed-4959353 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-49593532016-08-01 Does encoding matter? A novel view on the quantitative genetic trait prediction problem He, Dan Parida, Laxmi BMC Bioinformatics Research BACKGROUND: Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually represented as linear regression models which require quantitative encodings for the genotypes. There are lots of work on the prediction algorithms, but none of the existing work investigated the effects of the encodings on the genetic trait prediction problem. METHODS: In this work, we view the genetic trait prediction problem from a novel angle: a multiple regression on categorical data problem, which requires encoding the categorical data into numerical data. We further proposed two novel encoding methods and we show that they are able to generate numerical features with higher predictive power. RESULTS AND DISCUSSION: Our experiments show that our methods are superior to the other encoding methods for both single marker model and epistasis model. We showed that the quantitative genetic trait prediction problem heavily depends on the encoding of genotypes, for both single marker model and epistasis model. CONCLUSIONS: We conducted a detailed analysis on the performance of the hybrid encodings. To our knowledge, this is the first work that discusses the effects of encodings for genetic trait prediction problem. 
BioMed Central 2016-07-19 /pmc/articles/PMC4959353/ /pubmed/27454886 http://dx.doi.org/10.1186/s12859-016-1127-1 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research He, Dan Parida, Laxmi Does encoding matter? A novel view on the quantitative genetic trait prediction problem |
title | Does encoding matter? A novel view on the quantitative genetic trait prediction problem |
title_full | Does encoding matter? A novel view on the quantitative genetic trait prediction problem |
title_fullStr | Does encoding matter? A novel view on the quantitative genetic trait prediction problem |
title_full_unstemmed | Does encoding matter? A novel view on the quantitative genetic trait prediction problem |
title_short | Does encoding matter? A novel view on the quantitative genetic trait prediction problem |
title_sort | does encoding matter? a novel view on the quantitative genetic trait prediction problem |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4959353/ https://www.ncbi.nlm.nih.gov/pubmed/27454886 http://dx.doi.org/10.1186/s12859-016-1127-1 |
work_keys_str_mv | AT hedan doesencodingmatteranovelviewonthequantitativegenetictraitpredictionproblem AT paridalaxmi doesencodingmatteranovelviewonthequantitativegenetictraitpredictionproblem |