Novel applications of multitask learning and multiple output regression to multiple genetic trait prediction

Bibliographic Details
Main Authors: He, Dan; Kuhn, David; Parida, Laxmi
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4908333/
https://www.ncbi.nlm.nih.gov/pubmed/27307640
http://dx.doi.org/10.1093/bioinformatics/btw249
collection PubMed
description Given a set of biallelic molecular markers, such as SNPs, with genotype values encoded numerically on a collection of plant, animal or human samples, the goal of genetic trait prediction is to predict the quantitative trait values by simultaneously modeling all marker effects. Genetic trait prediction is usually formulated as a linear regression problem. In many cases, multiple traits are observed for the same set of samples and markers, and some of these traits may be correlated with each other. Modeling the traits jointly may therefore improve prediction accuracy. In this work, we view the multitrait prediction problem from a machine learning angle: as either a multitask learning problem or a multiple output regression problem, depending on whether or not the different traits share the same genotype matrix. We adapted multitask learning and multiple output regression algorithms to the multitrait prediction problem and proposed several strategies to reduce the least-squares error of their predictions. Our experiments show that modeling multiple traits together could improve prediction accuracy for correlated traits. Availability and implementation: The programs we used are either publicly available or were obtained directly from the cited authors, for example the MALSAR package (http://www.public.asu.edu/~jye02/Software/MALSAR/). The Avocado data set has not been published yet and is available upon request. Contact: dhe@us.ibm.com
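A minimal sketch of one way to pose the joint, linear-regression view of multitrait prediction described above, assuming an L2,1-norm (joint feature selection) multitask least-squares model of the kind the MALSAR package provides. It is not necessarily the formulation, solver, or error-reduction strategy the authors used. The function name `l21_multitask_least_squares`, its parameters `lam` and `n_iter`, and the simulated 0/1/2 genotype data are hypothetical. Passing the same genotype matrix for every trait covers the shared-genotype setting; passing a different matrix per trait covers the separate-genotype setting.

```python
import numpy as np


def l21_multitask_least_squares(Xs, ys, lam=0.1, n_iter=500):
    """Hypothetical proximal-gradient (ISTA) solver for
        min_W  sum_t 0.5 * ||X_t W[:, t] - y_t||^2 + lam * sum_j ||W[j, :]||_2
    Xs: list of (n_t, p) genotype matrices, one per trait (may all be the same).
    ys: list of (n_t,) trait vectors.
    Returns W of shape (p, T): one column of marker effects per trait.
    """
    p, T = Xs[0].shape[1], len(Xs)
    W = np.zeros((p, T))
    # The Hessian is block diagonal (X_t^T X_t per trait), so the Lipschitz
    # constant of the gradient is the largest squared spectral norm over traits.
    step = 1.0 / max(np.linalg.norm(X, 2) ** 2 for X in Xs)
    for _ in range(n_iter):
        # Gradient of the least-squares term, one column per trait.
        G = np.column_stack(
            [X.T @ (X @ W[:, t] - y) for t, (X, y) in enumerate(zip(Xs, ys))]
        )
        V = W - step * G
        # Proximal step for the L2,1 norm: group soft-thresholding of each
        # marker's row, which zeroes a marker for all traits at once.
        row_norms = np.linalg.norm(V, axis=1, keepdims=True)
        W = np.maximum(0.0, 1.0 - step * lam / np.maximum(row_norms, 1e-12)) * V
    return W


# Simulated example (not data from the paper): 100 samples, 50 SNPs coded as
# minor-allele counts 0/1/2, two correlated traits sharing five causal markers.
rng = np.random.default_rng(0)
n, p = 100, 50
X = rng.integers(0, 3, size=(n, p)).astype(float)
X -= X.mean(axis=0)  # center each marker so no intercept is needed
W_true = np.zeros((p, 2))
W_true[:5, 0] = [1.0, -0.8, 0.6, 0.5, -0.4]
W_true[:5, 1] = 0.9 * W_true[:5, 0]
Y = X @ W_true + 0.1 * rng.standard_normal((n, 2))

# Shared genotype matrix: pass the same X once per trait.
W_hat = l21_multitask_least_squares([X, X], [Y[:, 0], Y[:, 1]], lam=5.0, n_iter=2000)
print(np.round(W_hat[:6], 2))
```

The row-wise penalty is what couples the traits: each marker is either kept for all traits or shrunk to zero for all of them, which helps only when correlated traits share causal markers. By contrast, ordinary multi-output ridge regression on a shared matrix decouples into one independent fit per trait and shares no information between them.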
id pubmed-4908333
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4908333 2016-06-17 Bioinformatics, ISMB 2016 Proceedings, July 8 to July 12, 2016, Orlando, Florida. Oxford University Press 2016-06-15 2016-06-11. © The Author 2016. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic ISMB 2016 Proceedings, July 8 to July 12, 2016, Orlando, Florida