Phenotype Prediction Using Regularized Regression on Genetic Data in the DREAM5 Systems Genetics B Challenge

A major goal of large-scale genomics projects is to enable the use of data from high-throughput experimental methods to predict complex phenotypes such as disease susceptibility. The DREAM5 Systems Genetics B Challenge solicited algorithms to predict soybean plant resistance to the pathogen Phytopht...

Bibliographic Details
Main Authors: Loh, Po-Ru, Tucker, George, Berger, Bonnie
Format: Online Article Text
Language: English
Published: Public Library of Science 2011
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3247233/
https://www.ncbi.nlm.nih.gov/pubmed/22216175
http://dx.doi.org/10.1371/journal.pone.0029095
collection PubMed
description A major goal of large-scale genomics projects is to enable the use of data from high-throughput experimental methods to predict complex phenotypes such as disease susceptibility. The DREAM5 Systems Genetics B Challenge solicited algorithms to predict soybean plant resistance to the pathogen Phytophthora sojae from training sets including phenotype, genotype, and gene expression data. The challenge test set was divided into three subcategories, one requiring prediction based on only genotype data, another on only gene expression data, and the third on both genotype and gene expression data. Here we present our approach, primarily using regularized regression, which received the best-performer award for subchallenge B2 (gene expression only). We found that despite the availability of 941 genotype markers and 28,395 gene expression features, optimal models determined by cross-validation experiments typically used fewer than ten predictors, underscoring the importance of strong regularization in noisy datasets with far more features than samples. We also present substantial analysis of the training and test setup of the challenge, identifying high variance in performance on the gold standard test sets.
id pubmed-3247233
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3247233 2012-01-03 Phenotype Prediction Using Regularized Regression on Genetic Data in the DREAM5 Systems Genetics B Challenge Loh, Po-Ru Tucker, George Berger, Bonnie PLoS One Research Article Public Library of Science 2011-12-28 /pmc/articles/PMC3247233/ /pubmed/22216175 http://dx.doi.org/10.1371/journal.pone.0029095 Text en Loh et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Phenotype Prediction Using Regularized Regression on Genetic Data in the DREAM5 Systems Genetics B Challenge
topic Research Article