Variable-Selection Emerges on Top in Empirical Comparison of Whole-Genome Complex-Trait Prediction Methods

Bibliographic Details
Main authors: Haws, David C., Rish, Irina, Teyssedre, Simon, He, Dan, Lozano, Aurelie C., Kambadur, Prabhanjan, Karaman, Zivan, Parida, Laxmi
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4595020/
https://www.ncbi.nlm.nih.gov/pubmed/26439851
http://dx.doi.org/10.1371/journal.pone.0138903
author Haws, David C.
Rish, Irina
Teyssedre, Simon
He, Dan
Lozano, Aurelie C.
Kambadur, Prabhanjan
Karaman, Zivan
Parida, Laxmi
collection PubMed
description Accurate prediction of complex traits based on whole-genome data is a computational problem of paramount importance, particularly to plant and animal breeders. However, the number of genetic markers is typically orders of magnitude larger than the number of samples (p >> n), amongst other challenges. We assessed the effectiveness of a diverse set of state-of-the-art methods on publicly accessible real data. The most surprising finding was that approaches with feature selection performed better than others on average, in contrast to the expectation in the community that variable selection is mostly ineffective, i.e. that it does not improve accuracy of prediction, in spite of p >> n. We observed superior performance despite a somewhat simplistic approach to variable selection, possibly suggesting an inherent robustness. This bodes well in general since variable selection methods usually improve interpretability without loss of prediction power. Apart from identifying a set of benchmark data sets (including one simulated data set), we also discuss the performance analysis for each data set in terms of the input characteristics.
format Online
Article
Text
id pubmed-4595020
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4595020 2015-10-09
PLoS One
Research Article
Public Library of Science 2015-10-06
/pmc/articles/PMC4595020/
/pubmed/26439851
http://dx.doi.org/10.1371/journal.pone.0138903
Text en
© 2015 Haws et al
http://creativecommons.org/licenses/by/4.0/
This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Variable-Selection Emerges on Top in Empirical Comparison of Whole-Genome Complex-Trait Prediction Methods
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4595020/
https://www.ncbi.nlm.nih.gov/pubmed/26439851
http://dx.doi.org/10.1371/journal.pone.0138903