A Ranking Approach to Genomic Selection

BACKGROUND: Genomic selection (GS) is a recent selective breeding method which uses predictive models based on whole-genome molecular markers. Until now, existing studies formulated GS as the problem of modeling an individual’s breeding value for a particular trait of interest, i.e., as a regression...

Bibliographic Details
Main authors: Blondel, Mathieu; Onogi, Akio; Iwata, Hiroyoshi; Ueda, Naonori
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4466774/
https://www.ncbi.nlm.nih.gov/pubmed/26068103
http://dx.doi.org/10.1371/journal.pone.0128570
_version_ 1782376282798948352
author Blondel, Mathieu
Onogi, Akio
Iwata, Hiroyoshi
Ueda, Naonori
author_facet Blondel, Mathieu
Onogi, Akio
Iwata, Hiroyoshi
Ueda, Naonori
author_sort Blondel, Mathieu
collection PubMed
description BACKGROUND: Genomic selection (GS) is a recent selective breeding method which uses predictive models based on whole-genome molecular markers. Until now, existing studies formulated GS as the problem of modeling an individual’s breeding value for a particular trait of interest, i.e., as a regression problem. To assess predictive accuracy of the model, the Pearson correlation between observed and predicted trait values was used. CONTRIBUTIONS: In this paper, we propose to formulate GS as the problem of ranking individuals according to their breeding value. Our proposed framework allows us to employ machine learning methods for ranking which had previously not been considered in the GS literature. To assess ranking accuracy of a model, we introduce a new measure originating from the information retrieval literature called normalized discounted cumulative gain (NDCG). NDCG rewards more strongly models which assign a high rank to individuals with high breeding value. Therefore, NDCG reflects a prerequisite objective in selective breeding: accurate selection of individuals with high breeding value. RESULTS: We conducted a comparison of 10 existing regression methods and 3 new ranking methods on 6 datasets, consisting of 4 plant species and 25 traits. Our experimental results suggest that tree-based ensemble methods including McRank, Random Forests and Gradient Boosting Regression Trees achieve excellent ranking accuracy. RKHS regression and RankSVM also achieve good accuracy when used with an RBF kernel. Traditional regression methods such as Bayesian lasso, wBSR and BayesC were found less suitable for ranking. Pearson correlation was found to correlate poorly with NDCG. Our study suggests two important messages. First, ranking methods are a promising research direction in GS. Second, NDCG can be a useful evaluation measure for GS.
format Online
Article
Text
id pubmed-4466774
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4466774 2015-06-22 A Ranking Approach to Genomic Selection Blondel, Mathieu Onogi, Akio Iwata, Hiroyoshi Ueda, Naonori PLoS One Research Article Public Library of Science 2015-06-12 /pmc/articles/PMC4466774/ /pubmed/26068103 http://dx.doi.org/10.1371/journal.pone.0128570 Text en © 2015 Blondel et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Blondel, Mathieu
Onogi, Akio
Iwata, Hiroyoshi
Ueda, Naonori
A Ranking Approach to Genomic Selection
title A Ranking Approach to Genomic Selection
title_full A Ranking Approach to Genomic Selection
title_fullStr A Ranking Approach to Genomic Selection
title_full_unstemmed A Ranking Approach to Genomic Selection
title_short A Ranking Approach to Genomic Selection
title_sort ranking approach to genomic selection
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4466774/
https://www.ncbi.nlm.nih.gov/pubmed/26068103
http://dx.doi.org/10.1371/journal.pone.0128570
work_keys_str_mv AT blondelmathieu arankingapproachtogenomicselection
AT onogiakio arankingapproachtogenomicselection
AT iwatahiroyoshi arankingapproachtogenomicselection
AT uedanaonori arankingapproachtogenomicselection
AT blondelmathieu rankingapproachtogenomicselection
AT onogiakio rankingapproachtogenomicselection
AT iwatahiroyoshi rankingapproachtogenomicselection
AT uedanaonori rankingapproachtogenomicselection
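
The abstract in this record describes NDCG as a measure that rewards models for placing individuals with high breeding value near the top of the ranking. The record does not give the formula, so the following is only a minimal sketch of one common form of NDCG@k, assuming a linear gain (the observed trait value itself, taken to be non-negative) and a logarithmic position discount of 1 / log2(position + 1). The function names `dcg_at_k` and `ndcg_at_k` are hypothetical, and the exact definition used in the article may differ.

```python
import math


def dcg_at_k(true_values, scores, k):
    """Discounted cumulative gain of the top-k individuals ranked by `scores`.

    Assumption (not taken from the record): the gain is the observed trait
    value and the discount at 1-based position p is 1 / log2(p + 1).
    """
    # Indices of individuals sorted from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # `rank` is 0-based, so the 1-based position is rank + 1.
    return sum(
        true_values[i] / math.log2(rank + 2)
        for rank, i in enumerate(order[:k])
    )


def ndcg_at_k(true_values, predicted, k):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(true_values, true_values, k)
    if ideal == 0:
        return 0.0  # No gain is achievable, so define the score as 0.
    return dcg_at_k(true_values, predicted, k) / ideal


if __name__ == "__main__":
    observed = [5.0, 3.0, 2.0, 1.0]       # hypothetical trait values
    good_model = [0.9, 0.5, 0.3, 0.1]     # ranks individuals in the ideal order
    poor_model = [0.1, 0.3, 0.5, 0.9]     # ranks individuals in reverse order
    print(ndcg_at_k(observed, good_model, k=2))
    print(ndcg_at_k(observed, poor_model, k=2))
```

With these made-up values, the model that reproduces the ideal order scores 1.0 and the reversed model scores well below it, because only the top k positions contribute and earlier positions are weighted more heavily.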