A comparison of random forests, boosting and support vector machines for genomic selection

BACKGROUND: Genomic selection (GS) involves estimating breeding values using molecular markers spanning the entire genome. Accurate prediction of genomic breeding values (GEBVs) presents a central challenge to contemporary plant and animal breeders. The existence of a wide array of marker-based appr...

Bibliographic Details
Main authors: Ogutu, Joseph O; Piepho, Hans-Peter; Schulz-Streeck, Torben
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3103196/
https://www.ncbi.nlm.nih.gov/pubmed/21624167
http://dx.doi.org/10.1186/1753-6561-5-S3-S11
author Ogutu, Joseph O
Piepho, Hans-Peter
Schulz-Streeck, Torben
collection PubMed
description BACKGROUND: Genomic selection (GS) involves estimating breeding values using molecular markers spanning the entire genome. Accurate prediction of genomic breeding values (GEBVs) presents a central challenge to contemporary plant and animal breeders. The existence of a wide array of marker-based approaches for predicting breeding values makes it essential to evaluate and compare their relative predictive performances to identify approaches able to accurately predict breeding values. We evaluated the predictive accuracy of random forests (RF), stochastic gradient boosting (boosting) and support vector machines (SVMs) for predicting genomic breeding values using dense SNP markers and explored the utility of RF for ranking the predictive importance of markers for pre-screening markers or discovering chromosomal locations of QTLs. METHODS: We predicted GEBVs for one quantitative trait in a dataset simulated for the QTLMAS 2010 workshop. Predictive accuracy was measured as the Pearson correlation between GEBVs and observed values using 5-fold cross-validation and between predicted and true breeding values. The importance of each marker was ranked using RF and plotted against the position of the marker and associated QTLs on one of five simulated chromosomes. RESULTS: The correlations between the predicted and true breeding values were 0.547 for boosting, 0.497 for SVMs, and 0.483 for RF, indicating better performance for boosting than for SVMs and RF. CONCLUSIONS: Accuracy was highest for boosting, intermediate for SVMs and lowest for RF but differed little among the three methods and relative to ridge regression BLUP (RR-BLUP).
format Text
id pubmed-3103196
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3103196 2011-05-28 A comparison of random forests, boosting and support vector machines for genomic selection Ogutu, Joseph O Piepho, Hans-Peter Schulz-Streeck, Torben BMC Proc Proceedings
BioMed Central 2011-05-27 /pmc/articles/PMC3103196/ /pubmed/21624167 http://dx.doi.org/10.1186/1753-6561-5-S3-S11 Text en Copyright ©2011 Ogutu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A comparison of random forests, boosting and support vector machines for genomic selection
topic Proceedings