Precision-mapping and statistical validation of quantitative trait loci by machine learning

BACKGROUND: We introduce a QTL-mapping algorithm based on Statistical Machine Learning (SML) that is conceptually quite different to existing methods as there is a strong focus on generalisation ability. Our approach combines ridge regression, recursive feature elimination, and estimation of general...

Bibliographic Details
Main authors: Bedo, Justin; Wenzl, Peter; Kowalczyk, Adam; Kilian, Andrzej
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2409372/
https://www.ncbi.nlm.nih.gov/pubmed/18452626
http://dx.doi.org/10.1186/1471-2156-9-35
author Bedo, Justin
Wenzl, Peter
Kowalczyk, Adam
Kilian, Andrzej
collection PubMed
description BACKGROUND: We introduce a QTL-mapping algorithm based on Statistical Machine Learning (SML) that is conceptually quite different to existing methods as there is a strong focus on generalisation ability. Our approach combines ridge regression, recursive feature elimination, and estimation of generalisation performance and marker effects using bootstrap resampling. Model performance and marker effects are determined using independent testing samples (individuals), thus providing better estimates. We compare the performance of SML against Composite Interval Mapping (CIM), Bayesian Interval Mapping (BIM) and single Marker Regression (MR) on synthetic datasets and a multi-trait and multi-environment dataset of the progeny for a cross between two barley cultivars.
RESULTS: In an analysis of the synthetic datasets, SML accurately predicted the number of QTL underlying a trait while BIM tended to underestimate the number of QTL. The QTL identified by SML for the barley dataset broadly coincided with known QTL locations. SML reported approximately half of the QTL reported by either CIM or MR, not unexpected given that neither CIM nor MR incorporates independent testing. The latter makes these two methods susceptible to producing overly optimistic estimates of QTL effects, as we demonstrate for MR. The QTL resolution (peak definition) afforded by SML was consistently superior to MR, CIM and BIM, with QTL detection power similar to BIM. The precision of SML was underscored by repeatedly identifying, at ≤ 1-cM precision, three QTL for four partially related traits (heading date, plant height, lodging and yield). The set of QTL obtained using a 'raw' and a 'curated' version of the same genotypic dataset were more similar to each other for SML than for CIM or MR.
CONCLUSION: The SML algorithm produces better estimates of QTL effects because it eliminates the optimistic bias in the predictive performance of other QTL methods. It produces narrower peaks than other methods (except BIM) and hence identifies QTL with greater precision. It is more robust to genotyping and linkage mapping errors, and identifies markers linked to QTL in the absence of a genetic map.
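The combination named in the description (ridge regression, recursive feature elimination, and bootstrap resampling with independent test individuals) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the ±1 marker coding, the penalty value, the unlinked synthetic markers and the use of out-of-bag individuals as the test set are all assumptions made for illustration.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge weights on centred data (lam is an assumed penalty)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    return np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)

def rfe_ridge(X, y, n_keep, lam):
    """Recursive feature elimination: refit, drop the smallest |weight| marker."""
    active = list(range(X.shape[1]))
    while len(active) > n_keep:
        w = ridge_fit(X[:, active], y, lam)
        active.pop(int(np.argmin(np.abs(w))))
    return active

def bootstrap_rfe(X, y, n_keep, lam=1.0, n_boot=50, seed=0):
    """Select markers on bootstrap samples; score on out-of-bag individuals."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    scores = []
    for _ in range(n_boot):
        train = rng.integers(0, n, size=n)            # sample with replacement
        test = np.setdiff1d(np.arange(n), train)      # individuals never drawn
        if test.size < 2:
            continue
        kept = rfe_ridge(X[train], y[train], n_keep, lam)
        Xtr = X[train][:, kept]
        w = ridge_fit(Xtr, y[train], lam)
        pred = (X[test][:, kept] - Xtr.mean(axis=0)) @ w + y[train].mean()
        scores.append(np.corrcoef(pred, y[test])[0, 1])
        counts[kept] += 1
    n_done = max(len(scores), 1)
    return counts / n_done, float(np.mean(scores))

# Hypothetical synthetic data: 120 individuals, 30 unlinked markers coded -1/+1,
# with two causal markers (indices 4 and 17) and Gaussian noise.
rng = np.random.default_rng(1)
X = rng.choice([-1.0, 1.0], size=(120, 30))
y = 1.0 * X[:, 4] - 0.8 * X[:, 17] + rng.normal(0.0, 0.5, size=120)

freq, score = bootstrap_rfe(X, y, n_keep=2)
top = sorted(int(i) for i in np.argsort(freq)[-2:])
print("most frequently selected markers:", top)
print("mean out-of-bag correlation:", round(score, 3))
```

Because each model is scored only on individuals left out of its bootstrap sample, the reported correlation is not inflated by reuse of the training data, which is the property the description contrasts with CIM and MR.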
format Text
id pubmed-2409372
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2409372 2008-06-04 Precision-mapping and statistical validation of quantitative trait loci by machine learning Bedo, Justin Wenzl, Peter Kowalczyk, Adam Kilian, Andrzej BMC Genet Research Article BioMed Central 2008-05-02 /pmc/articles/PMC2409372/ /pubmed/18452626 http://dx.doi.org/10.1186/1471-2156-9-35 Text en Copyright © 2008 Bedo et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Precision-mapping and statistical validation of quantitative trait loci by machine learning
topic Research Article