An adaptive threshold determination method of feature screening for genomic selection

BACKGROUND: Although the dimension of the entire genome can be extremely large, only a parsimonious set of influential SNPs are correlated with a particular complex trait and are important to the prediction of the trait. Efficiently and accurately selecting these influential SNPs from millions of ca...

Bibliographic Details
Main Authors: Fu, Guifang, Wang, Gang, Dai, Xiaotian
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389084/
https://www.ncbi.nlm.nih.gov/pubmed/28403836
http://dx.doi.org/10.1186/s12859-017-1617-9
_version_ 1782521227158487040
author Fu, Guifang
Wang, Gang
Dai, Xiaotian
collection PubMed
description BACKGROUND: Although the dimension of the entire genome can be extremely large, only a parsimonious set of influential SNPs is correlated with a particular complex trait and is important to the prediction of that trait. Efficiently and accurately selecting these influential SNPs from millions of candidates is in high demand, but poses challenges. We propose a backward elimination iterative distance correlation (BE-IDC) procedure to select the smallest subset of SNPs that guarantees sufficient prediction accuracy, while also resolving the unclear threshold issue of traditional feature screening approaches. RESULTS: In six simulations, the adaptive threshold estimated by BE-IDC performed uniformly better than the fixed-threshold methods used in the current literature. We also applied BE-IDC to a genome-wide Arabidopsis thaliana dataset. Out of 216,130 SNPs, BE-IDC selected four influential SNPs and confirmed the same FRIGIDA gene that was reported by two other traditional methods. CONCLUSIONS: BE-IDC accommodates both the prediction accuracy and the computational speed that are in high demand in genomic selection. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1617-9) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5389084
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5389084 2017-04-14 BMC Bioinformatics Methodology Article
BioMed Central 2017-04-12 /pmc/articles/PMC5389084/ /pubmed/28403836 http://dx.doi.org/10.1186/s12859-017-1617-9 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title An adaptive threshold determination method of feature screening for genomic selection
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389084/
https://www.ncbi.nlm.nih.gov/pubmed/28403836
http://dx.doi.org/10.1186/s12859-017-1617-9
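
The abstract contrasts BE-IDC with traditional feature screening that relies on a fixed threshold. As a rough illustration of that baseline only, the sketch below ranks SNPs by their marginal sample distance correlation with the trait and keeps a fixed number of them. It is not the authors' BE-IDC procedure, whose backward elimination and adaptive threshold are not specified in this record. The function names (`distance_correlation`, `screen_top_k`), the 0/1/2 genotype coding, and the simulated data are all assumptions made for the example.

```python
import numpy as np


def _centered_dist(v):
    """Double-centered pairwise Euclidean distance matrix of the samples in v."""
    v = np.asarray(v, dtype=float)
    v = v.reshape(len(v), -1)  # treat 1-D input as n samples of one feature
    d = np.sqrt(((v[:, None, :] - v[None, :, :]) ** 2).sum(axis=-1))
    return d - d.mean(axis=0, keepdims=True) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation between x and y (returns 0.0 if either is constant)."""
    a, b = _centered_dist(x), _centered_dist(y)
    dcov2 = (a * b).mean()                             # squared distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())   # product of distance std. devs.
    if denom == 0:
        return 0.0
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def screen_top_k(genotypes, trait, k):
    """Fixed-threshold baseline: keep the k SNPs with the largest marginal score.

    genotypes is an (n_samples, n_snps) array; trait has length n_samples.
    Returns the indices of the k retained SNPs and the full score vector.
    """
    scores = np.array([
        distance_correlation(genotypes[:, j], trait)
        for j in range(genotypes.shape[1])
    ])
    keep = np.argsort(scores)[::-1][:k]
    return keep, scores


# Hypothetical usage on simulated data (not the Arabidopsis data from the article).
rng = np.random.default_rng(1)
snps = rng.integers(0, 3, size=(150, 20)).astype(float)  # assumed 0/1/2 coding
trait = 2.0 * snps[:, 5] + rng.normal(scale=0.5, size=150)
kept, all_scores = screen_top_k(snps, trait, k=3)
print("retained SNP indices:", kept)
```

The weakness of this baseline is the hand-picked `k`: too small a value can drop influential SNPs, while too large a value keeps noise. That is the unclear threshold issue the abstract says BE-IDC is designed to resolve.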