
Dimensionality of genomic information and its impact on genome-wide associations and variant selection for genomic prediction: a simulation study

BACKGROUND: Identifying true positive variants in genome-wide associations (GWA) depends on several factors, including the number of genotyped individuals. The limited dimensionality of genomic information may give insights into the optimal number of individuals to be used in GWA. This study investi...

Bibliographic Details
Main authors: Jang, Sungbong, Tsuruta, Shogo, Leite, Natalia Galoro, Misztal, Ignacy, Lourenco, Daniela
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10351171/
https://www.ncbi.nlm.nih.gov/pubmed/37460964
http://dx.doi.org/10.1186/s12711-023-00823-0
author Jang, Sungbong
Tsuruta, Shogo
Leite, Natalia Galoro
Misztal, Ignacy
Lourenco, Daniela
collection PubMed
description BACKGROUND: Identifying true positive variants in genome-wide associations (GWA) depends on several factors, including the number of genotyped individuals. The limited dimensionality of genomic information may give insights into the optimal number of individuals to be used in GWA. This study investigated different discovery set sizes based on the number of largest eigenvalues explaining a certain proportion of variance in the genomic relationship matrix (G). In addition, we investigated the impact on the prediction accuracy by adding variants, which were selected based on different set sizes, to the regular single nucleotide polymorphism (SNP) chips used for genomic prediction.
METHODS: We simulated sequence data that included 500k SNPs with 200 or 2000 quantitative trait nucleotides (QTN). A regular 50k panel included one in every ten simulated SNPs. Effective population size (Ne) was set to 20 or 200. GWA were performed using a number of genotyped animals equivalent to the number of largest eigenvalues of G (EIG) explaining 50, 60, 70, 80, 90, 95, 98, and 99% of the variance. In addition, the largest discovery set consisted of 30k genotyped animals. Limited or extensive phenotypic information was mimicked by changing the trait heritability. Significant and large-effect size SNPs were added to the 50k panel and used for single-step genomic best linear unbiased prediction (ssGBLUP).
RESULTS: Using a number of genotyped animals corresponding to at least EIG98 allowed the identification of QTN with the largest effect sizes when Ne was large. Populations with smaller Ne required more than EIG98. Furthermore, including genotyped animals with a higher reliability (i.e., a higher trait heritability) improved the identification of the most informative QTN. Prediction accuracy was highest when the significant or the large-effect SNPs representing twice the number of simulated QTN were added to the 50k panel.
CONCLUSIONS: Accurately identifying causative variants from sequence data depends on the effective population size and, therefore, on the dimensionality of genomic information. This dimensionality can help identify the most suitable sample size for GWA and could be considered for variant selection, especially when resources are restricted. Even when variants are accurately identified, their inclusion in prediction models has limited benefits.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12711-023-00823-0.
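The discovery set sizes described in the abstract (EIG50 to EIG99) are counts of the largest eigenvalues of G that together explain a given share of its variance. The sketch below illustrates that count and the one-in-ten thinning of the SNP panel. It is not the authors' pipeline: the abstract does not say how G was built, so a VanRaden-style construction is assumed here, and the function names `genomic_relationship_matrix`, `n_largest_eigenvalues`, and `thin_panel` are hypothetical.

```python
import numpy as np


def genomic_relationship_matrix(genotypes):
    """Build G from an (animals x SNPs) matrix of 0/1/2 allele counts.

    Assumption: VanRaden-style G = ZZ' / (2 * sum(p * (1 - p))), with
    allele frequencies p estimated from the genotypes themselves.
    """
    M = np.asarray(genotypes, dtype=float)
    p = M.mean(axis=0) / 2.0
    polymorphic = (p > 0.0) & (p < 1.0)      # monomorphic SNPs carry no information
    p = p[polymorphic]
    Z = M[:, polymorphic] - 2.0 * p          # center each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def n_largest_eigenvalues(G, proportion):
    """Smallest number of largest eigenvalues of G whose sum reaches
    `proportion` of the total (e.g. 0.98 for EIG98)."""
    eig = np.linalg.eigvalsh(G)[::-1]        # eigvalsh is ascending; reverse it
    eig = np.clip(eig, 0.0, None)            # remove tiny negative round-off
    cumulative = np.cumsum(eig) / eig.sum()
    count = int(np.searchsorted(cumulative, proportion)) + 1
    return min(count, len(eig))


def thin_panel(n_snps, step=10):
    """Indices of a thinned panel that keeps one in every `step` SNPs."""
    return np.arange(0, n_snps, step)


# Toy data only: 20 animals and 200 random SNPs, far smaller than the study.
rng = np.random.default_rng(0)
toy_genotypes = rng.integers(0, 3, size=(20, 200))
toy_G = genomic_relationship_matrix(toy_genotypes)
toy_counts = {pct: n_largest_eigenvalues(toy_G, pct / 100.0)
              for pct in (50, 60, 70, 80, 90, 95, 98, 99)}
```

On real data the count for a given percentage would be read as the number of genotyped animals to place in the discovery set; the toy values above have no biological meaning.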
format Online
Article
Text
id pubmed-10351171
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10351171 2023-07-18 Genet Sel Evol Research Article BioMed Central 2023-07-17 /pmc/articles/PMC10351171/ /pubmed/37460964 http://dx.doi.org/10.1186/s12711-023-00823-0 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Dimensionality of genomic information and its impact on genome-wide associations and variant selection for genomic prediction: a simulation study
topic Research Article