Dimensionality of genomic information and its impact on genome-wide associations and variant selection for genomic prediction: a simulation study
BACKGROUND: Identifying true positive variants in genome-wide associations (GWA) depends on several factors, including the number of genotyped individuals. The limited dimensionality of genomic information may give insights into the optimal number of individuals to be used in GWA. This study investi...
Main authors: | Jang, Sungbong; Tsuruta, Shogo; Leite, Natalia Galoro; Misztal, Ignacy; Lourenco, Daniela |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10351171/ https://www.ncbi.nlm.nih.gov/pubmed/37460964 http://dx.doi.org/10.1186/s12711-023-00823-0 |
_version_ | 1785074290405670912 |
---|---|
author | Jang, Sungbong Tsuruta, Shogo Leite, Natalia Galoro Misztal, Ignacy Lourenco, Daniela |
author_facet | Jang, Sungbong Tsuruta, Shogo Leite, Natalia Galoro Misztal, Ignacy Lourenco, Daniela |
author_sort | Jang, Sungbong |
collection | PubMed |
description | BACKGROUND: Identifying true positive variants in genome-wide associations (GWA) depends on several factors, including the number of genotyped individuals. The limited dimensionality of genomic information may give insights into the optimal number of individuals to be used in GWA. This study investigated different discovery set sizes based on the number of largest eigenvalues explaining a certain proportion of variance in the genomic relationship matrix (G). In addition, we investigated the impact on the prediction accuracy by adding variants, which were selected based on different set sizes, to the regular single nucleotide polymorphism (SNP) chips used for genomic prediction. METHODS: We simulated sequence data that included 500k SNPs with 200 or 2000 quantitative trait nucleotides (QTN). A regular 50k panel included one in every ten simulated SNPs. Effective population size (Ne) was set to 20 or 200. GWA were performed using a number of genotyped animals equivalent to the number of largest eigenvalues of G (EIG) explaining 50, 60, 70, 80, 90, 95, 98, and 99% of the variance. In addition, the largest discovery set consisted of 30k genotyped animals. Limited or extensive phenotypic information was mimicked by changing the trait heritability. Significant and large-effect size SNPs were added to the 50k panel and used for single-step genomic best linear unbiased prediction (ssGBLUP). RESULTS: Using a number of genotyped animals corresponding to at least EIG98 allowed the identification of QTN with the largest effect sizes when Ne was large. Populations with smaller Ne required more than EIG98. Furthermore, including genotyped animals with a higher reliability (i.e., a higher trait heritability) improved the identification of the most informative QTN. Prediction accuracy was highest when the significant or the large-effect SNPs representing twice the number of simulated QTN were added to the 50k panel. CONCLUSIONS: Accurately identifying causative variants from sequence data depends on the effective population size and, therefore, on the dimensionality of genomic information. This dimensionality can help identify the most suitable sample size for GWA and could be considered for variant selection, especially when resources are restricted. Even when variants are accurately identified, their inclusion in prediction models has limited benefits. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12711-023-00823-0. |
format | Online Article Text |
id | pubmed-10351171 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-103511712023-07-18 Dimensionality of genomic information and its impact on genome-wide associations and variant selection for genomic prediction: a simulation study Jang, Sungbong Tsuruta, Shogo Leite, Natalia Galoro Misztal, Ignacy Lourenco, Daniela Genet Sel Evol Research Article BACKGROUND: Identifying true positive variants in genome-wide associations (GWA) depends on several factors, including the number of genotyped individuals. The limited dimensionality of genomic information may give insights into the optimal number of individuals to be used in GWA. This study investigated different discovery set sizes based on the number of largest eigenvalues explaining a certain proportion of variance in the genomic relationship matrix (G). In addition, we investigated the impact on the prediction accuracy by adding variants, which were selected based on different set sizes, to the regular single nucleotide polymorphism (SNP) chips used for genomic prediction. METHODS: We simulated sequence data that included 500k SNPs with 200 or 2000 quantitative trait nucleotides (QTN). A regular 50k panel included one in every ten simulated SNPs. Effective population size (Ne) was set to 20 or 200. GWA were performed using a number of genotyped animals equivalent to the number of largest eigenvalues of G (EIG) explaining 50, 60, 70, 80, 90, 95, 98, and 99% of the variance. In addition, the largest discovery set consisted of 30k genotyped animals. Limited or extensive phenotypic information was mimicked by changing the trait heritability. Significant and large-effect size SNPs were added to the 50k panel and used for single-step genomic best linear unbiased prediction (ssGBLUP). RESULTS: Using a number of genotyped animals corresponding to at least EIG98 allowed the identification of QTN with the largest effect sizes when Ne was large. Populations with smaller Ne required more than EIG98. 
Furthermore, including genotyped animals with a higher reliability (i.e., a higher trait heritability) improved the identification of the most informative QTN. Prediction accuracy was highest when the significant or the large-effect SNPs representing twice the number of simulated QTN were added to the 50k panel. CONCLUSIONS: Accurately identifying causative variants from sequence data depends on the effective population size and, therefore, on the dimensionality of genomic information. This dimensionality can help identify the most suitable sample size for GWA and could be considered for variant selection, especially when resources are restricted. Even when variants are accurately identified, their inclusion in prediction models has limited benefits. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12711-023-00823-0. BioMed Central 2023-07-17 /pmc/articles/PMC10351171/ /pubmed/37460964 http://dx.doi.org/10.1186/s12711-023-00823-0 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) . 
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Article Jang, Sungbong Tsuruta, Shogo Leite, Natalia Galoro Misztal, Ignacy Lourenco, Daniela Dimensionality of genomic information and its impact on genome-wide associations and variant selection for genomic prediction: a simulation study |
title | Dimensionality of genomic information and its impact on genome-wide associations and variant selection for genomic prediction: a simulation study |
title_sort | dimensionality of genomic information and its impact on genome-wide associations and variant selection for genomic prediction: a simulation study |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10351171/ https://www.ncbi.nlm.nih.gov/pubmed/37460964 http://dx.doi.org/10.1186/s12711-023-00823-0 |
work_keys_str_mv | AT jangsungbong dimensionalityofgenomicinformationanditsimpactongenomewideassociationsandvariantselectionforgenomicpredictionasimulationstudy AT tsurutashogo dimensionalityofgenomicinformationanditsimpactongenomewideassociationsandvariantselectionforgenomicpredictionasimulationstudy AT leitenataliagaloro dimensionalityofgenomicinformationanditsimpactongenomewideassociationsandvariantselectionforgenomicpredictionasimulationstudy AT misztalignacy dimensionalityofgenomicinformationanditsimpactongenomewideassociationsandvariantselectionforgenomicpredictionasimulationstudy AT lourencodaniela dimensionalityofgenomicinformationanditsimpactongenomewideassociationsandvariantselectionforgenomicpredictionasimulationstudy |
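
The abstract sizes each discovery set as the number of largest eigenvalues of G that explain a given proportion of the variance (EIG50 through EIG99). The following is a minimal sketch of that counting step only, not the authors' implementation. It assumes the eigenvalues of G have already been computed elsewhere; the function name `n_eigenvalues_for_variance` and the toy spectrum are hypothetical and are not taken from the study.

```python
from typing import Iterable


def n_eigenvalues_for_variance(eigenvalues: Iterable[float], proportion: float) -> int:
    """Return the smallest number of largest eigenvalues whose sum reaches
    `proportion` of the total variance (the sum of all eigenvalues)."""
    if not 0.0 < proportion <= 1.0:
        raise ValueError("proportion must be in (0, 1]")
    # Eigenvalues of a relationship matrix should be non-negative;
    # clip small negative values caused by rounding, then sort descending.
    values = sorted((max(float(v), 0.0) for v in eigenvalues), reverse=True)
    total = sum(values)
    if total <= 0.0:
        raise ValueError("eigenvalues must have a positive sum")
    target = proportion * total
    tolerance = 1e-12 * total  # guards against floating-point noise at the threshold
    cumulative = 0.0
    for count, value in enumerate(values, start=1):
        cumulative += value
        if cumulative >= target - tolerance:
            return count
    return len(values)


# Toy spectrum (hypothetical values, for illustration only).
toy_eigenvalues = [40.0, 25.0, 15.0, 10.0, 5.0, 3.0, 1.5, 0.5]

# Thresholds listed in the abstract: 50, 60, 70, 80, 90, 95, 98, and 99%.
eig_sizes = {
    pct: n_eigenvalues_for_variance(toy_eigenvalues, pct / 100)
    for pct in (50, 60, 70, 80, 90, 95, 98, 99)
}
print(eig_sizes)
```

In this sketch, `eig_sizes[98]` plays the role of EIG98: the count that would be used as the number of genotyped animals in the discovery set. With a real G, the spectrum would come from a numerical eigendecomposition, and the counts would depend on the effective population size, as the abstract reports.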