Grouping of genomic markers in populations with family structure
BACKGROUND: Linkage and linkage disequilibrium (LD) between genome regions cause dependencies among genomic markers. Due to family stratification in populations with non-random mating in livestock or crop, the standard measures of population LD such as [Formula: see text] may be biased. Grouping of...
Main Authors: | Wittenburg, Dörte, Doschoris, Michael, Klosa, Jan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7893918/ https://www.ncbi.nlm.nih.gov/pubmed/33607943 http://dx.doi.org/10.1186/s12859-021-04010-0 |
_version_ | 1783653143414833152 |
---|---|
author | Wittenburg, Dörte Doschoris, Michael Klosa, Jan |
author_facet | Wittenburg, Dörte Doschoris, Michael Klosa, Jan |
author_sort | Wittenburg, Dörte |
collection | PubMed |
description | BACKGROUND: Linkage and linkage disequilibrium (LD) between genome regions cause dependencies among genomic markers. Due to family stratification in populations with non-random mating in livestock or crop, the standard measures of population LD such as [Formula: see text] may be biased. Grouping of markers according to their interdependence needs to account for the actual population structure in order to allow proper inference in genome-based evaluations. RESULTS: Given a matrix reflecting the strength of association between markers, groups are built successively using a greedy algorithm; largest groups are built at first. As an option, a representative marker is selected for each group. We provide an implementation of the grouping approach as a new function to the R package hscovar. This package enables the calculation of the theoretical covariance between biallelic markers for half- or full-sib families and the derivation of representative markers. In case studies, we have shown that the number of groups comprising dependent markers was smaller and representative SNPs were spread more uniformly over the investigated chromosome region when the family stratification was respected compared to a population-LD approach. In a simulation study, we observed that sensitivity and specificity of a genome-based association study improved if selection of representative markers took family structure into account. CONCLUSIONS: Chromosome segments which frequently recombine in the underlying population can be identified from the matrix of pairwise dependence between markers. Representative markers can be exploited, for instance, for dimension reduction prior to a genome-based association study or the grouping structure itself can be employed in a grouped penalization approach. |
format | Online Article Text |
id | pubmed-7893918 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-78939182021-02-22 Grouping of genomic markers in populations with family structure Wittenburg, Dörte Doschoris, Michael Klosa, Jan BMC Bioinformatics Research Article BACKGROUND: Linkage and linkage disequilibrium (LD) between genome regions cause dependencies among genomic markers. Due to family stratification in populations with non-random mating in livestock or crop, the standard measures of population LD such as [Formula: see text] may be biased. Grouping of markers according to their interdependence needs to account for the actual population structure in order to allow proper inference in genome-based evaluations. RESULTS: Given a matrix reflecting the strength of association between markers, groups are built successively using a greedy algorithm; largest groups are built at first. As an option, a representative marker is selected for each group. We provide an implementation of the grouping approach as a new function to the R package hscovar. This package enables the calculation of the theoretical covariance between biallelic markers for half- or full-sib families and the derivation of representative markers. In case studies, we have shown that the number of groups comprising dependent markers was smaller and representative SNPs were spread more uniformly over the investigated chromosome region when the family stratification was respected compared to a population-LD approach. In a simulation study, we observed that sensitivity and specificity of a genome-based association study improved if selection of representative markers took family structure into account. CONCLUSIONS: Chromosome segments which frequently recombine in the underlying population can be identified from the matrix of pairwise dependence between markers. Representative markers can be exploited, for instance, for dimension reduction prior to a genome-based association study or the grouping structure itself can be employed in a grouped penalization approach. 
BioMed Central 2021-02-19 /pmc/articles/PMC7893918/ /pubmed/33607943 http://dx.doi.org/10.1186/s12859-021-04010-0 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Article Wittenburg, Dörte Doschoris, Michael Klosa, Jan Grouping of genomic markers in populations with family structure |
title | Grouping of genomic markers in populations with family structure |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7893918/ https://www.ncbi.nlm.nih.gov/pubmed/33607943 http://dx.doi.org/10.1186/s12859-021-04010-0 |
work_keys_str_mv | AT wittenburgdorte groupingofgenomicmarkersinpopulationswithfamilystructure AT doschorismichael groupingofgenomicmarkersinpopulationswithfamilystructure AT klosajan groupingofgenomicmarkersinpopulationswithfamilystructure |
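
The description field above mentions that groups of dependent markers are built successively with a greedy algorithm, largest groups first, and that a representative marker can optionally be chosen per group. The record does not spell out the procedure, so the following is only a minimal Python sketch under stated assumptions: the function name `greedy_groups`, the `threshold` parameter, the rule "two markers are dependent if their absolute association is at least the threshold", and the choice of the seed marker as representative are all hypothetical and are not taken from the R package hscovar.

```python
from typing import List, Sequence, Tuple


def greedy_groups(
    assoc: Sequence[Sequence[float]], threshold: float
) -> List[Tuple[int, List[int]]]:
    """Greedily partition markers into groups, largest group first.

    assoc is a symmetric matrix of pairwise association strengths
    (for example, a correlation matrix). Assumption: marker j joins the
    group seeded by marker i when |assoc[i][j]| >= threshold.
    Returns a list of (representative, members) pairs.
    """
    unassigned = set(range(len(assoc)))
    groups: List[Tuple[int, List[int]]] = []
    while unassigned:
        best_seed, best_members = -1, []
        for seed in sorted(unassigned):
            # Candidate group: the seed plus every still-unassigned
            # marker that is strongly associated with it.
            members = [
                j for j in sorted(unassigned)
                if j == seed or abs(assoc[seed][j]) >= threshold
            ]
            # Keep the largest candidate; ties go to the lowest index.
            if len(members) > len(best_members):
                best_seed, best_members = seed, members
        # Assumption: the seed marker serves as the representative.
        groups.append((best_seed, best_members))
        unassigned -= set(best_members)
    return groups


# Toy association matrix for five markers (made-up illustrative values).
example = [
    [1.00, 0.90, 0.80, 0.10, 0.00],
    [0.90, 1.00, 0.85, 0.20, 0.10],
    [0.80, 0.85, 1.00, 0.10, 0.20],
    [0.10, 0.20, 0.10, 1.00, 0.70],
    [0.00, 0.10, 0.20, 0.70, 1.00],
]
print(greedy_groups(example, threshold=0.6))
```

With this toy matrix, markers 0 to 2 form the first group and markers 3 and 4 the second. Because each round picks the largest candidate among the markers that remain, group sizes never increase from one round to the next, which matches the "largest groups first" behaviour described in the abstract.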