
Structure-informed clustering for population stratification in association studies

Bibliographic Details
Main authors: Bose, Aritra; Burch, Myson; Chowdhury, Agniva; Paschou, Peristera; Drineas, Petros
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10619291/
https://www.ncbi.nlm.nih.gov/pubmed/37907836
http://dx.doi.org/10.1186/s12859-023-05511-w
author Bose, Aritra
Burch, Myson
Chowdhury, Agniva
Paschou, Peristera
Drineas, Petros
collection PubMed
description BACKGROUND: Identifying variants associated with complex traits is a challenging task in genetic association studies because of linkage disequilibrium (LD) between genetic variants and population stratification that is unrelated to disease risk. Existing methods of population structure correction use principal component analysis or linear mixed models with a random effect when modeling associations between a trait of interest and genetic markers. However, due to stringent significance thresholds and latent interactions between the markers, these methods often fail to detect genuinely associated variants. RESULTS: To overcome this, we propose CluStrat, which corrects for complex, arbitrarily structured populations while leveraging the LD-induced distances between genetic markers. It performs agglomerative hierarchical clustering using the Mahalanobis distance covariance matrix of the markers. In simulation studies, we show that our method outperforms existing methods in detecting true causal variants. Applying CluStrat to the WTCCC2 and UK Biobank cohorts, we found biologically relevant associations in schizophrenia and myocardial infarction. CluStrat was also able to correct for population structure in polygenic adaptation of height in Europeans. CONCLUSIONS: CluStrat highlights the advantages of biologically relevant distance metrics such as the Mahalanobis distance, which captures the cryptic interactions within populations in the presence of LD better than the Euclidean distance. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-023-05511-w.
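The clustering step named in the description can be illustrated with a minimal sketch. This is not the authors' CluStrat implementation: the function name `mahalanobis_clusters`, the ridge term that keeps the marker covariance invertible, and the choice of average linkage are all assumptions made here for illustration. It only shows the general idea of grouping individuals by Mahalanobis distance, which weights marker differences by the inverse marker covariance, and then cutting an agglomerative tree into a fixed number of clusters.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def mahalanobis_clusters(genotypes, n_clusters, ridge=1e-3):
    """Cluster individuals (rows) by Mahalanobis distance over markers (columns).

    Hypothetical helper for illustration; not taken from the CluStrat code.
    """
    x = np.asarray(genotypes, dtype=float)
    x = x - x.mean(axis=0)                    # center each marker
    cov = np.cov(x, rowvar=False)             # marker-by-marker covariance
    cov = np.atleast_2d(cov)
    # Assumption: a small ridge term makes the covariance invertible when
    # markers are constant or perfectly correlated.
    cov = cov + ridge * np.eye(cov.shape[0])
    inv_cov = np.linalg.inv(cov)
    # Condensed vector of pairwise Mahalanobis distances between individuals.
    dists = pdist(x, metric="mahalanobis", VI=inv_cov)
    # Assumption: average linkage; other linkage criteria are possible.
    tree = linkage(dists, method="average")
    # Cut the tree so that at most n_clusters flat clusters remain.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


if __name__ == "__main__":
    # Toy genotype matrix (0/1/2 allele counts): six individuals, three markers.
    toy = [[0, 0, 1], [0, 0, 1], [0, 0, 1],
           [2, 2, 1], [2, 2, 1], [2, 2, 1]]
    print(mahalanobis_clusters(toy, n_clusters=2))
```

In an association-testing workflow the resulting cluster labels would then be used to stratify the tests; that step is not sketched here because the record does not describe it.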
format Online
Article
Text
id pubmed-10619291
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10619291 2023-11-02 Structure-informed clustering for population stratification in association studies. Bose, Aritra; Burch, Myson; Chowdhury, Agniva; Paschou, Peristera; Drineas, Petros. BMC Bioinformatics. Research.
BioMed Central 2023-10-31 /pmc/articles/PMC10619291/ /pubmed/37907836 http://dx.doi.org/10.1186/s12859-023-05511-w Text en © The Author(s) 2023. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Structure-informed clustering for population stratification in association studies
topic Research