Estimating disease prevalence in large datasets using genetic risk scores
Clinical classification is essential for estimating disease prevalence but is difficult, often requiring complex investigations. The widespread availability of population level genetic data makes novel genetic stratification techniques a highly attractive alternative. We propose a generalizable math...
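Main Authors: | Evans, Benjamin D., Słowiński, Piotr, Hattersley, Andrew T., Jones, Samuel E., Sharp, Seth, Kimmitt, Robert A., Weedon, Michael N., Oram, Richard A., Tsaneva-Atanasova, Krasimira, Thomas, Nicholas J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8575951/ https://www.ncbi.nlm.nih.gov/pubmed/34750397 http://dx.doi.org/10.1038/s41467-021-26501-7 |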
_version_ | 1784595782242926592 |
---|---|
author | Evans, Benjamin D. Słowiński, Piotr Hattersley, Andrew T. Jones, Samuel E. Sharp, Seth Kimmitt, Robert A. Weedon, Michael N. Oram, Richard A. Tsaneva-Atanasova, Krasimira Thomas, Nicholas J. |
author_sort | Evans, Benjamin D. |
collection | PubMed |
description | Clinical classification is essential for estimating disease prevalence but is difficult, often requiring complex investigations. The widespread availability of population level genetic data makes novel genetic stratification techniques a highly attractive alternative. We propose a generalizable mathematical framework for determining disease prevalence within a cohort using genetic risk scores. We compare and evaluate methods based on the means of genetic risk scores’ distributions; the Earth Mover’s Distance between distributions; a linear combination of kernel density estimates of distributions; and an Excess method. We demonstrate the performance of genetic stratification to produce robust prevalence estimates. Specifically, we show that robust estimates of prevalence are still possible even with rarer diseases, smaller cohort sizes and less discriminative genetic risk scores, highlighting the general utility of these approaches. Genetic stratification techniques offer exciting new research tools, enabling unbiased insights into disease prevalence and clinical characteristics unhampered by clinical classification criteria. |
format | Online Article Text |
id | pubmed-8575951 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8575951 2021-11-19 Estimating disease prevalence in large datasets using genetic risk scores Nat Commun Article Nature Publishing Group UK 2021-11-08 /pmc/articles/PMC8575951/ /pubmed/34750397 http://dx.doi.org/10.1038/s41467-021-26501-7 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Estimating disease prevalence in large datasets using genetic risk scores |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8575951/ https://www.ncbi.nlm.nih.gov/pubmed/34750397 http://dx.doi.org/10.1038/s41467-021-26501-7 |
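
The description above names a method based on the means of genetic risk score distributions but does not spell it out. The following is a minimal sketch of one way such an estimate can work, assuming the cohort is a two-component mixture of cases and non-cases whose reference score distributions are known. Under that assumption the mixture mean is a weighted average of the two reference means, so the weight (the prevalence) can be solved for directly. The function name `prevalence_from_means` and all score values are hypothetical illustrations, not taken from the article or this record.

```python
from statistics import mean


def prevalence_from_means(mixture, ref_cases, ref_noncases):
    """Estimate the proportion of cases in `mixture` from group means.

    Assumes a two-component mixture, so that
        mean(mixture) = p * mean(ref_cases) + (1 - p) * mean(ref_noncases)
    and solves this equation for p.
    """
    mu_mix = mean(mixture)
    mu_cases = mean(ref_cases)
    mu_noncases = mean(ref_noncases)
    if mu_cases == mu_noncases:
        raise ValueError("reference means are identical; prevalence is not identifiable")
    p = (mu_mix - mu_noncases) / (mu_cases - mu_noncases)
    # Sampling noise can push the raw estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, p))


# Hypothetical genetic risk scores (illustrative values only).
ref_cases = [0.28, 0.30, 0.32]
ref_noncases = [0.22, 0.24, 0.26]
# A cohort built with one copy of the cases and three of the non-cases,
# i.e. 25% cases by construction.
mixture = ref_cases + ref_noncases * 3

estimate = prevalence_from_means(mixture, ref_cases, ref_noncases)
print(f"estimated prevalence: {estimate:.3f}")
```

The estimate only uses first moments, so it is cheap to compute but becomes unstable when the two reference means are close together, which corresponds to the "less discriminative genetic risk scores" case mentioned in the description.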