
Estimating disease prevalence in large datasets using genetic risk scores


Bibliographic details
Main authors: Evans, Benjamin D.; Słowiński, Piotr; Hattersley, Andrew T.; Jones, Samuel E.; Sharp, Seth; Kimmitt, Robert A.; Weedon, Michael N.; Oram, Richard A.; Tsaneva-Atanasova, Krasimira; Thomas, Nicholas J.
Format: Online, Article, Text
Language: English
Published: Nature Publishing Group UK, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8575951/
https://www.ncbi.nlm.nih.gov/pubmed/34750397
http://dx.doi.org/10.1038/s41467-021-26501-7
Collection: PubMed
Abstract: Clinical classification is essential for estimating disease prevalence but is difficult, often requiring complex investigations. The widespread availability of population level genetic data makes novel genetic stratification techniques a highly attractive alternative. We propose a generalizable mathematical framework for determining disease prevalence within a cohort using genetic risk scores. We compare and evaluate methods based on the means of genetic risk scores’ distributions; the Earth Mover’s Distance between distributions; a linear combination of kernel density estimates of distributions; and an Excess method. We demonstrate the performance of genetic stratification to produce robust prevalence estimates. Specifically, we show that robust estimates of prevalence are still possible even with rarer diseases, smaller cohort sizes and less discriminative genetic risk scores, highlighting the general utility of these approaches. Genetic stratification techniques offer exciting new research tools, enabling unbiased insights into disease prevalence and clinical characteristics unhampered by clinical classification criteria.
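The abstract names its estimators without defining them. As a rough illustration only, the sketch below assumes that a cohort's genetic risk scores are a mixture of a case reference distribution (weight p) and a control reference distribution (weight 1 - p), and estimates p in two generic ways: from the means, since mean(cohort) = p * mean(cases) + (1 - p) * mean(controls), and by choosing the weight of a linear combination of Gaussian kernel density estimates. This is not the authors' implementation. The function names (`means_prevalence`, `gaussian_kde`, `kde_mixture_prevalence`), the bandwidth rule, the grid search, the maximum-likelihood criterion and the synthetic data are all hypothetical choices for this example, and the paper's KDE, Earth Mover's Distance and Excess methods may be defined differently.

```python
import math
import random
from statistics import mean, stdev


def means_prevalence(mix, ref_cases, ref_controls):
    """Case fraction p implied by mean(mix) = p*mean(cases) + (1 - p)*mean(controls)."""
    mu_mix, mu_case, mu_ctrl = mean(mix), mean(ref_cases), mean(ref_controls)
    p = (mu_mix - mu_ctrl) / (mu_case - mu_ctrl)
    return min(max(p, 0.0), 1.0)  # clip to a valid proportion


def gaussian_kde(sample, bandwidth=None):
    """Hypothetical helper: 1-D Gaussian kernel density estimate returned as a callable."""
    n = len(sample)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not taken from the paper)
        bandwidth = 1.06 * stdev(sample) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in sample)

    return density


def kde_mixture_prevalence(mix, ref_cases, ref_controls, steps=101):
    """Weight p of p*f_case + (1 - p)*f_ctrl that maximises the likelihood of mix."""
    f_case, f_ctrl = gaussian_kde(ref_cases), gaussian_kde(ref_controls)
    # Evaluate each reference density once per cohort score, then reuse across the grid
    d_case = [f_case(x) for x in mix]
    d_ctrl = [f_ctrl(x) for x in mix]
    best_p, best_ll = 0.0, -math.inf
    for i in range(steps):
        p = i / (steps - 1)  # grid of candidate weights from 0 to 1
        # Tiny floor avoids log(0) if both densities underflow at a point
        ll = sum(math.log(p * a + (1 - p) * b + 1e-300) for a, b in zip(d_case, d_ctrl))
        if ll > best_ll:
            best_p, best_ll = p, ll
    return best_p


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical synthetic scores: controls ~ N(0, 1), cases ~ N(2, 1)
    ref_controls = [rng.gauss(0.0, 1.0) for _ in range(500)]
    ref_cases = [rng.gauss(2.0, 1.0) for _ in range(500)]
    # Hypothetical cohort with a true case fraction of 0.3
    mix = [rng.gauss(2.0, 1.0) for _ in range(300)] + [rng.gauss(0.0, 1.0) for _ in range(700)]
    print(f"means estimate: {means_prevalence(mix, ref_cases, ref_controls):.2f}")
    print(f"KDE estimate:   {kde_mixture_prevalence(mix, ref_cases, ref_controls):.2f}")
```

With well-separated synthetic references like these, both estimates should land near the true fraction of 0.3. The means estimator is cheap but uses only the first moment of each distribution, whereas the KDE fit uses the whole shape of the score distributions at a higher computational cost.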
Record ID: pubmed-8575951
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Nature Communications (Nat Commun), Article
Published online: 2021-11-08
Record date: 2021-11-19
Copyright: © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/.