Low coverage whole genome sequencing enables accurate assessment of common variants and calculation of genome-wide polygenic scores

BACKGROUND: Inherited susceptibility to common, complex diseases may be caused by rare, pathogenic variants (“monogenic”) or by the cumulative effect of numerous common variants (“polygenic”). Comprehensive genome interpretation should enable assessment for both monogenic and polygenic components of...

Bibliographic Details
Main authors: Homburger, Julian R.; Neben, Cynthia L.; Mishne, Gilad; Zhou, Alicia Y.; Kathiresan, Sekar; Khera, Amit V.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6880438/
https://www.ncbi.nlm.nih.gov/pubmed/31771638
http://dx.doi.org/10.1186/s13073-019-0682-2
_version_ 1783473760626540544
author Homburger, Julian R.
Neben, Cynthia L.
Mishne, Gilad
Zhou, Alicia Y.
Kathiresan, Sekar
Khera, Amit V.
author_sort Homburger, Julian R.
collection PubMed
description BACKGROUND: Inherited susceptibility to common, complex diseases may be caused by rare, pathogenic variants (“monogenic”) or by the cumulative effect of numerous common variants (“polygenic”). Comprehensive genome interpretation should enable assessment for both monogenic and polygenic components of inherited risk. The traditional approach requires two distinct genetic testing technologies: high coverage sequencing of known genes to detect monogenic variants, and a genome-wide genotyping array followed by imputation to calculate genome-wide polygenic scores (GPSs). We assessed the feasibility and accuracy of using low coverage whole genome sequencing (lcWGS) as an alternative to genotyping arrays to calculate GPSs.
METHODS: First, we performed downsampling and imputation of WGS data from ten individuals to assess concordance with known genotypes. Second, we assessed the correlation between GPSs for 3 common diseases (coronary artery disease [CAD], breast cancer [BC], and atrial fibrillation [AF]) calculated using lcWGS and genotyping array in 184 samples. Third, we assessed concordance of lcWGS-based genotype calls and GPS calculation in 120 individuals with known genotypes, selected to reflect diverse ancestral backgrounds. Fourth, we assessed the relationship between GPSs calculated using lcWGS and disease phenotypes in a cohort of 11,502 individuals of European ancestry.
RESULTS: We found imputation accuracy r² values of greater than 0.90 for all ten samples, including those of African and Ashkenazi Jewish ancestry, with lcWGS data at 0.5×. GPSs calculated using lcWGS and genotyping array followed by imputation in 184 individuals were highly correlated for each of the 3 common diseases (r² = 0.93–0.97) with similar score distributions. Using lcWGS data from 120 individuals of diverse ancestral backgrounds, we found similar results with respect to imputation accuracy and GPS correlations. Finally, we calculated GPSs for CAD, BC, and AF using lcWGS in 11,502 individuals of European ancestry, confirming odds ratios per standard deviation increment ranging from 1.28 to 1.59, consistent with previous studies.
CONCLUSIONS: lcWGS is an alternative technology to genotyping arrays for common genetic variant assessment and GPS calculation. lcWGS provides comparable imputation accuracy while also overcoming the ascertainment bias inherent to variant selection in genotyping array design.
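The abstract compares GPSs from two genotyping routes by their squared correlation. The record does not say which software the authors used, so the following is only a minimal sketch, assuming the common definition of a polygenic score as a weighted sum of per-variant allele dosages. The function names `polygenic_score` and `r_squared`, the variant weights, and the simulated dosages are all hypothetical and stand in for real imputed data; nothing here reproduces the study's numbers.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Return one score per individual: sum over variants of dosage * weight.

    dosages: (n_individuals, n_variants) allele dosages in [0, 2]
    weights: (n_variants,) per-variant effect weights (hypothetical here)
    """
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)


def r_squared(x, y):
    """Squared Pearson correlation between two score vectors."""
    r = np.corrcoef(x, y)[0, 1]
    return r * r


# Simulated stand-in data (an assumption, not data from the study).
rng = np.random.default_rng(0)
n_people, n_variants = 200, 500
weights = rng.normal(0.0, 0.05, n_variants)

# "Array" dosages: hard genotype calls 0, 1, or 2.
array_dosages = rng.binomial(2, 0.3, size=(n_people, n_variants)).astype(float)

# "lcWGS" dosages: the same genotypes plus imputation noise, kept in [0, 2].
noise = rng.normal(0.0, 0.2, array_dosages.shape)
lcwgs_dosages = np.clip(array_dosages + noise, 0.0, 2.0)

gps_array = polygenic_score(array_dosages, weights)
gps_lcwgs = polygenic_score(lcwgs_dosages, weights)
print(f"r^2 between array-based and lcWGS-based scores: "
      f"{r_squared(gps_array, gps_lcwgs):.3f}")
```

With real data, the two dosage matrices would come from the imputed array and imputed lcWGS calls for the same individuals and the same variant set, and the weights from a published score file.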
format Online
Article
Text
id pubmed-6880438
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6880438 2019-11-29 Low coverage whole genome sequencing enables accurate assessment of common variants and calculation of genome-wide polygenic scores Homburger, Julian R. Neben, Cynthia L. Mishne, Gilad Zhou, Alicia Y. Kathiresan, Sekar Khera, Amit V. Genome Med Research BioMed Central 2019-11-26 /pmc/articles/PMC6880438/ /pubmed/31771638 http://dx.doi.org/10.1186/s13073-019-0682-2 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Low coverage whole genome sequencing enables accurate assessment of common variants and calculation of genome-wide polygenic scores
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6880438/
https://www.ncbi.nlm.nih.gov/pubmed/31771638
http://dx.doi.org/10.1186/s13073-019-0682-2