Population‐wide copy number variation calling using variant call format files from 6,898 individuals

Bibliographic Details
Main authors: Png, Grace; Suveges, Daniel; Park, Young‐Chan; Walter, Klaudia; Kundu, Kousik; Ntalla, Ioanna; Tsafantakis, Emmanouil; Karaleftheri, Maria; Dedoussis, George; Zeggini, Eleftheria; Gilly, Arthur
Format: Online article, text
Language: English
Published: John Wiley and Sons Inc., 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8653900/
https://www.ncbi.nlm.nih.gov/pubmed/31520489
http://dx.doi.org/10.1002/gepi.22260
Collection: PubMed
Abstract: Copy number variants (CNVs) play an important role in a number of human diseases, but the accurate calling of CNVs remains challenging. Most current approaches to CNV detection use raw read alignments, which are computationally intensive to process. We use a regression tree‐based approach to call germline CNVs from whole‐genome sequencing (WGS, >18×) variant call sets in 6,898 samples across four European cohorts, and describe a rich large variation landscape comprising 1,320 CNVs. Eighty‐one percent of detected events have been previously reported in the Database of Genomic Variants. Twenty‐three percent of high‐quality deletions affect entire genes, and we recapitulate known events such as the GSTM1 and RHD gene deletions. We test for association between the detected deletions and 275 protein levels in 1,457 individuals to assess the potential clinical impact of the detected CNVs. We describe complex CNV patterns underlying an association with levels of the CCL3 protein (MAF = 0.15, p = 3.6 × 10⁻¹²) at the CCL3L3 locus, and a novel cis‐association between a low‐frequency NOMO1 deletion and NOMO1 protein levels (MAF = 0.02, p = 2.2 × 10⁻⁷). This study demonstrates that existing population‐wide WGS call sets can be mined for germline CNVs with minimal computational overhead, delivering insight into a less well‐studied, yet potentially impactful class of genetic variant.
Record ID: pubmed-8653900
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Genet Epidemiol (Research Articles)
Dates: 2019-09-14 (online); 2020-01 (issue)
Copyright: © 2019 The Authors. Genetic Epidemiology published by Wiley Periodicals, Inc.
License: This is an open access article under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.