Determining multiallelic complex copy number and sequence variation from high coverage exome sequencing data

BACKGROUND: Copy number variation (CNV) is a major component of genomic variation, yet methods to accurately type genomic CNV lag behind methods that type single nucleotide variation. High-throughput sequencing can contribute to these methods by using sequence read depth, which takes the number of r...

Bibliographic Details
Main Authors: Forni, Diego; Martin, Diana; Abujaber, Razan; Sharp, Andrew J.; Sironi, Manuela; Hollox, Edward J.
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4630827/
https://www.ncbi.nlm.nih.gov/pubmed/26526070
http://dx.doi.org/10.1186/s12864-015-2123-y
author Forni, Diego
Martin, Diana
Abujaber, Razan
Sharp, Andrew J.
Sironi, Manuela
Hollox, Edward J.
collection PubMed
description BACKGROUND: Copy number variation (CNV) is a major component of genomic variation, yet methods to accurately type genomic CNV lag behind methods that type single nucleotide variation. High-throughput sequencing can contribute to these methods by using sequence read depth, which takes the number of reads that map to a given part of the reference genome as a proxy for copy number of that region, and compares it across samples. Furthermore, high-throughput sequencing also provides information on the sequence differences between copies within and between individuals. METHODS: In this study we use high-coverage phase 3 exome sequences of the 1000 Genomes project to infer diploid copy number of the beta-defensin genomic region, a well-studied CNV that carries several beta-defensin genes involved in the antimicrobial response, signalling, and fertility. We also use these data to call sequence variants, a particular challenge given the multicopy nature of the region. RESULTS: We confidently call copy number and sequence variation of the beta-defensin genes on 1285 samples from 26 global populations, and validate copy number using Nanostring nCounter and triplex paralogue ratio test data. We use the copy number calls to verify the genomic extent of the CNV and validate sequence calls using analysis of cloned PCR products. We identify novel variation, mostly individually rare, predicted to alter amino-acid sequence in the beta-defensin genes. Such novel variants may alter antimicrobial properties or have off-target receptor interactions, and may contribute to individuality in immunological response and fertility. CONCLUSIONS: Given that 81% of identified sequence variants were not previously in dbSNP, we show that sequence variation in multiallelic CNVs represents an unappreciated source of genomic diversity.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-2123-y) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4630827
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4630827 2015-11-03 Determining multiallelic complex copy number and sequence variation from high coverage exome sequencing data Forni, Diego; Martin, Diana; Abujaber, Razan; Sharp, Andrew J.; Sironi, Manuela; Hollox, Edward J. BMC Genomics Research Article BioMed Central 2015-11-02 /pmc/articles/PMC4630827/ /pubmed/26526070 http://dx.doi.org/10.1186/s12864-015-2123-y Text en © Forni et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Determining multiallelic complex copy number and sequence variation from high coverage exome sequencing data
topic Research Article