Estimating Copy Number and Allelic Variation at the Immunoglobulin Heavy Chain Locus Using Short Reads

The study of genomic regions that contain gene copies and structural variation is a major challenge in modern genomics. Unlike variation involving single nucleotide changes, data on the variation of copy number is difficult to collect and few tools exist for analyzing the variation between individua...

Bibliographic Details
Main authors: Luo, Shishi; Yu, Jane A.; Song, Yun S.
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5025152/
https://www.ncbi.nlm.nih.gov/pubmed/27632220
http://dx.doi.org/10.1371/journal.pcbi.1005117
_version_ 1782453908380057600
author Luo, Shishi
Yu, Jane A.
Song, Yun S.
collection PubMed
description The study of genomic regions that contain gene copies and structural variation is a major challenge in modern genomics. Unlike variation involving single nucleotide changes, data on the variation of copy number is difficult to collect and few tools exist for analyzing the variation between individuals. The immunoglobulin heavy variable (IGHV) locus, which plays an integral role in the adaptive immune response, is an example of a complex genomic region that varies in gene copy number. Lack of standard methods to genotype this region prevents it from being included in association studies and is holding back the growing field of antibody repertoire analysis. Here we develop a method that takes short reads from high-throughput sequencing and outputs a genetic profile of the IGHV locus with the read coverage depth and a putative nucleotide sequence for each operationally defined gene cluster. Our operationally defined gene clusters aim to address a major challenge in studying the IGHV locus: the high sequence similarity between gene segments in different genomic locations. Tests on simulated data demonstrate that our approach can accurately determine the presence or absence of a gene cluster from reads as short as 70 bp. More detailed resolution on the copy number of gene clusters can be obtained from read coverage depth using longer reads (e.g., ≥ 100 bp). Detail at the nucleotide resolution of single copy genes (genes present in one copy per haplotype) can be determined with 250 bp reads. For IGHV genes with more than one copy, accurate nucleotide-resolution reconstruction is currently beyond the means of our approach. When applied to a family of European ancestry, our pipeline outputs genotypes that are consistent with the family pedigree, confirms existing multigene variants and suggests new copy number variants. 
This study paves the way for analyzing population-level patterns of variation in IGHV gene clusters in larger diverse datasets and for quantitatively handling regions of copy number variation in other structurally varying and complex loci.
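The description says that copy number of a gene cluster can be obtained from read coverage depth. The following is a minimal sketch of that general idea, not the authors' pipeline: it assumes a per-cluster mean depth is already available and that a set of regions known to be single copy per haplotype (two copies in a diploid genome) supplies the baseline. The function name, its parameters, and the example depths are all hypothetical.

```python
from statistics import median


def estimate_copy_number(cluster_depth, single_copy_depths, ploidy=2):
    """Estimate the total (diploid) copy number of one gene cluster.

    cluster_depth      -- mean read depth over the cluster (hypothetical input)
    single_copy_depths -- depths of reference regions assumed to be present
                          in one copy per haplotype
    ploidy             -- number of haplotypes; 2 for a human sample
    """
    baseline = median(single_copy_depths)
    if baseline <= 0:
        raise ValueError("baseline depth must be positive")
    # Depth contributed by a single copy on a single haplotype.
    depth_per_copy = baseline / ploidy
    # Round the depth ratio to the nearest whole number of copies.
    return round(cluster_depth / depth_per_copy)


if __name__ == "__main__":
    reference = [30.0, 32.0, 28.0]  # made-up depths for illustration only
    for depth in (0.0, 14.0, 46.0):
        print(depth, estimate_copy_number(depth, reference))
```

A result of 0 corresponds to the presence/absence call mentioned in the description. Distinguishing higher copy numbers depends on how well depth scales with copy number, which the description reports improves with longer reads.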
format Online
Article
Text
id pubmed-5025152
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5025152 2016-09-27 Estimating Copy Number and Allelic Variation at the Immunoglobulin Heavy Chain Locus Using Short Reads Luo, Shishi Yu, Jane A. Song, Yun S. PLoS Comput Biol Research Article Public Library of Science 2016-09-15 /pmc/articles/PMC5025152/ /pubmed/27632220 http://dx.doi.org/10.1371/journal.pcbi.1005117 Text en © 2016 Luo et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Estimating Copy Number and Allelic Variation at the Immunoglobulin Heavy Chain Locus Using Short Reads
topic Research Article