Estimating Copy Number and Allelic Variation at the Immunoglobulin Heavy Chain Locus Using Short Reads

The study of genomic regions that contain gene copies and structural variation is a major challenge in modern genomics. Unlike variation involving single nucleotide changes, data on the variation of copy number is difficult to collect and few tools exist for analyzing the variation between individua...

Bibliographic Details
Main Authors:	Luo, Shishi; Yu, Jane A.; Song, Yun S.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2016
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5025152/ https://www.ncbi.nlm.nih.gov/pubmed/27632220 http://dx.doi.org/10.1371/journal.pcbi.1005117

author	Luo, Shishi; Yu, Jane A.; Song, Yun S.
collection	PubMed
description	The study of genomic regions that contain gene copies and structural variation is a major challenge in modern genomics. Unlike variation involving single nucleotide changes, data on the variation of copy number is difficult to collect and few tools exist for analyzing the variation between individuals. The immunoglobulin heavy variable (IGHV) locus, which plays an integral role in the adaptive immune response, is an example of a complex genomic region that varies in gene copy number. Lack of standard methods to genotype this region prevents it from being included in association studies and is holding back the growing field of antibody repertoire analysis. Here we develop a method that takes short reads from high-throughput sequencing and outputs a genetic profile of the IGHV locus with the read coverage depth and a putative nucleotide sequence for each operationally defined gene cluster. Our operationally defined gene clusters aim to address a major challenge in studying the IGHV locus: the high sequence similarity between gene segments in different genomic locations. Tests on simulated data demonstrate that our approach can accurately determine the presence or absence of a gene cluster from reads as short as 70 bp. More detailed resolution on the copy number of gene clusters can be obtained from read coverage depth using longer reads (e.g., ≥ 100 bp). Detail at the nucleotide resolution of single copy genes (genes present in one copy per haplotype) can be determined with 250 bp reads. For IGHV genes with more than one copy, accurate nucleotide-resolution reconstruction is currently beyond the means of our approach. When applied to a family of European ancestry, our pipeline outputs genotypes that are consistent with the family pedigree, confirms existing multigene variants and suggests new copy number variants. This study paves the way for analyzing population-level patterns of variation in IGHV gene clusters in larger diverse datasets and for quantitatively handling regions of copy number variation in other structurally varying and complex loci.
format	Online Article Text
id	pubmed-5025152
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5025152 2016-09-27 Estimating Copy Number and Allelic Variation at the Immunoglobulin Heavy Chain Locus Using Short Reads Luo, Shishi Yu, Jane A. Song, Yun S. PLoS Comput Biol Research Article Public Library of Science 2016-09-15 /pmc/articles/PMC5025152/ /pubmed/27632220 http://dx.doi.org/10.1371/journal.pcbi.1005117 Text en © 2016 Luo et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title	Estimating Copy Number and Allelic Variation at the Immunoglobulin Heavy Chain Locus Using Short Reads
topic	Research Article
