Using de novo assembly to identify structural variation of eight complex immune system gene regions

Driven by the necessity to survive environmental pathogens, the human immune system has evolved exceptional diversity and plasticity, to which several factors contribute including inheritable structural polymorphism of the underlying genes. Characterizing this variation is challenging due to the com...

Bibliographic Details
Main Authors: Zhang, Jia-Yuan, Roberts, Hannah, Flores, David S. C., Cutler, Antony J., Brown, Andrew C., Whalley, Justin P., Mielczarek, Olga, Buck, David, Lockstone, Helen, Xella, Barbara, Oliver, Karen, Corton, Craig, Betteridge, Emma, Bashford-Rogers, Rachael, Knight, Julian C., Todd, John A., Band, Gavin
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8363018/
https://www.ncbi.nlm.nih.gov/pubmed/34343164
http://dx.doi.org/10.1371/journal.pcbi.1009254
_version_ 1783738284039471104
author Zhang, Jia-Yuan
Roberts, Hannah
Flores, David S. C.
Cutler, Antony J.
Brown, Andrew C.
Whalley, Justin P.
Mielczarek, Olga
Buck, David
Lockstone, Helen
Xella, Barbara
Oliver, Karen
Corton, Craig
Betteridge, Emma
Bashford-Rogers, Rachael
Knight, Julian C.
Todd, John A.
Band, Gavin
author_sort Zhang, Jia-Yuan
collection PubMed
description Driven by the necessity to survive environmental pathogens, the human immune system has evolved exceptional diversity and plasticity, to which several factors contribute including inheritable structural polymorphism of the underlying genes. Characterizing this variation is challenging due to the complexity of these loci, which contain extensive regions of paralogy, segmental duplication and high copy-number repeats, but recent progress in long-read sequencing and optical mapping techniques suggests this problem may now be tractable. Here we assess this by using long-read sequencing platforms from PacBio and Oxford Nanopore, supplemented with short-read sequencing and Bionano optical mapping, to sequence DNA extracted from CD14(+) monocytes and peripheral blood mononuclear cells from a single European individual identified as HV31. We use this data to build a de novo assembly of eight genomic regions encoding four key components of the immune system, namely the human leukocyte antigen, immunoglobulins, T cell receptors, and killer-cell immunoglobulin-like receptors. Validation of our assembly using k-mer based and alignment approaches suggests that it has high accuracy, with estimated base-level error rates below 1 in 10 kb, although we identify a small number of remaining structural errors. We use the assembly to identify heterozygous and homozygous structural variation in comparison to GRCh38. Despite analyzing only a single individual, we find multiple large structural variants affecting core genes at all three immunoglobulin regions and at two of the three T cell receptor regions. Several of these variants are not accurately callable using current algorithms, implying that further methodological improvements are needed. Our results demonstrate that assessing haplotype variation in these regions is possible given sufficiently accurate long-read and associated data. 
Continued reductions in the cost of these technologies will enable application of these methods to larger samples and provide a broader catalogue of germline structural variation at these loci, an important step toward making these regions accessible to large-scale genetic association studies.
format Online
Article
Text
id pubmed-8363018
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8363018 2021-08-14 PLoS Comput Biol Research Article Public Library of Science 2021-08-03 /pmc/articles/PMC8363018/ /pubmed/34343164 http://dx.doi.org/10.1371/journal.pcbi.1009254 Text en © 2021 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Using de novo assembly to identify structural variation of eight complex immune system gene regions
title_sort using de novo assembly to identify structural variation of eight complex immune system gene regions
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8363018/
https://www.ncbi.nlm.nih.gov/pubmed/34343164
http://dx.doi.org/10.1371/journal.pcbi.1009254