Single haplotype assembly of the human genome from a hydatidiform mole

A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive...

Bibliographic Details
Main Authors: Steinberg, Karyn Meltz; Schneider, Valerie A.; Graves-Lindsay, Tina A.; Fulton, Robert S.; Agarwala, Richa; Huddleston, John; Shiryev, Sergey A.; Morgulis, Aleksandr; Surti, Urvashi; Warren, Wesley C.; Church, Deanna M.; Eichler, Evan E.; Wilson, Richard K.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4248323/
https://www.ncbi.nlm.nih.gov/pubmed/25373144
http://dx.doi.org/10.1101/gr.180893.114
_version_ 1782346779221557248
author Steinberg, Karyn Meltz
Schneider, Valerie A.
Graves-Lindsay, Tina A.
Fulton, Robert S.
Agarwala, Richa
Huddleston, John
Shiryev, Sergey A.
Morgulis, Aleksandr
Surti, Urvashi
Warren, Wesley C.
Church, Deanna M.
Eichler, Evan E.
Wilson, Richard K.
author_sort Steinberg, Karyn Meltz
collection PubMed
description A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future. This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly.
format Online
Article
Text
id pubmed-4248323
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-42483232015-06-01 Single haplotype assembly of the human genome from a hydatidiform mole Steinberg, Karyn Meltz Schneider, Valerie A. Graves-Lindsay, Tina A. Fulton, Robert S. Agarwala, Richa Huddleston, John Shiryev, Sergey A. Morgulis, Aleksandr Surti, Urvashi Warren, Wesley C. Church, Deanna M. Eichler, Evan E. Wilson, Richard K. Genome Res Resource A complete reference assembly is essential for accurately interpreting individual genomes and associating variation with phenotypes. While the current human reference genome sequence is of very high quality, gaps and misassemblies remain due to biological and technical complexities. Large repetitive sequences and complex allelic diversity are the two main drivers of assembly error. Although increasing the length of sequence reads and library fragments can improve assembly, even the longest available reads do not resolve all regions. In order to overcome the issue of allelic diversity, we used genomic DNA from an essentially haploid hydatidiform mole, CHM1. We utilized several resources from this DNA including a set of end-sequenced and indexed BAC clones and 100× Illumina whole-genome shotgun (WGS) sequence coverage. We used the WGS sequence and the GRCh37 reference assembly to create an assembly of the CHM1 genome. We subsequently incorporated 382 finished BAC clone sequences to generate a draft assembly, CHM1_1.1 (NCBI AssemblyDB GCA_000306695.2). Analysis of gene, repetitive element, and segmental duplication content shows this assembly to be of excellent quality and contiguity. However, comparison to assembly-independent resources, such as BAC clone end sequences and PacBio long reads, indicates misassembled regions. Most of these regions are enriched for structural variation and segmental duplication, and can be resolved in the future.
This publicly available assembly will be integrated into the Genome Reference Consortium curation framework for further improvement, with the ultimate goal being a completely finished gap-free assembly. Cold Spring Harbor Laboratory Press 2014-12 /pmc/articles/PMC4248323/ /pubmed/25373144 http://dx.doi.org/10.1101/gr.180893.114 Text en © 2014 Steinberg et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title Single haplotype assembly of the human genome from a hydatidiform mole
topic Resource
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4248323/
https://www.ncbi.nlm.nih.gov/pubmed/25373144
http://dx.doi.org/10.1101/gr.180893.114