
Genomic Characterization of Large Heterochromatic Gaps in the Human Genome Assembly


Bibliographic Details
Main Authors: Altemose, Nicolas, Miga, Karen H., Maggioni, Mauro, Willard, Huntington F.
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4022460/
https://www.ncbi.nlm.nih.gov/pubmed/24831296
http://dx.doi.org/10.1371/journal.pcbi.1003628
_version_ 1782316412455354368
author Altemose, Nicolas
Miga, Karen H.
Maggioni, Mauro
Willard, Huntington F.
collection PubMed
description The largest gaps in the human genome assembly correspond to multi-megabase heterochromatic regions composed primarily of two related families of tandem repeats, Human Satellites 2 and 3 (HSat2,3). The abundance of repetitive DNA in these regions challenges standard mapping and assembly algorithms, and as a result, the sequence composition and potential biological functions of these regions remain largely unexplored. Furthermore, existing genomic tools designed to predict consensus-based descriptions of repeat families cannot be readily applied to complex satellite repeats such as HSat2,3, which lack a consistent repeat unit reference sequence. Here we present an alignment-free method to characterize complex satellites using whole-genome shotgun read datasets. Utilizing this approach, we classify HSat2,3 sequences into fourteen subfamilies and predict their chromosomal distributions, resulting in a comprehensive satellite reference database to further enable genomic studies of heterochromatic regions. We also identify 1.3 Mb of non-repetitive sequence interspersed with HSat2,3 across 17 unmapped assembly scaffolds, including eight annotated gene predictions. Finally, we apply our satellite reference database to high-throughput sequence data from 396 males to estimate array size variation of the predominant HSat3 array on the Y chromosome, confirming that satellite array sizes can vary between individuals over an order of magnitude (7 to 98 Mb) and further demonstrating that array sizes are distributed differently within distinct Y haplogroups. In summary, we present a novel framework for generating initial reference databases for unassembled genomic regions enriched with complex satellite DNA, and we further demonstrate the utility of these reference databases for studying patterns of sequence variation within human populations.
format Online
Article
Text
id pubmed-4022460
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4022460 2014-05-21 Genomic Characterization of Large Heterochromatic Gaps in the Human Genome Assembly Altemose, Nicolas Miga, Karen H. Maggioni, Mauro Willard, Huntington F. PLoS Comput Biol Research Article Public Library of Science 2014-05-15 /pmc/articles/PMC4022460/ /pubmed/24831296 http://dx.doi.org/10.1371/journal.pcbi.1003628 Text en © 2014 Altemose et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Genomic Characterization of Large Heterochromatic Gaps in the Human Genome Assembly
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4022460/
https://www.ncbi.nlm.nih.gov/pubmed/24831296
http://dx.doi.org/10.1371/journal.pcbi.1003628