Centromere reference models for human chromosomes X and Y satellite arrays

The human genome sequence remains incomplete, with multimegabase-sized gaps representing the endogenous centromeres and other heterochromatic regions. Available sequence-based studies within these sites in the genome have demonstrated a role in centromere function and chromosome pairing, necessary t...

Bibliographic Details
Main authors: Miga, Karen H., Newton, Yulia, Jain, Miten, Altemose, Nicolas, Willard, Huntington F., Kent, W. James
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3975068/
https://www.ncbi.nlm.nih.gov/pubmed/24501022
http://dx.doi.org/10.1101/gr.159624.113
_version_ 1782310079288049664
author Miga, Karen H.
Newton, Yulia
Jain, Miten
Altemose, Nicolas
Willard, Huntington F.
Kent, W. James
author_facet Miga, Karen H.
Newton, Yulia
Jain, Miten
Altemose, Nicolas
Willard, Huntington F.
Kent, W. James
author_sort Miga, Karen H.
collection PubMed
description The human genome sequence remains incomplete, with multimegabase-sized gaps representing the endogenous centromeres and other heterochromatic regions. Available sequence-based studies within these sites in the genome have demonstrated a role in centromere function and chromosome pairing, necessary to ensure proper chromosome segregation during cell division. A common genomic feature of these regions is the enrichment of long arrays of near-identical tandem repeats, known as satellite DNAs, which offer a limited number of variant sites to differentiate individual repeat copies across millions of bases. This substantial sequence homogeneity challenges available assembly strategies and, as a result, centromeric regions are omitted from ongoing genomic studies. To address this problem, we utilize monomer sequence and ordering information obtained from whole-genome shotgun reads to model two haploid human satellite arrays on chromosomes X and Y, resulting in an initial characterization of 3.83 Mb of centromeric DNA within an individual genome. To further expand the utility of each centromeric reference sequence model, we evaluate sites within the arrays for short-read mappability and chromosome specificity. Because satellite DNAs evolve in a concerted manner, we use these centromeric assemblies to assess the extent of sequence variation among 366 individuals from distinct human populations. We thus identify two satellite array variants in both X and Y centromeres, as determined by array length and sequence composition. This study provides an initial sequence characterization of a regional centromere and establishes a foundation to extend genomic characterization to these sites as well as to other repeat-rich regions within complex genomes.
format Online
Article
Text
id pubmed-3975068
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-39750682014-10-01 Centromere reference models for human chromosomes X and Y satellite arrays Miga, Karen H. Newton, Yulia Jain, Miten Altemose, Nicolas Willard, Huntington F. Kent, W. James Genome Res Method The human genome sequence remains incomplete, with multimegabase-sized gaps representing the endogenous centromeres and other heterochromatic regions. Available sequence-based studies within these sites in the genome have demonstrated a role in centromere function and chromosome pairing, necessary to ensure proper chromosome segregation during cell division. A common genomic feature of these regions is the enrichment of long arrays of near-identical tandem repeats, known as satellite DNAs, which offer a limited number of variant sites to differentiate individual repeat copies across millions of bases. This substantial sequence homogeneity challenges available assembly strategies and, as a result, centromeric regions are omitted from ongoing genomic studies. To address this problem, we utilize monomer sequence and ordering information obtained from whole-genome shotgun reads to model two haploid human satellite arrays on chromosomes X and Y, resulting in an initial characterization of 3.83 Mb of centromeric DNA within an individual genome. To further expand the utility of each centromeric reference sequence model, we evaluate sites within the arrays for short-read mappability and chromosome specificity. Because satellite DNAs evolve in a concerted manner, we use these centromeric assemblies to assess the extent of sequence variation among 366 individuals from distinct human populations. We thus identify two satellite array variants in both X and Y centromeres, as determined by array length and sequence composition. This study provides an initial sequence characterization of a regional centromere and establishes a foundation to extend genomic characterization to these sites as well as to other repeat-rich regions within complex genomes. 
Cold Spring Harbor Laboratory Press 2014-04 /pmc/articles/PMC3975068/ /pubmed/24501022 http://dx.doi.org/10.1101/gr.159624.113 Text en © 2014 Miga et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/3.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 3.0 Unported), as described at http://creativecommons.org/licenses/by-nc/3.0/.
spellingShingle Method
Miga, Karen H.
Newton, Yulia
Jain, Miten
Altemose, Nicolas
Willard, Huntington F.
Kent, W. James
Centromere reference models for human chromosomes X and Y satellite arrays
title Centromere reference models for human chromosomes X and Y satellite arrays
title_full Centromere reference models for human chromosomes X and Y satellite arrays
title_fullStr Centromere reference models for human chromosomes X and Y satellite arrays
title_full_unstemmed Centromere reference models for human chromosomes X and Y satellite arrays
title_short Centromere reference models for human chromosomes X and Y satellite arrays
title_sort centromere reference models for human chromosomes x and y satellite arrays
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3975068/
https://www.ncbi.nlm.nih.gov/pubmed/24501022
http://dx.doi.org/10.1101/gr.159624.113