Annotation of suprachromosomal families reveals uncommon types of alpha satellite organization in pericentromeric regions of hg38 human genome assembly
Centromeric alpha satellite (AS) is composed of highly identical higher-order DNA repetitive sequences, which make the standard assembly process impossible. Because of this, the AS repeats were severely underrepresented in previous versions of the human genome assembly, which showed large centromeric gaps....
Main authors: | Shepelev, V.A., Uralsky, L.I., Alexandrov, A.A., Yurov, Y.B., Rogaev, E.I., Alexandrov, I.A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2015 |
Subjects: | Data in Brief |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4496801/ https://www.ncbi.nlm.nih.gov/pubmed/26167452 http://dx.doi.org/10.1016/j.gdata.2015.05.035 |
_version_ | 1782380460102385664 |
---|---|
author | Shepelev, V.A. Uralsky, L.I. Alexandrov, A.A. Yurov, Y.B. Rogaev, E.I. Alexandrov, I.A. |
author_facet | Shepelev, V.A. Uralsky, L.I. Alexandrov, A.A. Yurov, Y.B. Rogaev, E.I. Alexandrov, I.A. |
author_sort | Shepelev, V.A. |
collection | PubMed |
description | Centromeric alpha satellite (AS) is composed of highly identical higher-order DNA repetitive sequences, which make the standard assembly process impossible. Because of this, the AS repeats were severely underrepresented in previous versions of the human genome assembly, which showed large centromeric gaps. The latest hg38 assembly (GCA_000001405.15) employed a novel method of approximate representation of these sequences, using AS reference models to fill the gaps. As a result, much more assembled AS became available for genomic analysis. We used our previously described PERCON program to annotate various suprachromosomal families (SFs) of AS in the hg38 assembly and presented the results of our primary analysis as an easy-to-read track for the UCSC Genome Browser. The monomeric classes characteristic of the five known SFs were color-coded, which allowed quick visual assessment of AS composition in whole multi-megabase centromeres down to each individual AS monomer. Such comprehensive annotation of AS in the human genome assembly was performed for the first time. It showed the expected prevalence of the known major types of AS organization characteristic of the five established SFs. In addition, some less common types of AS arrays were identified, such as pure R2 domains in SF5, apparent J/R and D/R mixes in SF1 and SF2, and several different SF4 higher-order repeats among reference models and in regular contigs. No new SFs or large unclassed AS domains were discovered. The dataset reveals the architecture of human centromeres and allows classification of AS sequence reads by alignment to the annotated hg38 assembly. The data were deposited here: http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&hgt.customText=https://dl.dropboxusercontent.com/u/22994534/AS-tracks/human-GRC-hg38-M1SFs.bed.bz2. |
format | Online Article Text |
id | pubmed-4496801 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-4496801 2015-10-19 Shepelev, V.A. Uralsky, L.I. Alexandrov, A.A. Yurov, Y.B. Rogaev, E.I. Alexandrov, I.A. Genom Data Data in Brief Elsevier 2015-06-05 /pmc/articles/PMC4496801/ /pubmed/26167452 http://dx.doi.org/10.1016/j.gdata.2015.05.035 Text en © 2015 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
title | Annotation of suprachromosomal families reveals uncommon types of alpha satellite organization in pericentromeric regions of hg38 human genome assembly |
topic | Data in Brief |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4496801/ https://www.ncbi.nlm.nih.gov/pubmed/26167452 http://dx.doi.org/10.1016/j.gdata.2015.05.035 |
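
The description says the annotation was released as a UCSC Genome Browser track in a bzip2-compressed BED file. The following is a minimal sketch, not the authors' tooling, of how such a file could be summarized by summing annotated bases per chromosome and label. It assumes standard BED columns (chrom, start, end, name) and assumes that the name column carries the monomer class label; the function names `tally_bed_labels` and `tally_bed_file` and the sample records are hypothetical.

```python
import bz2
from collections import Counter


def tally_bed_labels(lines):
    """Sum annotated bases per (chrom, name) over an iterable of BED lines.

    Assumption: column 4 (name) holds the monomer class label.
    """
    totals = Counter()
    for line in lines:
        line = line.strip()
        # Skip blank lines and UCSC header lines (track/browser/comments).
        if not line or line.startswith(("track", "browser", "#")):
            continue
        fields = line.split()
        if len(fields) < 4:
            continue  # no name column, nothing to tally
        chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
        # BED intervals are 0-based and half-open, so length is end - start.
        totals[(chrom, name)] += end - start
    return totals


def tally_bed_file(path):
    """Read a .bed.bz2 file from a local path (hypothetical) and tally it."""
    with bz2.open(path, "rt") as handle:
        return tally_bed_labels(handle)


# Made-up records for illustration only; not taken from the real track.
example_lines = [
    'track name="demo" itemRgb="On"',
    "chr1\t100\t271\tJ1\t0\t+\t100\t271\t255,0,0",
    "chr1\t271\t442\tJ2\t0\t+\t271\t442\t0,255,0",
    "chr1\t442\t613\tJ1\t0\t+\t442\t613\t255,0,0",
]
example_totals = tally_bed_labels(example_lines)
```

With a downloaded copy of the track, `tally_bed_file("human-GRC-hg38-M1SFs.bed.bz2")` would return the same kind of counter, provided the file follows the assumed column layout.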