Classification and monomer-by-monomer annotation dataset of suprachromosomal family 1 alpha satellite higher-order repeats in hg38 human genome assembly

Bibliographic Details
Main authors: Uralsky, L.I., Shepelev, V.A., Alexandrov, A.A., Yurov, Y.B., Rogaev, E.I., Alexandrov, I.A.
Format: Online Article Text
Language: English
Published: Elsevier 2019
Subjects: Genetics, Genomics and Molecular Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6447721/
https://www.ncbi.nlm.nih.gov/pubmed/30989093
http://dx.doi.org/10.1016/j.dib.2019.103708
collection PubMed
description In the latest hg38 human genome assembly, centromeric gaps have been filled with alpha satellite (AS) reference models (RMs), which are statistical representations of the homogeneous higher-order repeat (HOR) arrays that make up the bulk of the centromeric regions. We analyzed these models to compose an atlas of human AS HORs in which each monomer of a HOR is represented by a number of its polymorphic sequence variants. We combined these data with the HMMER sequence analysis platform to annotate AS HORs in the assembly. This led to the discovery of a new type of low-copy-number, highly divergent HORs that were not represented by RMs; these were included in the dataset. The annotation can be viewed as a UCSC Genome Browser custom track (the HOR-track) and used together with our previous annotation of AS suprachromosomal families (SFs) in the same assembly (the SF-track), in which each AS monomer can be viewed in its genomic context together with its classification into one of the 5 major SFs. To catalog the diversity of AS HORs in the human genome, we introduced a new naming system in which each HOR receives a name that indicates its SF, chromosomal location, and index number. Here we present the first installment of the HOR-track, covering only the 17 HORs that belong to SF1, which forms the live functional centromeres in chromosomes 1, 3, 5, 6, 7, 10, 12, 16, and 19, as well as a large number of minor dead HOR domains, both homogeneous and divergent. Unlike annotation of whole HOR repeats, the monomer-by-monomer HOR annotation used for this dataset allows mapping and quantification of various structural variants of AS HORs, which can be used to collect data on inter-individual polymorphism of AS.
id pubmed-6447721
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6447721 (2019-04-15). Data Brief. Genetics, Genomics and Molecular Biology. Elsevier, 2019-03-08. Text, en. © 2019 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).