
CentromereArchitect: inference and analysis of the architecture of centromeres

MOTIVATION: Recent advances in long-read sequencing technologies led to rapid progress in centromere assembly in the last year and, for the first time, opened a possibility to address the long-standing questions about the architecture and evolution of human centromeres. However, since these advances...


Bibliographic Details
Main Authors: Dvorkina, Tatiana, Kunyavskaya, Olga, Bzikadze, Andrey V, Alexandrov, Ivan, Pevzner, Pavel A
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8336445/
https://www.ncbi.nlm.nih.gov/pubmed/34252949
http://dx.doi.org/10.1093/bioinformatics/btab265
author Dvorkina, Tatiana
Kunyavskaya, Olga
Bzikadze, Andrey V
Alexandrov, Ivan
Pevzner, Pavel A
collection PubMed
description MOTIVATION: Recent advances in long-read sequencing technologies led to rapid progress in centromere assembly in the last year and, for the first time, opened the possibility of addressing long-standing questions about the architecture and evolution of human centromeres. However, since these advances have not yet been accompanied by the development of centromere-specific bioinformatics algorithms, even the fundamental questions about human centromeres (e.g. centromere annotation by deriving the complete set of human monomers and high-order repeats), let alone more complex ones (e.g. explaining how monomers and high-order repeats evolved), remain open. Moreover, even though a four-decade-long series of studies has aimed at cataloging all human monomers and high-order repeats, rigorous algorithmic definitions of these concepts are still lacking. Thus, the development of a centromere annotation tool is a prerequisite for follow-up personalized biomedical studies of centromeres across the human population and evolutionary studies of centromeres across various species. RESULTS: We describe CentromereArchitect, the first tool for centromere annotation in a newly sequenced genome, apply it to the recently generated complete assembly of a human genome by the Telomere-to-Telomere consortium, generate the complete set of human monomers and high-order repeats for ‘live’ centromeres, and reveal a vast set of hybrid monomers that may represent the focal points of centromere evolution. AVAILABILITY AND IMPLEMENTATION: CentromereArchitect is publicly available at https://github.com/ablab/stringdecomposer/tree/ismb2021 SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8336445
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8336445 2021-08-09 CentromereArchitect: inference and analysis of the architecture of centromeres Dvorkina, Tatiana Kunyavskaya, Olga Bzikadze, Andrey V Alexandrov, Ivan Pevzner, Pavel A Bioinformatics Genome Sequence Analysis
Oxford University Press 2021-07-12 /pmc/articles/PMC8336445/ /pubmed/34252949 http://dx.doi.org/10.1093/bioinformatics/btab265 Text en © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title CentromereArchitect: inference and analysis of the architecture of centromeres
topic Genome Sequence Analysis
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8336445/
https://www.ncbi.nlm.nih.gov/pubmed/34252949
http://dx.doi.org/10.1093/bioinformatics/btab265