Fast characterization of segmental duplication structure in multiple genome assemblies

Bibliographic Details
Main authors: Išerić, Hamza, Alkan, Can, Hach, Faraz, Numanagić, Ibrahim
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8932185/
https://www.ncbi.nlm.nih.gov/pubmed/35303886
http://dx.doi.org/10.1186/s13015-022-00210-2
_version_ 1784671402769514496
author Išerić, Hamza
Alkan, Can
Hach, Faraz
Numanagić, Ibrahim
author_facet Išerić, Hamza
Alkan, Can
Hach, Faraz
Numanagić, Ibrahim
author_sort Išerić, Hamza
collection PubMed
description MOTIVATION: The increasing availability of high-quality genome assemblies has raised interest in the characterization of genomic architecture. Major architectural elements, such as common repeats and segmental duplications (SDs), increase genome plasticity that stimulates further evolution by changing the genomic structure and inventing new genes. Optimal computation of SDs within a genome requires quadratic-time local alignment algorithms that are impractical due to the size of most genomes. Additionally, to perform evolutionary analysis, one needs to characterize SDs in multiple genomes and find relations between those SDs and unique (non-duplicated) segments in other genomes. A naïve approach consisting of multiple sequence alignment would make the optimal solution to this problem even more impractical. Thus there is a need for fast and accurate algorithms to characterize SD structure in multiple genome assemblies to better understand the evolutionary forces that shaped the genomes of today. RESULTS: Here we introduce a new approach, BISER, to quickly detect SDs in multiple genomes and identify elementary SDs and core duplicons that drive the formation of such SDs. BISER improves on earlier tools by (i) scaling the detection of SDs with low homology to multiple genomes while introducing further 7–33× speed-ups over the existing tools, and by (ii) characterizing elementary SDs and detecting core duplicons to help trace the evolutionary history of duplications as far back as 300 million years. AVAILABILITY AND IMPLEMENTATION: BISER is implemented in the Seq programming language and is publicly available at https://github.com/0xTCG/biser.
format Online
Article
Text
id pubmed-8932185
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8932185 2022-03-23 Fast characterization of segmental duplication structure in multiple genome assemblies Išerić, Hamza Alkan, Can Hach, Faraz Numanagić, Ibrahim Algorithms Mol Biol Research MOTIVATION: The increasing availability of high-quality genome assemblies raised interest in the characterization of genomic architecture. Major architectural elements, such as common repeats and segmental duplications (SDs), increase genome plasticity that stimulates further evolution by changing the genomic structure and inventing new genes. Optimal computation of SDs within a genome requires quadratic-time local alignment algorithms that are impractical due to the size of most genomes. Additionally, to perform evolutionary analysis, one needs to characterize SDs in multiple genomes and find relations between those SDs and unique (non-duplicated) segments in other genomes. A naïve approach consisting of multiple sequence alignment would make the optimal solution to this problem even more impractical. Thus there is a need for fast and accurate algorithms to characterize SD structure in multiple genome assemblies to better understand the evolutionary forces that shaped the genomes of today. RESULTS: Here we introduce a new approach, BISER, to quickly detect SDs in multiple genomes and identify elementary SDs and core duplicons that drive the formation of such SDs. BISER improves earlier tools by (i) scaling the detection of SDs with low homology to multiple genomes while introducing further 7–33× speed-ups over the existing tools, and by (ii) characterizing elementary SDs and detecting core duplicons to help trace the evolutionary history of duplications to as far as 300 million years. AVAILABILITY AND IMPLEMENTATION: BISER is implemented in Seq programming language and is publicly available at https://github.com/0xTCG/biser.
BioMed Central 2022-03-18 /pmc/articles/PMC8932185/ /pubmed/35303886 http://dx.doi.org/10.1186/s13015-022-00210-2 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Išerić, Hamza
Alkan, Can
Hach, Faraz
Numanagić, Ibrahim
Fast characterization of segmental duplication structure in multiple genome assemblies
title Fast characterization of segmental duplication structure in multiple genome assemblies
title_full Fast characterization of segmental duplication structure in multiple genome assemblies
title_fullStr Fast characterization of segmental duplication structure in multiple genome assemblies
title_full_unstemmed Fast characterization of segmental duplication structure in multiple genome assemblies
title_short Fast characterization of segmental duplication structure in multiple genome assemblies
title_sort fast characterization of segmental duplication structure in multiple genome assemblies
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8932185/
https://www.ncbi.nlm.nih.gov/pubmed/35303886
http://dx.doi.org/10.1186/s13015-022-00210-2
work_keys_str_mv AT iserichamza fastcharacterizationofsegmentalduplicationstructureinmultiplegenomeassemblies
AT alkancan fastcharacterizationofsegmentalduplicationstructureinmultiplegenomeassemblies
AT hachfaraz fastcharacterizationofsegmentalduplicationstructureinmultiplegenomeassemblies
AT numanagicibrahim fastcharacterizationofsegmentalduplicationstructureinmultiplegenomeassemblies