A tri-tuple coordinate system derived for fast and accurate analysis of the colored de Bruijn graph-based pangenomes
BACKGROUND: With the rapid development of accurate sequencing and assembly technologies, an increasing number of high-quality chromosome-level and haplotype-resolved assemblies of genomic sequences have been derived, from which there will be great opportunities for computational pangenomics. Althoug...
Main authors: | Guo, Jindan; Pang, Erli; Song, Hongtao; Lin, Kui |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8161984/ https://www.ncbi.nlm.nih.gov/pubmed/34044757 http://dx.doi.org/10.1186/s12859-021-04149-w |
_version_ | 1783700621641121792 |
---|---|
author | Guo, Jindan Pang, Erli Song, Hongtao Lin, Kui |
author_facet | Guo, Jindan Pang, Erli Song, Hongtao Lin, Kui |
author_sort | Guo, Jindan |
collection | PubMed |
description | BACKGROUND: With the rapid development of accurate sequencing and assembly technologies, an increasing number of high-quality chromosome-level and haplotype-resolved assemblies of genomic sequences have been derived, from which there will be great opportunities for computational pangenomics. Although genome graphs are among the most useful models for pangenome representation, their structural complexity makes it difficult to present genome information intuitively, such as the linear reference genome. Thus, efficiently and accurately analyzing the genome graph spatial structure and coordinating the information remains a substantial challenge. RESULTS: We developed a new method, a colored superbubble (cSupB), that can overcome the complexity of graphs and organize a set of species- or population-specific haplotype sequences of interest. Based on this model, we propose a tri-tuple coordinate system that combines an offset value, topological structure and sample information. Additionally, cSupB provides a novel method that utilizes complete topological information and efficiently detects small indels (< 50 bp) for highly similar samples, which can be validated by simulated datasets. Moreover, we demonstrated that cSupB can adapt to the complex cycle structure. CONCLUSIONS: Although the solution is made suitable for increasingly complex genome graphs by relaxing the constraint, the directed acyclic graph, the motif cSupB and the cSupB method can be extended to any colored directed acyclic graph. We anticipate that our method will facilitate the analysis of individual haplotype variants and population genomic diversity. We have developed a C++ program for implementing our method that is available at https://github.com/eggleader/cSupB. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04149-w. |
format | Online Article Text |
id | pubmed-8161984 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8161984 2021-06-01 A tri-tuple coordinate system derived for fast and accurate analysis of the colored de Bruijn graph-based pangenomes Guo, Jindan Pang, Erli Song, Hongtao Lin, Kui BMC Bioinformatics Methodology Article BACKGROUND: With the rapid development of accurate sequencing and assembly technologies, an increasing number of high-quality chromosome-level and haplotype-resolved assemblies of genomic sequences have been derived, from which there will be great opportunities for computational pangenomics. Although genome graphs are among the most useful models for pangenome representation, their structural complexity makes it difficult to present genome information intuitively, such as the linear reference genome. Thus, efficiently and accurately analyzing the genome graph spatial structure and coordinating the information remains a substantial challenge. RESULTS: We developed a new method, a colored superbubble (cSupB), that can overcome the complexity of graphs and organize a set of species- or population-specific haplotype sequences of interest. Based on this model, we propose a tri-tuple coordinate system that combines an offset value, topological structure and sample information. Additionally, cSupB provides a novel method that utilizes complete topological information and efficiently detects small indels (< 50 bp) for highly similar samples, which can be validated by simulated datasets. Moreover, we demonstrated that cSupB can adapt to the complex cycle structure. CONCLUSIONS: Although the solution is made suitable for increasingly complex genome graphs by relaxing the constraint, the directed acyclic graph, the motif cSupB and the cSupB method can be extended to any colored directed acyclic graph. We anticipate that our method will facilitate the analysis of individual haplotype variants and population genomic diversity. We have developed a C++ program for implementing our method that is available at https://github.com/eggleader/cSupB. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04149-w. BioMed Central 2021-05-27 /pmc/articles/PMC8161984/ /pubmed/34044757 http://dx.doi.org/10.1186/s12859-021-04149-w Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Methodology Article Guo, Jindan Pang, Erli Song, Hongtao Lin, Kui A tri-tuple coordinate system derived for fast and accurate analysis of the colored de Bruijn graph-based pangenomes |
title | A tri-tuple coordinate system derived for fast and accurate analysis of the colored de Bruijn graph-based pangenomes |
title_full | A tri-tuple coordinate system derived for fast and accurate analysis of the colored de Bruijn graph-based pangenomes |
title_fullStr | A tri-tuple coordinate system derived for fast and accurate analysis of the colored de Bruijn graph-based pangenomes |
title_full_unstemmed | A tri-tuple coordinate system derived for fast and accurate analysis of the colored de Bruijn graph-based pangenomes |
title_short | A tri-tuple coordinate system derived for fast and accurate analysis of the colored de Bruijn graph-based pangenomes |
title_sort | tri-tuple coordinate system derived for fast and accurate analysis of the colored de bruijn graph-based pangenomes |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8161984/ https://www.ncbi.nlm.nih.gov/pubmed/34044757 http://dx.doi.org/10.1186/s12859-021-04149-w |
work_keys_str_mv | AT guojindan atrituplecoordinatesystemderivedforfastandaccurateanalysisofthecoloreddebruijngraphbasedpangenomes AT pangerli atrituplecoordinatesystemderivedforfastandaccurateanalysisofthecoloreddebruijngraphbasedpangenomes AT songhongtao atrituplecoordinatesystemderivedforfastandaccurateanalysisofthecoloreddebruijngraphbasedpangenomes AT linkui atrituplecoordinatesystemderivedforfastandaccurateanalysisofthecoloreddebruijngraphbasedpangenomes AT guojindan trituplecoordinatesystemderivedforfastandaccurateanalysisofthecoloreddebruijngraphbasedpangenomes AT pangerli trituplecoordinatesystemderivedforfastandaccurateanalysisofthecoloreddebruijngraphbasedpangenomes AT songhongtao trituplecoordinatesystemderivedforfastandaccurateanalysisofthecoloreddebruijngraphbasedpangenomes AT linkui trituplecoordinatesystemderivedforfastandaccurateanalysisofthecoloreddebruijngraphbasedpangenomes |