Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs
BACKGROUND: Identifying the locations of gene breakpoints between species of different taxonomic groups can provide useful insights into the underlying evolutionary processes. Given the exact locations of their genes, the breakpoints can be computed without much effort. However, often, existing gene...
Main authors: | Fiedler, Lisa; Bernt, Matthias; Middendorf, Martin; Stadler, Peter F. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10243065/ https://www.ncbi.nlm.nih.gov/pubmed/37277700 http://dx.doi.org/10.1186/s12859-023-05371-4 |
_version_ | 1785054351837888512 |
---|---|
author | Fiedler, Lisa Bernt, Matthias Middendorf, Martin Stadler, Peter F. |
author_facet | Fiedler, Lisa Bernt, Matthias Middendorf, Martin Stadler, Peter F. |
author_sort | Fiedler, Lisa |
collection | PubMed |
description | BACKGROUND: Identifying the locations of gene breakpoints between species of different taxonomic groups can provide useful insights into the underlying evolutionary processes. Given the exact locations of their genes, the breakpoints can be computed without much effort. However, existing gene annotations are often erroneous, or only nucleotide sequences are available. Especially in mitochondrial genomes, high variation in gene order is usually accompanied by a high degree of sequence inconsistency. This makes accurately locating breakpoints in mitogenomic nucleotide sequences a challenging task. RESULTS: This contribution presents a novel method for detecting gene breakpoints in the nucleotide sequences of complete mitochondrial genomes, taking into account possible high substitution rates. The method is implemented in the software package DeBBI. DeBBI allows transposition- and inversion-based breakpoints to be analyzed independently and uses a parallel program design that can make use of modern multi-processor systems. Extensive tests on synthetic data sets, covering a broad range of sequence dissimilarities and different numbers of introduced breakpoints, demonstrate DeBBI's ability to produce accurate results. Case studies using species of various taxonomic groups further show DeBBI's applicability to real-life data. While (some) multiple sequence alignment tools can also be used for the task at hand, we demonstrate that gene breaks, especially those between short, poorly conserved tRNA genes, can be detected more frequently with the proposed approach. CONCLUSION: The proposed method constructs a position-annotated de-Bruijn graph of the input sequences. Using a heuristic algorithm, this graph is searched for particular structures, called bulges, which may be associated with the breakpoint locations. Despite the large size of these structures, the algorithm only requires a small number of graph traversal steps. |
format | Online Article Text |
id | pubmed-10243065 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10243065 2023-06-07 Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs Fiedler, Lisa Bernt, Matthias Middendorf, Martin Stadler, Peter F. BMC Bioinformatics Research BACKGROUND: Identifying the locations of gene breakpoints between species of different taxonomic groups can provide useful insights into the underlying evolutionary processes. Given the exact locations of their genes, the breakpoints can be computed without much effort. However, existing gene annotations are often erroneous, or only nucleotide sequences are available. Especially in mitochondrial genomes, high variation in gene order is usually accompanied by a high degree of sequence inconsistency. This makes accurately locating breakpoints in mitogenomic nucleotide sequences a challenging task. RESULTS: This contribution presents a novel method for detecting gene breakpoints in the nucleotide sequences of complete mitochondrial genomes, taking into account possible high substitution rates. The method is implemented in the software package DeBBI. DeBBI allows transposition- and inversion-based breakpoints to be analyzed independently and uses a parallel program design that can make use of modern multi-processor systems. Extensive tests on synthetic data sets, covering a broad range of sequence dissimilarities and different numbers of introduced breakpoints, demonstrate DeBBI's ability to produce accurate results. Case studies using species of various taxonomic groups further show DeBBI's applicability to real-life data. While (some) multiple sequence alignment tools can also be used for the task at hand, we demonstrate that gene breaks, especially those between short, poorly conserved tRNA genes, can be detected more frequently with the proposed approach. CONCLUSION: The proposed method constructs a position-annotated de-Bruijn graph of the input sequences. Using a heuristic algorithm, this graph is searched for particular structures, called bulges, which may be associated with the breakpoint locations. Despite the large size of these structures, the algorithm only requires a small number of graph traversal steps. BioMed Central 2023-06-05 /pmc/articles/PMC10243065/ /pubmed/37277700 http://dx.doi.org/10.1186/s12859-023-05371-4 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Fiedler, Lisa Bernt, Matthias Middendorf, Martin Stadler, Peter F. Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs |
title | Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs |
title_full | Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs |
title_fullStr | Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs |
title_full_unstemmed | Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs |
title_short | Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs |
title_sort | detecting gene breakpoints in noisy genome sequences using position-annotated colored de-bruijn graphs |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10243065/ https://www.ncbi.nlm.nih.gov/pubmed/37277700 http://dx.doi.org/10.1186/s12859-023-05371-4 |
work_keys_str_mv | AT fiedlerlisa detectinggenebreakpointsinnoisygenomesequencesusingpositionannotatedcoloreddebruijngraphs AT berntmatthias detectinggenebreakpointsinnoisygenomesequencesusingpositionannotatedcoloreddebruijngraphs AT middendorfmartin detectinggenebreakpointsinnoisygenomesequencesusingpositionannotatedcoloreddebruijngraphs AT stadlerpeterf detectinggenebreakpointsinnoisygenomesequencesusingpositionannotatedcoloreddebruijngraphs |
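
The description above mentions a position-annotated colored de-Bruijn graph in which branching structures (bulges) may indicate breakpoints. The following is only a minimal illustrative sketch of that data structure, not DeBBI's implementation: the function names `build_graph` and `branch_points`, the toy sequences, the choice of k = 3, and the treatment of sequences as linear rather than circular are all assumptions made for this example. Each k-mer is a node, each input sequence is a "color", and every node stores the positions at which it occurs in each sequence.

```python
from collections import defaultdict


def build_graph(sequences, k):
    """Build a colored, position-annotated de Bruijn graph (toy version).

    sequences: dict mapping a sequence name (its "color") to a string.
    Returns (nodes, edges) where
      nodes[kmer][color] is the list of start positions of kmer in that sequence,
      edges[kmer] is the set of k-mers that directly follow kmer in any sequence.
    Assumption: sequences are treated as linear, not circular.
    """
    nodes = defaultdict(lambda: defaultdict(list))
    edges = defaultdict(set)
    for color, seq in sequences.items():
        prev = None
        for pos in range(len(seq) - k + 1):
            kmer = seq[pos:pos + k]
            nodes[kmer][color].append(pos)
            if prev is not None:
                edges[prev].add(kmer)
            prev = kmer
    return nodes, edges


def branch_points(nodes, edges):
    """Return k-mers that occur in more than one color and have several successors.

    Such nodes are places where the paths of different sequences diverge,
    i.e. possible entry points of a bulge in this simplified picture.
    """
    return sorted(
        kmer for kmer, successors in edges.items()
        if len(successors) > 1 and len(nodes[kmer]) > 1
    )


# Hypothetical toy input: two short sequences differing by one substitution.
sequences = {"A": "ACGTTGCA", "B": "ACGTAGCA"}
nodes, edges = build_graph(sequences, k=3)
print(branch_points(nodes, edges))
print(dict(nodes["GCA"]))
```

In this toy input the two paths split after the shared k-mer `CGT` and rejoin at `GCA`, which is the kind of diverge-and-rejoin shape a bulge search would look for. The position annotations are what would let such a structure be mapped back to coordinates in each input sequence.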