
Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs

BACKGROUND: Identifying the locations of gene breakpoints between species of different taxonomic groups can provide useful insights into the underlying evolutionary processes. Given the exact locations of their genes, the breakpoints can be computed without much effort. However, often, existing gene...


Bibliographic Details
Main authors: Fiedler, Lisa, Bernt, Matthias, Middendorf, Martin, Stadler, Peter F.
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10243065/
https://www.ncbi.nlm.nih.gov/pubmed/37277700
http://dx.doi.org/10.1186/s12859-023-05371-4
_version_ 1785054351837888512
author Fiedler, Lisa
Bernt, Matthias
Middendorf, Martin
Stadler, Peter F.
author_facet Fiedler, Lisa
Bernt, Matthias
Middendorf, Martin
Stadler, Peter F.
author_sort Fiedler, Lisa
collection PubMed
description BACKGROUND: Identifying the locations of gene breakpoints between species of different taxonomic groups can provide useful insights into the underlying evolutionary processes. Given the exact locations of their genes, the breakpoints can be computed without much effort. However, existing gene annotations are often erroneous, or only nucleotide sequences are available. Especially in mitochondrial genomes, high variations in gene orders are usually accompanied by a high degree of sequence inconsistencies. This makes accurately locating breakpoints in mitogenomic nucleotide sequences a challenging task. RESULTS: This contribution presents a novel method for detecting gene breakpoints in the nucleotide sequences of complete mitochondrial genomes, taking into account possible high substitution rates. The method is implemented in the software package DeBBI. DeBBI allows transposition- and inversion-based breakpoints to be analyzed independently and uses a parallel program design that can make use of modern multi-processor systems. Extensive tests on synthetic data sets, covering a broad range of sequence dissimilarities and different numbers of introduced breakpoints, demonstrate DeBBI's ability to produce accurate results. Case studies using species of various taxonomic groups further show DeBBI's applicability to real-life data. While some multiple sequence alignment tools can also be used for the task at hand, we demonstrate that gene breaks between short, poorly conserved tRNA genes in particular can be detected more frequently with the proposed approach. CONCLUSION: The proposed method constructs a position-annotated de-Bruijn graph of the input sequences. Using a heuristic algorithm, this graph is searched for particular structures, called bulges, which may be associated with the breakpoint locations. Despite the large size of these structures, the algorithm only requires a small number of graph traversal steps.
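The graph construction mentioned in the abstract can be illustrated with a minimal Python sketch. It is not DeBBI's implementation: the function names `build_annotated_dbg` and `shared_edges`, the toy sequences, and the choice of k are assumptions made for illustration only. The sketch assumes the common convention that nodes are (k-1)-mers and edges are k-mers, with each edge annotated per input sequence (its "color") by the positions where the k-mer occurs. Bulge detection itself is not shown.

```python
from collections import defaultdict


def build_annotated_dbg(sequences, k):
    """Build a position-annotated, colored de Bruijn graph (illustrative only).

    sequences: dict mapping a sequence name (the "color") to a nucleotide string.
    Returns a dict mapping each edge (prefix, suffix) of (k-1)-mers to a dict
    {color: [start positions of the k-mer in that sequence]}.
    """
    edges = defaultdict(lambda: defaultdict(list))
    for color, seq in sequences.items():
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # The k-mer connects its (k-1)-prefix to its (k-1)-suffix.
            edges[(kmer[:-1], kmer[1:])][color].append(i)
    return edges


def shared_edges(edges, colors):
    """Return the edges that carry every one of the given colors."""
    return {
        edge: dict(annotation)
        for edge, annotation in edges.items()
        if all(color in annotation for color in colors)
    }


# Hypothetical toy input: two short sequences that overlap in "CGTA".
toy = {"a": "ACGTA", "b": "CGTAC"}
graph = build_annotated_dbg(toy, k=3)
common = shared_edges(graph, ["a", "b"])
```

In this toy case the edges shared by both colors keep their per-sequence positions, which is the kind of information a search for rearranged regions would need. How DeBBI actually stores and traverses the graph is described in the article, not here.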
format Online
Article
Text
id pubmed-10243065
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10243065 2023-06-07 Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs Fiedler, Lisa Bernt, Matthias Middendorf, Martin Stadler, Peter F. BMC Bioinformatics Research BACKGROUND: Identifying the locations of gene breakpoints between species of different taxonomic groups can provide useful insights into the underlying evolutionary processes. Given the exact locations of their genes, the breakpoints can be computed without much effort. However, existing gene annotations are often erroneous, or only nucleotide sequences are available. Especially in mitochondrial genomes, high variations in gene orders are usually accompanied by a high degree of sequence inconsistencies. This makes accurately locating breakpoints in mitogenomic nucleotide sequences a challenging task. RESULTS: This contribution presents a novel method for detecting gene breakpoints in the nucleotide sequences of complete mitochondrial genomes, taking into account possible high substitution rates. The method is implemented in the software package DeBBI. DeBBI allows transposition- and inversion-based breakpoints to be analyzed independently and uses a parallel program design that can make use of modern multi-processor systems. Extensive tests on synthetic data sets, covering a broad range of sequence dissimilarities and different numbers of introduced breakpoints, demonstrate DeBBI's ability to produce accurate results. Case studies using species of various taxonomic groups further show DeBBI's applicability to real-life data. While some multiple sequence alignment tools can also be used for the task at hand, we demonstrate that gene breaks between short, poorly conserved tRNA genes in particular can be detected more frequently with the proposed approach. CONCLUSION: The proposed method constructs a position-annotated de-Bruijn graph of the input sequences. Using a heuristic algorithm, this graph is searched for particular structures, called bulges, which may be associated with the breakpoint locations. Despite the large size of these structures, the algorithm only requires a small number of graph traversal steps. BioMed Central 2023-06-05 /pmc/articles/PMC10243065/ /pubmed/37277700 http://dx.doi.org/10.1186/s12859-023-05371-4 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Fiedler, Lisa
Bernt, Matthias
Middendorf, Martin
Stadler, Peter F.
Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs
title Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs
title_full Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs
title_fullStr Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs
title_full_unstemmed Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs
title_short Detecting gene breakpoints in noisy genome sequences using position-annotated colored de-Bruijn graphs
title_sort detecting gene breakpoints in noisy genome sequences using position-annotated colored de-bruijn graphs
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10243065/
https://www.ncbi.nlm.nih.gov/pubmed/37277700
http://dx.doi.org/10.1186/s12859-023-05371-4
work_keys_str_mv AT fiedlerlisa detectinggenebreakpointsinnoisygenomesequencesusingpositionannotatedcoloreddebruijngraphs
AT berntmatthias detectinggenebreakpointsinnoisygenomesequencesusingpositionannotatedcoloreddebruijngraphs
AT middendorfmartin detectinggenebreakpointsinnoisygenomesequencesusingpositionannotatedcoloreddebruijngraphs
AT stadlerpeterf detectinggenebreakpointsinnoisygenomesequencesusingpositionannotatedcoloreddebruijngraphs