
Generalizations of the genomic rank distance to indels

MOTIVATION: The rank distance model represents genome rearrangements in multi-chromosomal genomes as matrix operations, which allows the reconstruction of parsimonious histories of evolution by rearrangements. We seek to generalize this model by allowing for genomes with different gene content, to a...


Bibliographic Details
Main authors: Pereira Zanetti, João Paulo; Peres Oliveira, Lucas; Chindelevitch, Leonid; Meidanis, João
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9985151/
https://www.ncbi.nlm.nih.gov/pubmed/36790056
http://dx.doi.org/10.1093/bioinformatics/btad087
_version_ 1784900892683665408
author Pereira Zanetti, João Paulo
Peres Oliveira, Lucas
Chindelevitch, Leonid
Meidanis, João
author_facet Pereira Zanetti, João Paulo
Peres Oliveira, Lucas
Chindelevitch, Leonid
Meidanis, João
author_sort Pereira Zanetti, João Paulo
collection PubMed
description MOTIVATION: The rank distance model represents genome rearrangements in multi-chromosomal genomes as matrix operations, which allows the reconstruction of parsimonious histories of evolution by rearrangements. We seek to generalize this model by allowing for genomes with different gene content, to accommodate a broader range of biological contexts. We approach this generalization by using a matrix representation of genomes. This leads to simple distance formulas and sorting algorithms for genomes with different gene contents, but without duplications. RESULTS: We generalize the rank distance to genomes with different gene content in two different ways. The first approach adds insertions, deletions and the substitution of a single extremity to the basic operations. We show how to efficiently compute this distance. To avoid genomes with incomplete markers, our alternative distance, the rank-indel distance, only uses insertions and deletions of entire chromosomes. We construct phylogenetic trees with our distances and the DCJ-Indel distance for simulated data and real prokaryotic genomes, and compare them against reference trees. For simulated data, our distances outperform the DCJ-Indel distance using the Quartet metric as baseline. This suggests that rank distances are more robust for comparing distantly related species. For real prokaryotic genomes, all rearrangement-based distances yield phylogenetic trees that are topologically distant from the reference (65% similarity with Quartet metric), but are able to cluster related species within their respective clades and distinguish the Shigella strains as the farthest relative of the Escherichia coli strains, a feature not seen in the reference tree. AVAILABILITY AND IMPLEMENTATION: Code and instructions are available at https://github.com/meidanis-lab/rank-indel. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9985151
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9985151 2023-03-05 Generalizations of the genomic rank distance to indels Pereira Zanetti, João Paulo Peres Oliveira, Lucas Chindelevitch, Leonid Meidanis, João Bioinformatics Original Paper Oxford University Press 2023-02-15 /pmc/articles/PMC9985151/ /pubmed/36790056 http://dx.doi.org/10.1093/bioinformatics/btad087 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Original Paper
Pereira Zanetti, João Paulo
Peres Oliveira, Lucas
Chindelevitch, Leonid
Meidanis, João
Generalizations of the genomic rank distance to indels
title Generalizations of the genomic rank distance to indels
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9985151/
https://www.ncbi.nlm.nih.gov/pubmed/36790056
http://dx.doi.org/10.1093/bioinformatics/btad087
work_keys_str_mv AT pereirazanettijoaopaulo generalizationsofthegenomicrankdistancetoindels
AT peresoliveiralucas generalizationsofthegenomicrankdistancetoindels
AT chindelevitchleonid generalizationsofthegenomicrankdistancetoindels
AT meidanisjoao generalizationsofthegenomicrankdistancetoindels