On the weight of indels in genomic distances

Bibliographic details
Main authors: Braga, Marília D V; Machado, Raphael; Ribeiro, Leonardo C; Stoye, Jens
Format: Online article, text
Language: English
Published: BioMed Central 2011
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3283315/
https://www.ncbi.nlm.nih.gov/pubmed/22151784
http://dx.doi.org/10.1186/1471-2105-12-S9-S13

Abstract

BACKGROUND: Classical approaches to compute the genomic distance are usually limited to genomes with the same content, without duplicated markers. However, differences in gene content are frequently observed and can reflect important evolutionary aspects. A few polynomial-time algorithms that include genome rearrangements, insertions and deletions (or substitutions) have already been proposed. These methods often allow a block of contiguous markers to be inserted, deleted or substituted at once, but they result in distance functions that do not respect the triangle inequality and hence do not constitute metrics.

RESULTS: In the present study we discuss the disruption of the triangle inequality in some of the available methods and give a framework to establish an efficient correction for two recently proposed models: one that includes insertions, deletions and double cut and join (DCJ) operations, and one that includes substitutions and DCJ operations.

CONCLUSIONS: We show that the proposed framework establishes the triangle inequality in both distances by adding a surcharge on indel operations and on substitutions that depends only on the number of markers affected by these operations. This correction can be applied a posteriori, without interfering with the already available formulas to compute these distances. We claim that this correction leads to distances that are biologically more plausible.
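
The abstract's central point is that several indel-aware genomic distances can fail the triangle inequality, d(A, C) ≤ d(A, B) + d(B, C). As a generic sanity check, and not the authors' correction framework, the minimal sketch below scans a table of pairwise distances for triples that break the inequality. The function names `triangle_violations` and `symmetric` and the toy distance values are hypothetical illustrations, not results from the paper.

```python
from itertools import permutations
from typing import Dict, Hashable, List, Tuple

Genome = Hashable  # hypothetical: any label identifying a genome


def symmetric(pairs: Dict[Tuple[Genome, Genome], float]) -> Dict[Tuple[Genome, Genome], float]:
    """Expand {(a, b): d} into a table holding both (a, b) and (b, a)."""
    table = {}
    for (a, b), d in pairs.items():
        table[(a, b)] = d
        table[(b, a)] = d
    return table


def triangle_violations(
    dist: Dict[Tuple[Genome, Genome], float],
    genomes: List[Genome],
) -> List[Tuple[Genome, Genome, Genome]]:
    """Return every ordered triple (a, b, c) with d(a, c) > d(a, b) + d(b, c).

    `dist` must contain an entry for every ordered pair of distinct genomes.
    """
    violations = []
    for a, b, c in permutations(genomes, 3):
        if dist[(a, c)] > dist[(a, b)] + dist[(b, c)]:
            violations.append((a, b, c))
    return violations


# Hypothetical pairwise distances (illustrative values, not from the paper).
toy = symmetric({("A", "B"): 1, ("B", "C"): 1, ("A", "C"): 3})
print(triangle_violations(toy, ["A", "B", "C"]))
# → [('A', 'B', 'C'), ('C', 'B', 'A')]
```

Checking all ordered triples costs O(n³) table lookups for n genomes, which is acceptable for the small sets of genomes one would compare in such a check; an empty result means the table satisfies the triangle inequality.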

Published in BMC Bioinformatics (Proceedings), 2011-10-05. Text in English.
Copyright ©2011 Braga et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.