Cargando…
The gene family-free median of three
BACKGROUND: The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which ask...
Autores principales: | , , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2017
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5446766/ https://www.ncbi.nlm.nih.gov/pubmed/28559921 http://dx.doi.org/10.1186/s13015-017-0106-z |
_version_ | 1783239161139953664 |
---|---|
author | Doerr, Daniel Balaban, Metin Feijão, Pedro Chauve, Cedric |
author_facet | Doerr, Daniel Balaban, Metin Feijão, Pedro Chauve, Cedric |
author_sort | Doerr, Daniel |
collection | PubMed |
description | BACKGROUND: The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes. METHODS: We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. We study its computational complexity and we describe an integer linear program (ILP) for its exact solution. We further discuss a related problem called family-free adjacencies for k genomes for the special case of [Formula: see text] and present an ILP for its solution. However, for this problem, the computation of exact solutions remains intractable for sufficiently large instances. We then proceed to describe a heuristic method, FFAdj-AM, which performs well in practice. RESULTS: The developed methods compute accurate positional orthologs for genomes comparable in size of bacterial genomes on simulated data and genomic data acquired from the OMA orthology database. In particular, FFAdj-AM performs equally or better when compared to the well-established gene family prediction tool MultiMSOAR. CONCLUSIONS: We study the computational complexity of a new family-free model and present algorithms for its solution. With FFAdj-AM, we propose an appealing alternative to established tools for identifying higher confidence positional orthologs. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13015-017-0106-z) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5446766 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-54467662017-05-30 The gene family-free median of three Doerr, Daniel Balaban, Metin Feijão, Pedro Chauve, Cedric Algorithms Mol Biol Research BACKGROUND: The gene family-free framework for comparative genomics aims at providing methods for gene order analysis that do not require prior gene family assignment, but work directly on a sequence similarity graph. We study two problems related to the breakpoint median of three genomes, which asks for the construction of a fourth genome that minimizes the sum of breakpoint distances to the input genomes. METHODS: We present a model for constructing a median of three genomes in this family-free setting, based on maximizing an objective function that generalizes the classical breakpoint distance by integrating sequence similarity in the score of a gene adjacency. We study its computational complexity and we describe an integer linear program (ILP) for its exact solution. We further discuss a related problem called family-free adjacencies for k genomes for the special case of [Formula: see text] and present an ILP for its solution. However, for this problem, the computation of exact solutions remains intractable for sufficiently large instances. We then proceed to describe a heuristic method, FFAdj-AM, which performs well in practice. RESULTS: The developed methods compute accurate positional orthologs for genomes comparable in size of bacterial genomes on simulated data and genomic data acquired from the OMA orthology database. In particular, FFAdj-AM performs equally or better when compared to the well-established gene family prediction tool MultiMSOAR. CONCLUSIONS: We study the computational complexity of a new family-free model and present algorithms for its solution. With FFAdj-AM, we propose an appealing alternative to established tools for identifying higher confidence positional orthologs. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13015-017-0106-z) contains supplementary material, which is available to authorized users. BioMed Central 2017-05-26 /pmc/articles/PMC5446766/ /pubmed/28559921 http://dx.doi.org/10.1186/s13015-017-0106-z Text en © The Author(s) 2017 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Doerr, Daniel Balaban, Metin Feijão, Pedro Chauve, Cedric The gene family-free median of three |
title | The gene family-free median of three |
title_full | The gene family-free median of three |
title_fullStr | The gene family-free median of three |
title_full_unstemmed | The gene family-free median of three |
title_short | The gene family-free median of three |
title_sort | gene family-free median of three |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5446766/ https://www.ncbi.nlm.nih.gov/pubmed/28559921 http://dx.doi.org/10.1186/s13015-017-0106-z |
work_keys_str_mv | AT doerrdaniel thegenefamilyfreemedianofthree AT balabanmetin thegenefamilyfreemedianofthree AT feijaopedro thegenefamilyfreemedianofthree AT chauvecedric thegenefamilyfreemedianofthree AT doerrdaniel genefamilyfreemedianofthree AT balabanmetin genefamilyfreemedianofthree AT feijaopedro genefamilyfreemedianofthree AT chauvecedric genefamilyfreemedianofthree |