
Coordinate systems for supergenomes



Bibliographic Details
Main authors: Gärtner, Fabian; Höner zu Siederdissen, Christian; Müller, Lydia; Stadler, Peter F.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6151955/
https://www.ncbi.nlm.nih.gov/pubmed/30258487
http://dx.doi.org/10.1186/s13015-018-0133-4
Collection: PubMed

Abstract:

BACKGROUND: Genome sequences and genome annotation data have become available at ever-increasing rates in response to the rapid progress in sequencing technologies. As a consequence, the demand for methods supporting comparative, evolutionary analysis is also growing. In particular, efficient tools to visualize -omics data simultaneously for multiple species are sorely lacking. A first and crucial step in this direction is the construction of a common coordinate system. Since genomes differ not only by rearrangements but also by large insertions, deletions, and duplications, the use of a single reference genome is insufficient, in particular when the number of species becomes large.

RESULTS: The computational problem then becomes one of determining an order and orientation of the optimal local alignments that is as co-linear as possible with all the genome sequences. We first review the most prominent approaches to modeling the problem formally and then show that it can be phrased as a particular variant of the Betweenness Problem. This problem is NP-hard in general. As exact solutions are beyond reach for problem sizes of practical interest, we introduce a collection of heuristic simplifiers to resolve ordering conflicts.

CONCLUSION: Benchmarks on real-life data ranging from bacterial to fly genomes demonstrate the feasibility of computing good common coordinate systems.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13015-018-0133-4) contains supplementary material, which is available to authorized users.
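
To make the Betweenness Problem formulation mentioned in the abstract more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' method or their specific problem variant: each genome is reduced to a hypothetical list of alignment-block IDs in genomic order, block orientation is ignored, and every consecutive triple (a, b, c) in any genome becomes the constraint "b lies between a and c". The function names `betweenness_triples`, `satisfied`, and `best_order_bruteforce` are illustrative only. The exact solver enumerates all n! orders, which is feasible only for a handful of blocks; this is consistent with the abstract's point that exact solutions are beyond reach for problem sizes of practical interest.

```python
from itertools import permutations


def betweenness_triples(genome_orders):
    """Collect betweenness constraints (a, b, c): b lies between a and c.

    Assumption for this sketch: each genome is a list of hypothetical
    alignment-block IDs in genomic order; orientation is ignored.
    """
    triples = set()
    for order in genome_orders:
        for i in range(1, len(order) - 1):
            a, b, c = order[i - 1], order[i], order[i + 1]
            # Normalise so (a, b, c) and (c, b, a) count as one constraint.
            triples.add((min(a, c), b, max(a, c)))
    return triples


def satisfied(order, triples):
    """Count triples whose middle element lies between the outer two in `order`."""
    pos = {block: i for i, block in enumerate(order)}
    return sum(
        1
        for a, b, c in triples
        if pos[a] < pos[b] < pos[c] or pos[c] < pos[b] < pos[a]
    )


def best_order_bruteforce(blocks, triples):
    """Exact maximum-betweenness order by full enumeration (O(n!), tiny n only)."""
    best = max(permutations(sorted(blocks)), key=lambda p: satisfied(p, triples))
    return list(best), satisfied(best, triples)


if __name__ == "__main__":
    # Two toy genomes that disagree on the relative order of blocks B and C.
    genomes = [["A", "B", "C", "D"], ["A", "C", "B", "D"]]
    triples = betweenness_triples(genomes)
    order, score = best_order_bruteforce({"A", "B", "C", "D"}, triples)
    print(order, score)  # prints: ['A', 'B', 'C', 'D'] 2
```

In the toy example the four derived constraints conflict pairwise (B between A and C versus C between A and B, and likewise for D), so no order can satisfy more than two of them. Such ordering conflicts are what any practical method has to resolve; the brute-force search here only serves to show the objective, not a scalable approach.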

Record ID: pubmed-6151955
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Algorithms Mol Biol (Algorithms for Molecular Biology), Research article, BioMed Central, 2018-09-24
Record date: 2018-09-26
Copyright: © The Author(s) 2018. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.