Coordinate systems for supergenomes
BACKGROUND: Genome sequences and genome annotation data have become available at ever increasing rates in response to the rapid progress in sequencing technologies. As a consequence the demand for methods supporting comparative, evolutionary analysis is also growing. In particular, efficient tools t...
Main authors: | Gärtner, Fabian; Höner zu Siederdissen, Christian; Müller, Lydia; Stadler, Peter F. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6151955/ https://www.ncbi.nlm.nih.gov/pubmed/30258487 http://dx.doi.org/10.1186/s13015-018-0133-4 |
_version_ | 1783357262063992832 |
---|---|
author | Gärtner, Fabian Höner zu Siederdissen, Christian Müller, Lydia Stadler, Peter F. |
author_facet | Gärtner, Fabian Höner zu Siederdissen, Christian Müller, Lydia Stadler, Peter F. |
author_sort | Gärtner, Fabian |
collection | PubMed |
description | BACKGROUND: Genome sequences and genome annotation data have become available at ever increasing rates in response to the rapid progress in sequencing technologies. As a consequence, the demand for methods supporting comparative, evolutionary analysis is also growing. In particular, efficient tools to visualize -omics data simultaneously for multiple species are sorely lacking. A first and crucial step in this direction is the construction of a common coordinate system. Since genomes differ not only by rearrangements but also by large insertions, deletions, and duplications, the use of a single reference genome is insufficient, in particular when the number of species becomes large. RESULTS: The computational problem then becomes to determine an order and orientations of optimal local alignments that are as co-linear as possible with all the genome sequences. We first review the most prominent approaches to model the problem formally and then proceed to show that it can be phrased as a particular variant of the Betweenness Problem. It is NP-hard in general. As exact solutions are beyond reach for the problem sizes of practical interest, we introduce a collection of heuristic simplifiers to resolve ordering conflicts. CONCLUSION: Benchmarks on real-life data ranging from bacterial to fly genomes demonstrate the feasibility of computing good common coordinate systems. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13015-018-0133-4) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6151955 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6151955 2018-09-26 Coordinate systems for supergenomes Gärtner, Fabian Höner zu Siederdissen, Christian Müller, Lydia Stadler, Peter F. Algorithms Mol Biol Research BACKGROUND: Genome sequences and genome annotation data have become available at ever increasing rates in response to the rapid progress in sequencing technologies. As a consequence, the demand for methods supporting comparative, evolutionary analysis is also growing. In particular, efficient tools to visualize -omics data simultaneously for multiple species are sorely lacking. A first and crucial step in this direction is the construction of a common coordinate system. Since genomes differ not only by rearrangements but also by large insertions, deletions, and duplications, the use of a single reference genome is insufficient, in particular when the number of species becomes large. RESULTS: The computational problem then becomes to determine an order and orientations of optimal local alignments that are as co-linear as possible with all the genome sequences. We first review the most prominent approaches to model the problem formally and then proceed to show that it can be phrased as a particular variant of the Betweenness Problem. It is NP-hard in general. As exact solutions are beyond reach for the problem sizes of practical interest, we introduce a collection of heuristic simplifiers to resolve ordering conflicts. CONCLUSION: Benchmarks on real-life data ranging from bacterial to fly genomes demonstrate the feasibility of computing good common coordinate systems. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13015-018-0133-4) contains supplementary material, which is available to authorized users.
BioMed Central 2018-09-24 /pmc/articles/PMC6151955/ /pubmed/30258487 http://dx.doi.org/10.1186/s13015-018-0133-4 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Gärtner, Fabian Höner zu Siederdissen, Christian Müller, Lydia Stadler, Peter F. Coordinate systems for supergenomes |
title | Coordinate systems for supergenomes |
title_full | Coordinate systems for supergenomes |
title_fullStr | Coordinate systems for supergenomes |
title_full_unstemmed | Coordinate systems for supergenomes |
title_short | Coordinate systems for supergenomes |
title_sort | coordinate systems for supergenomes |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6151955/ https://www.ncbi.nlm.nih.gov/pubmed/30258487 http://dx.doi.org/10.1186/s13015-018-0133-4 |
work_keys_str_mv | AT gartnerfabian coordinatesystemsforsupergenomes AT honerzusiederdissenchristian coordinatesystemsforsupergenomes AT mullerlydia coordinatesystemsforsupergenomes AT stadlerpeterf coordinatesystemsforsupergenomes |