Locating rearrangement events in a phylogeny based on highly fragmented assemblies

BACKGROUND: The inference of genome rearrangement operations requires complete genome assemblies as input data, since a rearrangement can involve an arbitrarily large proportion of one or more chromosomes. Most genome sequence projects, especially those on non-model organisms for which no physical m...

Bibliographic Details
Main authors: Zheng, Chunfang, Sankoff, David
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895484/
https://www.ncbi.nlm.nih.gov/pubmed/26818753
http://dx.doi.org/10.1186/s12864-015-2294-6
_version_ 1782435856745758720
author Zheng, Chunfang
Sankoff, David
author_facet Zheng, Chunfang
Sankoff, David
author_sort Zheng, Chunfang
collection PubMed
description BACKGROUND: The inference of genome rearrangement operations requires complete genome assemblies as input data, since a rearrangement can involve an arbitrarily large proportion of one or more chromosomes. Most genome sequence projects, especially those on non-model organisms for which no physical map exists, produce very fragmented assemblies, so that a rearranged fragment may be impossible to identify because its two endpoints are on different scaffolds. However, breakpoints are easily identified, as long as they do not coincide with scaffold ends. For the phylogenetic context, in comparing a fragmented assembly with a number of complete assemblies, certain combinatorial constraints on breakpoints can be derived. We ask to what extent we can use breakpoint data between a fragmented genome and a number of complete genomes to recover all the arrangements in a phylogeny. RESULTS: We simulate genomic evolution via chromosomal inversion, fragmenting one of the genomes into a large number of scaffolds to represent the incompleteness of assembly. We identify all the breakpoints between this genome and the remainder. We devise an algorithm which takes these breakpoints into account in trying to determine on which branch of the phylogeny a rearrangement event occurred. We present an analysis of the dependence of recovery rates on scaffold size and rearrangement rate, and show that the true tree, the one on which the rearrangement simulation was performed, tends to be most parsimonious in estimating the number of true events inferred. CONCLUSIONS: It is somewhat surprising that the breakpoints identified just between the fragmented genome and each of the others suffice to recover most of the rearrangements produced by the simulations. This holds even in parts of the phylogeny disjoint from the lineage of the fragmented genome.
format Online
Article
Text
id pubmed-4895484
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4895484 2016-06-10 Locating rearrangement events in a phylogeny based on highly fragmented assemblies Zheng, Chunfang Sankoff, David BMC Genomics Research BACKGROUND: The inference of genome rearrangement operations requires complete genome assemblies as input data, since a rearrangement can involve an arbitrarily large proportion of one or more chromosomes. Most genome sequence projects, especially those on non-model organisms for which no physical map exists, produce very fragmented assemblies, so that a rearranged fragment may be impossible to identify because its two endpoints are on different scaffolds. However, breakpoints are easily identified, as long as they do not coincide with scaffold ends. For the phylogenetic context, in comparing a fragmented assembly with a number of complete assemblies, certain combinatorial constraints on breakpoints can be derived. We ask to what extent we can use breakpoint data between a fragmented genome and a number of complete genomes to recover all the arrangements in a phylogeny. RESULTS: We simulate genomic evolution via chromosomal inversion, fragmenting one of the genomes into a large number of scaffolds to represent the incompleteness of assembly. We identify all the breakpoints between this genome and the remainder. We devise an algorithm which takes these breakpoints into account in trying to determine on which branch of the phylogeny a rearrangement event occurred. We present an analysis of the dependence of recovery rates on scaffold size and rearrangement rate, and show that the true tree, the one on which the rearrangement simulation was performed, tends to be most parsimonious in estimating the number of true events inferred. CONCLUSIONS: It is somewhat surprising that the breakpoints identified just between the fragmented genome and each of the others suffice to recover most of the rearrangements produced by the simulations. This holds even in parts of the phylogeny disjoint from the lineage of the fragmented genome. BioMed Central 2016-01-11 /pmc/articles/PMC4895484/ /pubmed/26818753 http://dx.doi.org/10.1186/s12864-015-2294-6 Text en © Zheng and Sankoff. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Zheng, Chunfang
Sankoff, David
Locating rearrangement events in a phylogeny based on highly fragmented assemblies
title Locating rearrangement events in a phylogeny based on highly fragmented assemblies
title_full Locating rearrangement events in a phylogeny based on highly fragmented assemblies
title_fullStr Locating rearrangement events in a phylogeny based on highly fragmented assemblies
title_full_unstemmed Locating rearrangement events in a phylogeny based on highly fragmented assemblies
title_short Locating rearrangement events in a phylogeny based on highly fragmented assemblies
title_sort locating rearrangement events in a phylogeny based on highly fragmented assemblies
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4895484/
https://www.ncbi.nlm.nih.gov/pubmed/26818753
http://dx.doi.org/10.1186/s12864-015-2294-6
work_keys_str_mv AT zhengchunfang locatingrearrangementeventsinaphylogenybasedonhighlyfragmentedassemblies
AT sankoffdavid locatingrearrangementeventsinaphylogenybasedonhighlyfragmentedassemblies
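
Illustrative note: the abstract states that breakpoints between a fragmented genome and a complete one can be identified as long as they do not coincide with scaffold ends. The sketch below illustrates only that observation. It is not the authors' algorithm for assigning events to branches, which this record does not describe. It assumes a single linear chromosome encoded as a signed permutation of gene numbers; the function names (`invert`, `fragment`, `observable_breakpoints`) and the toy data are hypothetical.

```python
from typing import List, Set, Tuple

Adjacency = Tuple[int, int]


def invert(genome: List[int], i: int, j: int) -> List[int]:
    """Return a copy with genome[i..j] (inclusive) reversed and sign-flipped."""
    segment = [-g for g in reversed(genome[i:j + 1])]
    return genome[:i] + segment + genome[j + 1:]


def canonical(a: int, b: int) -> Adjacency:
    """An adjacency (a, b) read from the other strand is (-b, -a); pick one form."""
    return min((a, b), (-b, -a))


def adjacencies(scaffolds: List[List[int]]) -> Set[Adjacency]:
    """Collect adjacencies inside each scaffold; none span a scaffold boundary."""
    return {canonical(a, b) for s in scaffolds for a, b in zip(s, s[1:])}


def fragment(genome: List[int], cut_points: List[int]) -> List[List[int]]:
    """Split a genome into scaffolds; cut point k cuts between positions k-1 and k."""
    bounds = [0] + sorted(cut_points) + [len(genome)]
    return [genome[lo:hi] for lo, hi in zip(bounds, bounds[1:])]


def observable_breakpoints(scaffolds: List[List[int]],
                           reference: List[int]) -> Set[Adjacency]:
    """Adjacencies seen in the scaffolds that are absent from the complete reference."""
    return adjacencies(scaffolds) - adjacencies([reference])


# Toy data (assumed): one inversion of genes 3..5 in an 8-gene genome.
reference = list(range(1, 9))
rearranged = invert(reference, 2, 4)        # [1, 2, -5, -4, -3, 6, 7, 8]

whole = observable_breakpoints([rearranged], reference)
# Cutting exactly at the left breakpoint hides it from view.
partial = observable_breakpoints(fragment(rearranged, [2]), reference)
print(len(whole), len(partial))
```

With the complete assembly both ends of the inversion appear as breakpoints; when a scaffold boundary falls on one of them, only the other remains detectable, which is the information loss the abstract describes.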