Scaffold filling, contig fusion and comparative gene order inference

Bibliographic Details
Main Authors: Muñoz, Adriana; Zheng, Chunfang; Zhu, Qian; Albert, Victor A; Rounsley, Steve; Sankoff, David
Format: Text
Language: English
Published: BioMed Central 2010
Subjects: Research article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2902449/
https://www.ncbi.nlm.nih.gov/pubmed/20525342
http://dx.doi.org/10.1186/1471-2105-11-304
collection PubMed
description BACKGROUND: There has been a trend in increasing the phylogenetic scope of genome sequencing without finishing the sequence of the genome. Increasing numbers of genomes are being published in scaffold or contig form. Rearrangement algorithms, however, including gene order-based phylogenetic tools, require whole genome data on gene order or syntenic block order. How then can we use rearrangement algorithms to compare genomes available in scaffold form only? Can the comparative evidence predict the location of unsequenced genes?
RESULTS: Our method involves optimally filling in genes missing from the scaffolds, while incorporating the augmented scaffolds directly into the rearrangement algorithms as if they were chromosomes. This is accomplished by an exact, polynomial-time algorithm. We then correct for the number of extra fusion/fission operations required to make scaffolds comparable to full assemblies. We model the relationship between the ratio of missing genes actually absent from the genome versus merely unsequenced ones, on one hand, and the increase of genomic distance after scaffold filling, on the other. We estimate the parameters of this model through simulations and by comparing the angiosperm genomes Ricinus communis and Vitis vinifera.
CONCLUSIONS: The algorithm solves the comparison of genomes with 18,300 genes, including 4500 missing from one genome, in less than a minute on a MacBook, putting virtually all genomes within range of the method.
format Text
id pubmed-2902449
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2902449 2010-07-13 Scaffold filling, contig fusion and comparative gene order inference. Muñoz, Adriana; Zheng, Chunfang; Zhu, Qian; Albert, Victor A; Rounsley, Steve; Sankoff, David. BMC Bioinformatics. Research article. BioMed Central 2010-06-04 /pmc/articles/PMC2902449/ /pubmed/20525342 http://dx.doi.org/10.1186/1471-2105-11-304 Text en Copyright ©2010 Muñoz et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research article