
Inferring synteny between genome assemblies: a systematic evaluation

BACKGROUND: Genome assemblies across all domains of life are being produced routinely. Initial analysis of a new genome usually includes annotation and comparative genomics. Synteny provides a framework in which conservation of homologous genes and gene order is identified between genomes of differe...


Bibliographic Details
Main Authors: Liu, Dang; Hunt, Martin; Tsai, Isheng J
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5791376/
https://www.ncbi.nlm.nih.gov/pubmed/29382321
http://dx.doi.org/10.1186/s12859-018-2026-4
author Liu, Dang
Hunt, Martin
Tsai, Isheng J
author_facet Liu, Dang
Hunt, Martin
Tsai, Isheng J
author_sort Liu, Dang
collection PubMed
description BACKGROUND: Genome assemblies across all domains of life are being produced routinely. Initial analysis of a new genome usually includes annotation and comparative genomics. Synteny provides a framework in which conservation of homologous genes and gene order is identified between genomes of different species. The availability of human and mouse genomes paved the way for algorithm development in large-scale synteny mapping, which eventually became an integral part of comparative genomics. Synteny analysis is regularly performed on assembled sequences that are fragmented, neglecting the fact that most methods were developed using complete genomes. It is unknown to what extent draft assemblies lead to errors in such analysis. RESULTS: We fragmented genome assemblies of model nematodes to various extents and conducted synteny identification and downstream analysis. We first show that synteny between species can be underestimated up to 40% and find disagreements between popular tools that infer synteny blocks. This inconsistency and further demonstration of erroneous gene ontology enrichment tests raise questions about the robustness of previous synteny analysis when gold standard genome sequences remain limited. In addition, assembly scaffolding using a reference guided approach with a closely related species may result in chimeric scaffolds with inflated assembly metrics if a true evolutionary relationship was overlooked. Annotation quality, however, has minimal effect on synteny if the assembled genome is highly contiguous. CONCLUSIONS: Our results show that a minimum N50 of 1 Mb is required for robust downstream synteny analysis, which emphasizes the importance of gold standard genomes to the science community, and should be achieved given the current progress in sequencing technology. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2026-4) contains supplementary material, which is available to authorized users.
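The 1 Mb N50 threshold stated in the conclusions can be checked against an assembly with a few lines of Python. The following is a minimal sketch, not the authors' code: the function names `n50` and `passes_threshold` and the example lengths are hypothetical, and N50 is taken in its usual sense (the length of the shortest sequence in the smallest set of longest sequences that together cover at least half of the assembly).

```python
def n50(lengths):
    """Return the N50 (in bp) of a collection of scaffold or contig lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        # Stop at the first sequence that brings the running sum to half the total.
        if 2 * running >= total:
            return length
    return 0  # empty input


def passes_threshold(lengths, minimum_n50=1_000_000):
    """Check an assembly against a minimum N50 (default 1 Mb, as in the abstract)."""
    return n50(lengths) >= minimum_n50


# Hypothetical assemblies of the same total size (10 Mb) but different contiguity.
contiguous = [5_000_000, 3_000_000, 1_500_000, 500_000]
fragmented = [800_000] * 10 + [200_000] * 10

print(n50(contiguous), passes_threshold(contiguous))
print(n50(fragmented), passes_threshold(fragmented))
```

Both example assemblies have the same total length, so the comparison isolates the effect of fragmentation on N50 rather than of assembly size.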
format Online
Article
Text
id pubmed-5791376
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5791376 2018-02-12 Inferring synteny between genome assemblies: a systematic evaluation Liu, Dang Hunt, Martin Tsai, Isheng J BMC Bioinformatics Research Article BioMed Central 2018-01-30 /pmc/articles/PMC5791376/ /pubmed/29382321 http://dx.doi.org/10.1186/s12859-018-2026-4 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Liu, Dang
Hunt, Martin
Tsai, Isheng J
Inferring synteny between genome assemblies: a systematic evaluation
title Inferring synteny between genome assemblies: a systematic evaluation
title_sort inferring synteny between genome assemblies: a systematic evaluation
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5791376/
https://www.ncbi.nlm.nih.gov/pubmed/29382321
http://dx.doi.org/10.1186/s12859-018-2026-4