GAAP: Genome-organization-framework-Assisted Assembly Pipeline for prokaryotic genomes

BACKGROUND: Next-generation sequencing (NGS) technologies have greatly promoted the genomic study of prokaryotes. However, highly fragmented assemblies due to short reads from NGS are still a limiting factor in gaining insights into the genome biology. Reference-assisted tools are promising in genom...

Bibliographic Details
Main authors: Yuan, Lina, Yu, Yang, Zhu, Yanmin, Li, Yulai, Li, Changqing, Li, Rujiao, Ma, Qin, Siu, Gilman Kit-Hang, Yu, Jun, Jiang, Taijiao, Xiao, Jingfa, Kang, Yu
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5310280/
https://www.ncbi.nlm.nih.gov/pubmed/28198678
http://dx.doi.org/10.1186/s12864-016-3267-0
author Yuan, Lina
Yu, Yang
Zhu, Yanmin
Li, Yulai
Li, Changqing
Li, Rujiao
Ma, Qin
Siu, Gilman Kit-Hang
Yu, Jun
Jiang, Taijiao
Xiao, Jingfa
Kang, Yu
author_sort Yuan, Lina
collection PubMed
description BACKGROUND: Next-generation sequencing (NGS) technologies have greatly advanced the genomic study of prokaryotes. However, the highly fragmented assemblies that result from short NGS reads still limit insight into genome biology. Reference-assisted tools are promising for genome assembly, but tend to produce misassemblies when the assigned reference has extensive rearrangements. RESULTS: Herein, we present GAAP, a genome assembly pipeline for scaffolding based on the core-gene-defined Genome Organizational Framework (cGOF) described in our previous study. Instead of assigning references, we use cGOFs derived from multiple references as indexes to help order and orient the scaffolds and build a skeleton structure. We then use read pairs to extend scaffolds, a step called local scaffolding, and to distinguish between true and chimeric adjacencies in the scaffolds. In our performance tests using both empirical and simulated data of 15 genomes in six species with diverse genome size, complexity, and all three categories of cGOFs, GAAP outperforms or achieves results comparable to three other reference-assisted programs: AlignGraph, Ragout and MeDuSa. CONCLUSIONS: GAAP uses both cGOFs and paired-end reads to create assemblies at genomic scale, and performs better than the currently available reference-assisted assembly tools, as it recovers more assemblies and makes fewer false placements, especially for species with extensively rearranged genomes. Our method is a promising solution for reconstructing genome sequences from short NGS reads. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-3267-0) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5310280
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-53102802017-02-22 GAAP: Genome-organization-framework-Assisted Assembly Pipeline for prokaryotic genomes Yuan, Lina Yu, Yang Zhu, Yanmin Li, Yulai Li, Changqing Li, Rujiao Ma, Qin Siu, Gilman Kit-Hang Yu, Jun Jiang, Taijiao Xiao, Jingfa Kang, Yu BMC Genomics Research BACKGROUND: Next-generation sequencing (NGS) technologies have greatly promoted the genomic study of prokaryotes. However, highly fragmented assemblies due to short reads from NGS are still a limiting factor in gaining insights into the genome biology. Reference-assisted tools are promising in genome assembly, but tend to result in false assembly when the assigned reference has extensive rearrangements. RESULTS: Herein, we present GAAP, a genome assembly pipeline for scaffolding based on core-gene-defined Genome Organizational Framework (cGOF) described in our previous study. Instead of assigning references, we use the multiple-reference-derived cGOFs as indexes to assist in order and orientation of the scaffolds and build a skeleton structure, and then use read pairs to extend scaffolds, called local scaffolding, and distinguish between true and chimeric adjacencies in the scaffolds. In our performance tests using both empirical and simulated data of 15 genomes in six species with diverse genome size, complexity, and all three categories of cGOFs, GAAP outcompetes or achieves comparable results when compared to three other reference-assisted programs, AlignGraph, Ragout and MeDuSa. CONCLUSIONS: GAAP uses both cGOF and pair-end reads to create assemblies in genomic scale, and performs better than the currently available reference-assisted assembly tools as it recovers more assemblies and makes fewer false locations, especially for species with extensive rearranged genomes. Our method is a promising solution for reconstruction of genome sequence from short reads of NGS. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-3267-0) contains supplementary material, which is available to authorized users. BioMed Central 2017-01-25 /pmc/articles/PMC5310280/ /pubmed/28198678 http://dx.doi.org/10.1186/s12864-016-3267-0 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title GAAP: Genome-organization-framework-Assisted Assembly Pipeline for prokaryotic genomes
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5310280/
https://www.ncbi.nlm.nih.gov/pubmed/28198678
http://dx.doi.org/10.1186/s12864-016-3267-0