The combination of direct and paired link graphs can boost repetitive genome assembly

Currently, most paired-link-based scaffolding algorithms intrinsically mask the sequences between two linked contigs and bypass the direct link information embedded in the original de Bruijn assembly graph. This disadvantage substantially complicates the scaffolding process and leads to the inabil...

Bibliographic Details
Main Authors: Shi, Wenyu; Ji, Peifeng; Zhao, Fangqing
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects: Methods Online
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5399794/
https://www.ncbi.nlm.nih.gov/pubmed/27924003
http://dx.doi.org/10.1093/nar/gkw1191
author Shi, Wenyu
Ji, Peifeng
Zhao, Fangqing
author_facet Shi, Wenyu
Ji, Peifeng
Zhao, Fangqing
author_sort Shi, Wenyu
collection PubMed
description Currently, most paired-link-based scaffolding algorithms intrinsically mask the sequences between two linked contigs and bypass the direct link information embedded in the original de Bruijn assembly graph. This disadvantage substantially complicates the scaffolding process and leads to an inability to resolve the assembly of repetitive contigs. Here we present a novel algorithm, inGAP-sf, for effectively generating high-quality and continuous scaffolds. inGAP-sf achieves this by using a new strategy based on the combination of direct link and paired link graphs, in which direct links are used to increase graph connectivity and decrease graph complexity, and paired links are employed to supervise the traversal of the direct link graph. This advantage greatly facilitates the assembly of regions enriched in short repeats. Moreover, a new comprehensive decision model is developed to eliminate the noise routes that accompany the introduced direct links. Through extensive evaluations on both simulated and real datasets, we demonstrate that inGAP-sf outperforms most genome scaffolding algorithms by generating more accurate and continuous assemblies, especially for short repetitive regions.
format Online
Article
Text
id pubmed-5399794
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5399794 2017-04-28 The combination of direct and paired link graphs can boost repetitive genome assembly Shi, Wenyu Ji, Peifeng Zhao, Fangqing Nucleic Acids Res Methods Online Currently, most paired link based scaffolding algorithms intrinsically mask the sequences between two linked contigs and bypass their direct link information embedded in the original de Bruijn assembly graph. Such disadvantage substantially complicates the scaffolding process and leads to the inability of resolving repetitive contig assembly. Here we present a novel algorithm, inGAP-sf, for effectively generating high-quality and continuous scaffolds. inGAP-sf achieves this by using a new strategy based on the combination of direct link and paired link graphs, in which direct link is used to increase graph connectivity and to decrease graph complexity and paired link is employed to supervise the traversing process on the direct link graph. Such advantage greatly facilitates the assembly of short-repeat enriched regions. Moreover, a new comprehensive decision model is developed to eliminate the noise routes accompanying with the introduced direct link. Through extensive evaluations on both simulated and real datasets, we demonstrated that inGAP-sf outperforms most of the genome scaffolding algorithms by generating more accurate and continuous assembly, especially for short repetitive regions. Oxford University Press 2017-04-07 2016-12-06 /pmc/articles/PMC5399794/ /pubmed/27924003 http://dx.doi.org/10.1093/nar/gkw1191 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Methods Online
Shi, Wenyu
Ji, Peifeng
Zhao, Fangqing
The combination of direct and paired link graphs can boost repetitive genome assembly
title The combination of direct and paired link graphs can boost repetitive genome assembly
title_full The combination of direct and paired link graphs can boost repetitive genome assembly
title_fullStr The combination of direct and paired link graphs can boost repetitive genome assembly
title_full_unstemmed The combination of direct and paired link graphs can boost repetitive genome assembly
title_short The combination of direct and paired link graphs can boost repetitive genome assembly
title_sort combination of direct and paired link graphs can boost repetitive genome assembly
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5399794/
https://www.ncbi.nlm.nih.gov/pubmed/27924003
http://dx.doi.org/10.1093/nar/gkw1191