The combination of direct and paired link graphs can boost repetitive genome assembly
Currently, most paired link based scaffolding algorithms intrinsically mask the sequences between two linked contigs and bypass their direct link information embedded in the original de Bruijn assembly graph. Such disadvantage substantially complicates the scaffolding process and leads to the inabil...
Main Authors: | Shi, Wenyu; Ji, Peifeng; Zhao, Fangqing |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2017 |
Subjects: | Methods Online |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5399794/ https://www.ncbi.nlm.nih.gov/pubmed/27924003 http://dx.doi.org/10.1093/nar/gkw1191 |
_version_ | 1783230705634902016 |
---|---|
author | Shi, Wenyu Ji, Peifeng Zhao, Fangqing |
author_facet | Shi, Wenyu Ji, Peifeng Zhao, Fangqing |
author_sort | Shi, Wenyu |
collection | PubMed |
description | Currently, most paired link based scaffolding algorithms intrinsically mask the sequences between two linked contigs and bypass their direct link information embedded in the original de Bruijn assembly graph. Such disadvantage substantially complicates the scaffolding process and leads to the inability of resolving repetitive contig assembly. Here we present a novel algorithm, inGAP-sf, for effectively generating high-quality and continuous scaffolds. inGAP-sf achieves this by using a new strategy based on the combination of direct link and paired link graphs, in which direct link is used to increase graph connectivity and to decrease graph complexity and paired link is employed to supervise the traversing process on the direct link graph. Such advantage greatly facilitates the assembly of short-repeat enriched regions. Moreover, a new comprehensive decision model is developed to eliminate the noise routes accompanying with the introduced direct link. Through extensive evaluations on both simulated and real datasets, we demonstrated that inGAP-sf outperforms most of the genome scaffolding algorithms by generating more accurate and continuous assembly, especially for short repetitive regions. |
format | Online Article Text |
id | pubmed-5399794 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-53997942017-04-28 The combination of direct and paired link graphs can boost repetitive genome assembly Shi, Wenyu Ji, Peifeng Zhao, Fangqing Nucleic Acids Res Methods Online Currently, most paired link based scaffolding algorithms intrinsically mask the sequences between two linked contigs and bypass their direct link information embedded in the original de Bruijn assembly graph. Such disadvantage substantially complicates the scaffolding process and leads to the inability of resolving repetitive contig assembly. Here we present a novel algorithm, inGAP-sf, for effectively generating high-quality and continuous scaffolds. inGAP-sf achieves this by using a new strategy based on the combination of direct link and paired link graphs, in which direct link is used to increase graph connectivity and to decrease graph complexity and paired link is employed to supervise the traversing process on the direct link graph. Such advantage greatly facilitates the assembly of short-repeat enriched regions. Moreover, a new comprehensive decision model is developed to eliminate the noise routes accompanying with the introduced direct link. Through extensive evaluations on both simulated and real datasets, we demonstrated that inGAP-sf outperforms most of the genome scaffolding algorithms by generating more accurate and continuous assembly, especially for short repetitive regions. Oxford University Press 2017-04-07 2016-12-06 /pmc/articles/PMC5399794/ /pubmed/27924003 http://dx.doi.org/10.1093/nar/gkw1191 Text en © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. 
For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methods Online Shi, Wenyu Ji, Peifeng Zhao, Fangqing The combination of direct and paired link graphs can boost repetitive genome assembly |
title | The combination of direct and paired link graphs can boost repetitive genome assembly |
title_full | The combination of direct and paired link graphs can boost repetitive genome assembly |
title_fullStr | The combination of direct and paired link graphs can boost repetitive genome assembly |
title_full_unstemmed | The combination of direct and paired link graphs can boost repetitive genome assembly |
title_short | The combination of direct and paired link graphs can boost repetitive genome assembly |
title_sort | combination of direct and paired link graphs can boost repetitive genome assembly |
topic | Methods Online |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5399794/ https://www.ncbi.nlm.nih.gov/pubmed/27924003 http://dx.doi.org/10.1093/nar/gkw1191 |
work_keys_str_mv | AT shiwenyu thecombinationofdirectandpairedlinkgraphscanboostrepetitivegenomeassembly AT jipeifeng thecombinationofdirectandpairedlinkgraphscanboostrepetitivegenomeassembly AT zhaofangqing thecombinationofdirectandpairedlinkgraphscanboostrepetitivegenomeassembly AT shiwenyu combinationofdirectandpairedlinkgraphscanboostrepetitivegenomeassembly AT jipeifeng combinationofdirectandpairedlinkgraphscanboostrepetitivegenomeassembly AT zhaofangqing combinationofdirectandpairedlinkgraphscanboostrepetitivegenomeassembly |
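
The description above says that paired links "supervise the traversing process on the direct link graph". As a rough illustration of that idea only, here is a minimal Python sketch. It is not the inGAP-sf implementation or its decision model: the function name `supervised_walk`, the dictionary-based graph layout, the support-count scoring, and the toy contigs `A`, `B`, `C`, `D`, `R` are all assumptions made for this example.

```python
def supervised_walk(direct, paired, start, max_steps=100):
    """Walk a direct link graph from `start`, choosing branches by paired-link support.

    direct: dict mapping a contig to the list of contigs it links to directly
            (hypothetical stand-in for edges taken from the assembly graph).
    paired: dict mapping (contig_a, contig_b) to a read-pair support count
            (hypothetical stand-in for paired link evidence).
    """
    path = [start]
    for _ in range(max_steps):
        successors = direct.get(path[-1], [])
        if not successors:
            break  # dead end in the direct link graph
        if len(successors) == 1:
            path.append(successors[0])  # unambiguous direct link
            continue
        # At a branch, score each successor by the paired-link support it
        # receives from contigs already placed on the path.
        scores = {s: sum(paired.get((p, s), 0) for p in path) for s in successors}
        ranked = sorted(scores.values(), reverse=True)
        if ranked[0] == 0 or ranked[0] == ranked[1]:
            break  # no support, or a tie: stop rather than guess
        path.append(max(scores, key=scores.get))
    return path


# Toy example: repeat contig R is entered from A and C and exits to B and D.
direct = {"A": ["R"], "C": ["R"], "R": ["B", "D"]}
paired = {("A", "B"): 5, ("C", "D"): 4}

print(supervised_walk(direct, paired, "A"))
print(supervised_walk(direct, paired, "C"))
```

In this toy graph the direct links alone cannot say which exit of the repeat `R` follows `A`; the paired link between `A` and `B` settles it. A walk that starts at `R` itself has no supporting pairs and stops at the branch instead of guessing.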