
AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references

Motivation: De novo assemblies of genomes remain one of the most challenging applications in next-generation sequencing. Usually, their results are incomplete and fragmented into hundreds of contigs. Repeats in genomes and sequencing errors are the main reasons for these complications. With the rapi...


Bibliographic Details
Main Authors: Bao, Ergude; Jiang, Tao; Girke, Thomas
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058956/
https://www.ncbi.nlm.nih.gov/pubmed/24932000
http://dx.doi.org/10.1093/bioinformatics/btu291
_version_ 1782321193773170688
author Bao, Ergude
Jiang, Tao
Girke, Thomas
author_facet Bao, Ergude
Jiang, Tao
Girke, Thomas
author_sort Bao, Ergude
collection PubMed
description Motivation: De novo assemblies of genomes remain one of the most challenging applications in next-generation sequencing. Usually, their results are incomplete and fragmented into hundreds of contigs. Repeats in genomes and sequencing errors are the main reasons for these complications. With the rapidly growing number of sequenced genomes, it is now feasible to improve assemblies by guiding them with genomes from related species. Results: Here we introduce AlignGraph, an algorithm for extending and joining de novo-assembled contigs or scaffolds guided by closely related reference genomes. It aligns paired-end (PE) reads and preassembled contigs or scaffolds to a close reference. From the obtained alignments, it builds a novel data structure, called the PE multipositional de Bruijn graph. The incorporated positional information from the alignments and PE reads allows us to extend the initial assemblies, while avoiding incorrect extensions and early terminations. In our performance tests, AlignGraph was able to substantially improve the contigs and scaffolds from several assemblers. For instance, 28.7–62.3% of the contigs of Arabidopsis thaliana and human could be extended, resulting in improvements of common assembly metrics, such as an increase of the N50 of the extendable contigs by 89.9–94.5% and 80.3–165.8%, respectively. In another test, AlignGraph was able to improve the assembly of a published genome (Arabidopsis strain Landsberg) by increasing the N50 of its extendable scaffolds by 86.6%. These results demonstrate AlignGraph’s efficiency in improving genome assemblies by taking advantage of closely related references. Availability and implementation: The AlignGraph software can be downloaded for free from this site: https://github.com/baoe/AlignGraph. Contact: thomas.girke@ucr.edu
format Online
Article
Text
id pubmed-4058956
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4058956 2014-06-18 AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references Bao, Ergude Jiang, Tao Girke, Thomas Bioinformatics Ismb 2014 Proceedings Papers Committee Oxford University Press 2014-06-15 2014-06-11 /pmc/articles/PMC4058956/ /pubmed/24932000 http://dx.doi.org/10.1093/bioinformatics/btu291 Text en © The Author 2014. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Ismb 2014 Proceedings Papers Committee
Bao, Ergude
Jiang, Tao
Girke, Thomas
AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references
title AlignGraph: algorithm for secondary de novo genome assembly guided by closely related references
title_sort aligngraph: algorithm for secondary de novo genome assembly guided by closely related references
topic Ismb 2014 Proceedings Papers Committee
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4058956/
https://www.ncbi.nlm.nih.gov/pubmed/24932000
http://dx.doi.org/10.1093/bioinformatics/btu291
work_keys_str_mv AT baoergude aligngraphalgorithmforsecondarydenovogenomeassemblyguidedbycloselyrelatedreferences
AT jiangtao aligngraphalgorithmforsecondarydenovogenomeassemblyguidedbycloselyrelatedreferences
AT girkethomas aligngraphalgorithmforsecondarydenovogenomeassemblyguidedbycloselyrelatedreferences