
ILP-based maximum likelihood genome scaffolding


Bibliographic Details
Main authors:	Lindsay, James; Salooti, Hamed; Măndoiu, Ion; Zelikovsky, Alex
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4168704/ https://www.ncbi.nlm.nih.gov/pubmed/25253180 http://dx.doi.org/10.1186/1471-2105-15-S9-S9

collection	PubMed
description	BACKGROUND: Interest in de novo genome assembly has been renewed in the past decade due to rapid advances in high-throughput sequencing (HTS) technologies, which generate relatively short reads, resulting in highly fragmented assemblies consisting of contigs. Additional long-range linkage information is typically used to orient, order, and link contigs into larger structures referred to as scaffolds. Due to library preparation artifacts and erroneous mapping of reads originating from repeats, scaffolding remains a challenging problem. In this paper, we provide a scalable scaffolding algorithm (SILP2) employing a maximum likelihood model capturing read mapping uncertainty and/or non-uniformity of contig coverage, which is solved using integer linear programming. A Non-Serial Dynamic Programming (NSDP) paradigm is applied to render our algorithm useful in the processing of larger mammalian genomes. To compare scaffolding tools, we employ novel quantitative metrics in addition to the extant metrics in the field. We have also expanded the set of experiments to include scaffolding of low-complexity metagenomic samples. RESULTS: SILP2 achieves better scalability through a more efficient NSDP algorithm than the previous release of SILP. The results show that SILP2 compares favorably to the previous methods OPERA and MIP in both scalability and accuracy for scaffolding single genomes of up to human size, and significantly outperforms them on scaffolding low-complexity metagenomic samples. CONCLUSIONS: Equipped with NSDP, SILP2 is able to scaffold large mammalian genomes, resulting in the longest and most accurate scaffolds. The ILP formulation for the maximum likelihood model is shown to be flexible enough to handle metagenomic samples.
id	pubmed-4168704
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	BMC Bioinformatics
published	2014-09-10
rights	Copyright © 2014 Lindsay et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
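As a rough illustration of the kind of optimization the abstract describes, the sketch below solves a toy contig-orientation problem. Each contig gets an orientation bit, and each mate-pair link says whether two contigs should share an orientation. Each link also carries a weight that stands in for a log-likelihood score, and the goal is the assignment that maximizes the total weight of satisfied links. This is not SILP2's model. The link format, the weights, and the function name `best_orientation` are hypothetical. Exhaustive search stands in for the ILP solver and the NSDP decomposition, so it is only practical for a handful of contigs.

```python
from itertools import product

# Hypothetical link format: (contig_i, contig_j, same_orientation, weight).
# The weight stands in for a log-likelihood score; it is not SILP2's model.
Link = tuple[int, int, bool, float]


def best_orientation(n: int, links: list[Link]) -> tuple[tuple[int, ...], float]:
    """Exhaustively pick 0/1 orientations that maximize satisfied link weight.

    Contig 0 is fixed to orientation 0, because flipping every contig
    satisfies exactly the same links (a global symmetry).
    """
    best_assignment: tuple[int, ...] = ()
    best_score = float("-inf")
    for rest in product((0, 1), repeat=n - 1):
        assignment = (0,) + rest
        # A link is satisfied when "same orientation?" matches its expectation.
        score = sum(
            w for i, j, same, w in links
            if (assignment[i] == assignment[j]) == same
        )
        if score > best_score:
            best_assignment, best_score = assignment, score
    return best_assignment, best_score


if __name__ == "__main__":
    links = [(0, 1, True, 5.0), (1, 2, False, 3.0), (0, 2, True, 1.0)]
    print(best_orientation(3, links))  # ((0, 0, 1), 8.0)
```

In the usage example the three links cannot all be satisfied. The lowest-weight link (weight 1.0) is the one given up, which is the behaviour a maximum likelihood objective is meant to capture when some links come from mapping errors or repeats.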
