ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter

The assembly of DNA sequences de novo is fundamental to genomics research. It is the first of many steps toward elucidating and characterizing whole genomes. Downstream applications, including analysis of genomic variation between species, between or within individuals, critically depend on robustly...

Bibliographic Details
Main Authors: Jackman, Shaun D., Vandervalk, Benjamin P., Mohamadi, Hamid, Chu, Justin, Yeo, Sarah, Hammond, S. Austin, Jahesh, Golnaz, Khan, Hamza, Coombe, Lauren, Warren, Rene L., Birol, Inanc
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5411771/
https://www.ncbi.nlm.nih.gov/pubmed/28232478
http://dx.doi.org/10.1101/gr.214346.116
author Jackman, Shaun D.
Vandervalk, Benjamin P.
Mohamadi, Hamid
Chu, Justin
Yeo, Sarah
Hammond, S. Austin
Jahesh, Golnaz
Khan, Hamza
Coombe, Lauren
Warren, Rene L.
Birol, Inanc
author_sort Jackman, Shaun D.
collection PubMed
description The assembly of DNA sequences de novo is fundamental to genomics research. It is the first of many steps toward elucidating and characterizing whole genomes. Downstream applications, including analysis of genomic variation between species, between or within individuals, critically depend on robustly assembled sequences. In the span of a single decade, the sequence throughput of leading DNA sequencing instruments has increased drastically, and coupled with established and planned large-scale, personalized medicine initiatives to sequence genomes in the thousands and even millions, the development of efficient, scalable and accurate bioinformatics tools for producing high-quality reference draft genomes is timely. With ABySS 1.0, we originally showed that assembling the human genome using short 50-bp sequencing reads was possible by aggregating the half terabyte of compute memory needed over several computers using a standardized message-passing system (MPI). We present here its redesign, which departs from MPI and instead implements algorithms that employ a Bloom filter, a probabilistic data structure, to represent a de Bruijn graph and reduce memory requirements. We benchmarked ABySS 2.0 human genome assembly using a Genome in a Bottle data set of 250-bp Illumina paired-end and 6-kbp mate-pair libraries from a single individual. Our assembly yielded an NG50 (NGA50) scaffold contiguity of 3.5 (3.0) Mbp using <35 GB of RAM. This is a modest memory requirement by today's standards and is often available on a single computer. We also investigate the use of BioNano Genomics and 10x Genomics’ Chromium data to further improve the scaffold NG50 (NGA50) of this assembly to 42 (15) Mbp.
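The description's central idea, a Bloom filter standing in for a de Bruijn graph, can be illustrated with a small Python sketch. This is not ABySS 2.0's implementation: the class and function names (`BloomFilter`, `build_graph`, `successors`, `extend_right`), the BLAKE2b-based hashing, and the filter size are all assumptions chosen for illustration, and the sketch ignores reverse complements, sequencing errors, and the false-positive handling a real assembler would need. It shows only that k-mers are stored as set bits and that graph edges are recovered by querying the four possible one-base extensions rather than being stored explicitly.

```python
import hashlib


class BloomFilter:
    """Bit array with several hash functions (illustrative, not ABySS's own).

    Membership queries can yield false positives but never false negatives.
    """

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive one bit position per hash function by salting BLAKE2b.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def kmers(read, k):
    """Yield every length-k substring of a read."""
    return (read[i:i + k] for i in range(len(read) - k + 1))


def build_graph(reads, k, num_bits=1 << 16, num_hashes=3):
    """Insert all k-mers of all reads; the filter is the whole graph."""
    bloom = BloomFilter(num_bits, num_hashes)
    for read in reads:
        for kmer in kmers(read, k):
            bloom.add(kmer)
    return bloom


def successors(bloom, kmer):
    """Edges are implicit: test the four possible one-base extensions."""
    return [kmer[1:] + base for base in "ACGT" if kmer[1:] + base in bloom]


def extend_right(bloom, start, max_steps=1000):
    """Follow the path to the right while exactly one successor exists."""
    contig, current, seen = start, start, {start}
    for _ in range(max_steps):
        nxt = successors(bloom, current)
        if len(nxt) != 1 or nxt[0] in seen:
            break  # dead end, branch, or cycle
        current = nxt[0]
        seen.add(current)
        contig += current[-1]
    return contig


if __name__ == "__main__":
    graph = build_graph(["ACGTTGCA", "GTTGCATC"], k=4)
    print(extend_right(graph, "ACGT"))
```

The memory saving comes from storing only a few bits per k-mer instead of the k-mer itself; the cost is that a false positive can create a spurious edge, which is why the sketch stops at any branch instead of guessing.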
format Online
Article
Text
id pubmed-5411771
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-5411771 2017-05-16 Genome Res Method Cold Spring Harbor Laboratory Press 2017-05 /pmc/articles/PMC5411771/ /pubmed/28232478 http://dx.doi.org/10.1101/gr.214346.116 Text en © 2017 Jackman et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
title ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter
topic Method