
Konnector v2.0: pseudo-long reads from paired-end sequencing data

BACKGROUND: Reading the nucleotides from two ends of a DNA fragment is called paired-end tag (PET) sequencing. When the fragment length is longer than the combined read length, there remains a gap of unsequenced nucleotides between read pairs. If the target in such experiments is sequenced at a leve...


Bibliographic Details
Main Authors: Vandervalk, Benjamin P, Yang, Chen, Xue, Zhuyi, Raghavan, Karthika, Chu, Justin, Mohamadi, Hamid, Jackman, Shaun D, Chiu, Readman, Warren, René L, Birol, Inanç
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4582294/
https://www.ncbi.nlm.nih.gov/pubmed/26399504
http://dx.doi.org/10.1186/1755-8794-8-S3-S1
_version_ 1782391680681377792
author Vandervalk, Benjamin P
Yang, Chen
Xue, Zhuyi
Raghavan, Karthika
Chu, Justin
Mohamadi, Hamid
Jackman, Shaun D
Chiu, Readman
Warren, René L
Birol, Inanç
collection PubMed
description BACKGROUND: Reading the nucleotides from two ends of a DNA fragment is called paired-end tag (PET) sequencing. When the fragment length is longer than the combined read length, there remains a gap of unsequenced nucleotides between read pairs. If the target in such experiments is sequenced at a level to provide redundant coverage, it may be possible to bridge these gaps using bioinformatics methods. Konnector is a local de novo assembly tool that addresses this problem. Here we report on version 2.0 of our tool. RESULTS: Konnector uses a probabilistic and memory-efficient data structure called Bloom filter to represent a k-mer spectrum - all possible sequences of length k in an input file, such as the collection of reads in a PET sequencing experiment. It performs look-ups to this data structure to construct an implicit de Bruijn graph, which describes (k-1) base pair overlaps between adjacent k-mers. It traverses this graph to bridge the gap between a given pair of flanking sequences. CONCLUSIONS: Here we report the performance of Konnector v2.0 on simulated and experimental datasets, and compare it against other tools with similar functionality. We note that, representing k-mers with 1.5 bytes of memory on average, Konnector can scale to very large genomes. With our parallel implementation, it can also process over a billion bases on commodity hardware.
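The approach summarized in the description above (a Bloom filter holding the k-mer spectrum, an implicit de Bruijn graph built from look-ups, and a graph traversal that bridges the gap between two flanking sequences) can be illustrated with a minimal Python sketch. This is not Konnector's implementation. The class and function names (`BloomFilter`, `load_kmers`, `successors`, `connect`), the SHA-256-based hashing, the breadth-first search, and the `max_len` cutoff are all assumptions made for illustration. The sketch also assumes both flanking sequences are given on the same strand and ignores reverse complements and sequencing errors.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Bit array probed by several hash functions; it can give false positives."""

    def __init__(self, num_bits, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Illustrative hashing: salt one SHA-256 digest per hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def load_kmers(reads, k, bloom):
    """Insert every length-k substring of every read into the Bloom filter."""
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])


def successors(kmer, bloom):
    """Neighbours in the implicit de Bruijn graph: k-mers sharing a (k-1) overlap."""
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in bloom]


def connect(left, right, k, bloom, max_len=200):
    """Search from the last k-mer of `left` to the first k-mer of `right`.

    Returns the merged sequence, or None if no path is found within max_len bases.
    """
    start, goal = left[-k:], right[:k]
    queue = deque([(start, start)])
    seen = {start}
    while queue:
        kmer, path = queue.popleft()
        if kmer == goal:
            return left[:-k] + path + right[k:]
        if len(path) >= max_len:
            continue
        for nxt in successors(kmer, bloom):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + nxt[-1]))
    return None
```

The graph is never stored explicitly: each neighbour query costs four Bloom filter look-ups, which is what keeps the memory footprint small. A false positive in the filter can introduce a spurious edge, so a practical tool would need additional safeguards that this sketch omits.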
format Online
Article
Text
id pubmed-4582294
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-45822942015-09-28 Konnector v2.0: pseudo-long reads from paired-end sequencing data Vandervalk, Benjamin P Yang, Chen Xue, Zhuyi Raghavan, Karthika Chu, Justin Mohamadi, Hamid Jackman, Shaun D Chiu, Readman Warren, René L Birol, Inanç BMC Med Genomics Research BACKGROUND: Reading the nucleotides from two ends of a DNA fragment is called paired-end tag (PET) sequencing. When the fragment length is longer than the combined read length, there remains a gap of unsequenced nucleotides between read pairs. If the target in such experiments is sequenced at a level to provide redundant coverage, it may be possible to bridge these gaps using bioinformatics methods. Konnector is a local de novo assembly tool that addresses this problem. Here we report on version 2.0 of our tool. RESULTS: Konnector uses a probabilistic and memory-efficient data structure called Bloom filter to represent a k-mer spectrum - all possible sequences of length k in an input file, such as the collection of reads in a PET sequencing experiment. It performs look-ups to this data structure to construct an implicit de Bruijn graph, which describes (k-1) base pair overlaps between adjacent k-mers. It traverses this graph to bridge the gap between a given pair of flanking sequences. CONCLUSIONS: Here we report the performance of Konnector v2.0 on simulated and experimental datasets, and compare it against other tools with similar functionality. We note that, representing k-mers with 1.5 bytes of memory on average, Konnector can scale to very large genomes. With our parallel implementation, it can also process over a billion bases on commodity hardware. 
BioMed Central 2015-09-23 /pmc/articles/PMC4582294/ /pubmed/26399504 http://dx.doi.org/10.1186/1755-8794-8-S3-S1 Text en Copyright © 2015 Vandervalk et al.; http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Konnector v2.0: pseudo-long reads from paired-end sequencing data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4582294/
https://www.ncbi.nlm.nih.gov/pubmed/26399504
http://dx.doi.org/10.1186/1755-8794-8-S3-S1