Konnector v2.0: pseudo-long reads from paired-end sequencing data
BACKGROUND: Reading the nucleotides from two ends of a DNA fragment is called paired-end tag (PET) sequencing. When the fragment length is longer than the combined read length, there remains a gap of unsequenced nucleotides between read pairs. If the target in such experiments is sequenced at a leve...
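Main Authors: | Vandervalk, Benjamin P, Yang, Chen, Xue, Zhuyi, Raghavan, Karthika, Chu, Justin, Mohamadi, Hamid, Jackman, Shaun D, Chiu, Readman, Warren, René L, Birol, Inanç |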
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4582294/ https://www.ncbi.nlm.nih.gov/pubmed/26399504 http://dx.doi.org/10.1186/1755-8794-8-S3-S1 |
_version_ | 1782391680681377792 |
---|---|
author | Vandervalk, Benjamin P Yang, Chen Xue, Zhuyi Raghavan, Karthika Chu, Justin Mohamadi, Hamid Jackman, Shaun D Chiu, Readman Warren, René L Birol, Inanç |
author_sort | Vandervalk, Benjamin P |
collection | PubMed |
description | BACKGROUND: Reading the nucleotides from the two ends of a DNA fragment is called paired-end tag (PET) sequencing. When the fragment is longer than the combined read length, a gap of unsequenced nucleotides remains between the two reads of a pair. If the target in such experiments is sequenced deeply enough to provide redundant coverage, it may be possible to bridge these gaps using bioinformatics methods. Konnector is a local de novo assembly tool that addresses this problem. Here we report on version 2.0 of our tool. RESULTS: Konnector uses a probabilistic and memory-efficient data structure called a Bloom filter to represent a k-mer spectrum, that is, all sequences of length k that occur in an input file, such as the collection of reads in a PET sequencing experiment. It performs look-ups on this data structure to construct an implicit de Bruijn graph, which describes (k-1) base pair overlaps between adjacent k-mers. It traverses this graph to bridge the gap between a given pair of flanking sequences. CONCLUSIONS: Here we report the performance of Konnector v2.0 on simulated and experimental datasets, and compare it against other tools with similar functionality. We note that, because it represents k-mers with 1.5 bytes of memory on average, Konnector can scale to very large genomes. With our parallel implementation, it can also process over a billion bases on commodity hardware. |
format | Online Article Text |
id | pubmed-4582294 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4582294 2015-09-28 Konnector v2.0: pseudo-long reads from paired-end sequencing data BMC Med Genomics Research
BioMed Central 2015-09-23 /pmc/articles/PMC4582294/ /pubmed/26399504 http://dx.doi.org/10.1186/1755-8794-8-S3-S1 Text en Copyright © 2015 Vandervalk et al.; http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Konnector v2.0: pseudo-long reads from paired-end sequencing data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4582294/ https://www.ncbi.nlm.nih.gov/pubmed/26399504 http://dx.doi.org/10.1186/1755-8794-8-S3-S1 |
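
The approach summarized in the description (a Bloom filter holding the k-mer spectrum, an implicit de Bruijn graph built from look-ups, and a graph traversal that bridges the gap between two flanking sequences) can be illustrated with a small sketch. This is not Konnector's implementation: the class and function names (BloomFilter, load_kmers, successors, connect), the BLAKE2b-based hashing, the breadth-first search, and the max_steps limit are all assumptions made for illustration, and the sketch ignores reverse complements and sequencing errors.

```python
import hashlib
from collections import deque


class BloomFilter:
    """Fixed-size bit array queried through several hashes (false positives possible)."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive one bit position per hash function by salting BLAKE2b differently.
        for seed in range(self.num_hashes):
            digest = hashlib.blake2b(
                item.encode(), digest_size=8, salt=bytes([seed])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def load_kmers(reads, k, bloom):
    """Insert every k-mer of every read into the Bloom filter."""
    for read in reads:
        for i in range(len(read) - k + 1):
            bloom.add(read[i:i + k])


def successors(kmer, bloom):
    """Neighbours in the implicit de Bruijn graph: k-mers sharing a (k-1) overlap."""
    suffix = kmer[1:]
    return [suffix + base for base in "ACGT" if suffix + base in bloom]


def connect(left, right, k, bloom, max_steps):
    """Breadth-first search from the last k-mer of `left` to the first k-mer of `right`.

    Returns the merged sequence, or None if no path is found within max_steps.
    """
    start, goal = left[-k:], right[:k]
    queue = deque([(start, left)])
    seen = {start}
    while queue:
        kmer, seq = queue.popleft()
        if kmer == goal:
            return seq + right[k:]
        if len(seq) - len(left) >= max_steps:
            continue
        for nxt in successors(kmer, bloom):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, seq + nxt[-1]))
    return None


# Toy example: a made-up sequence, overlapping "reads", and two flanking ends.
genome = "ACGTTGCATGACCGTAGGCTAAC"
reads = [genome[i:i + 8] for i in range(len(genome) - 7)]
bloom = BloomFilter(num_bits=10_000, num_hashes=3)
load_kmers(reads, 5, bloom)
merged = connect(genome[:7], genome[-7:], 5, bloom, max_steps=50)
print(merged)
```

The graph is never stored explicitly: each step asks the filter which of the four possible one-base extensions is present, which is what keeps the memory footprint tied to the filter size rather than to the number of edges. Because a Bloom filter can report false positives, a real tool would need to guard against spurious branches; this sketch does not.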