
Kollector: transcript-informed, targeted de novo assembly of gene loci

MOTIVATION: Despite considerable advancements in sequencing and computing technologies, de novo assembly of whole eukaryotic genomes is still a time-consuming task that requires a significant amount of computational resources and expertise. A targeted assembly approach to perform local assembly of s...


Bibliographic Details
Main authors: Kucuk, Erdi, Chu, Justin, Vandervalk, Benjamin P, Hammond, S Austin, Warren, René L, Birol, Inanc
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5572715/
https://www.ncbi.nlm.nih.gov/pubmed/28186221
http://dx.doi.org/10.1093/bioinformatics/btx078
_version_ 1783259561347514368
author Kucuk, Erdi
Chu, Justin
Vandervalk, Benjamin P
Hammond, S Austin
Warren, René L
Birol, Inanc
author_facet Kucuk, Erdi
Chu, Justin
Vandervalk, Benjamin P
Hammond, S Austin
Warren, René L
Birol, Inanc
author_sort Kucuk, Erdi
collection PubMed
description MOTIVATION: Despite considerable advancements in sequencing and computing technologies, de novo assembly of whole eukaryotic genomes is still a time-consuming task that requires a significant amount of computational resources and expertise. A targeted assembly approach to perform local assembly of sequences of interest remains a valuable option for some applications. This is especially true for gene-centric assemblies, whose resulting sequence can be readily utilized for more focused biological research. Here we describe Kollector, an alignment-free targeted assembly pipeline that uses thousands of transcript sequences concurrently to inform the localized assembly of corresponding gene loci. Kollector robustly reconstructs introns and novel sequences within these loci, and scales well to large genomes, properties that make it especially useful for researchers working on non-model eukaryotic organisms. RESULTS: We demonstrate the performance of Kollector for assembling complete or near-complete Caenorhabditis elegans and Homo sapiens gene loci from their respective input transcripts. In a time- and memory-efficient manner, the Kollector pipeline successfully reconstructs 99% and 80% of C. elegans and H. sapiens transcript targets, respectively (compared to 86% and 73% with standard de novo assembly techniques), in their corresponding genomic space using whole genome shotgun sequencing reads. We also show that Kollector outperforms both established and recently released targeted assembly tools. Finally, we demonstrate three use cases for Kollector, including comparative and cancer genomics applications. AVAILABILITY AND IMPLEMENTATION: Kollector is implemented as a bash script, and is available at https://github.com/bcgsc/kollector SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-5572715
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5572715 2017-09-01 Kollector: transcript-informed, targeted de novo assembly of gene loci Kucuk, Erdi Chu, Justin Vandervalk, Benjamin P Hammond, S Austin Warren, René L Birol, Inanc Bioinformatics Original Papers MOTIVATION: Despite considerable advancements in sequencing and computing technologies, de novo assembly of whole eukaryotic genomes is still a time-consuming task that requires a significant amount of computational resources and expertise. A targeted assembly approach to perform local assembly of sequences of interest remains a valuable option for some applications. This is especially true for gene-centric assemblies, whose resulting sequence can be readily utilized for more focused biological research. Here we describe Kollector, an alignment-free targeted assembly pipeline that uses thousands of transcript sequences concurrently to inform the localized assembly of corresponding gene loci. Kollector robustly reconstructs introns and novel sequences within these loci, and scales well to large genomes, properties that make it especially useful for researchers working on non-model eukaryotic organisms. RESULTS: We demonstrate the performance of Kollector for assembling complete or near-complete Caenorhabditis elegans and Homo sapiens gene loci from their respective input transcripts. In a time- and memory-efficient manner, the Kollector pipeline successfully reconstructs 99% and 80% of C. elegans and H. sapiens transcript targets, respectively (compared to 86% and 73% with standard de novo assembly techniques), in their corresponding genomic space using whole genome shotgun sequencing reads. We also show that Kollector outperforms both established and recently released targeted assembly tools. Finally, we demonstrate three use cases for Kollector, including comparative and cancer genomics applications.
AVAILABILITY AND IMPLEMENTATION: Kollector is implemented as a bash script, and is available at https://github.com/bcgsc/kollector SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2017-06-15 2017-02-10 /pmc/articles/PMC5572715/ /pubmed/28186221 http://dx.doi.org/10.1093/bioinformatics/btx078 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Original Papers
Kucuk, Erdi
Chu, Justin
Vandervalk, Benjamin P
Hammond, S Austin
Warren, René L
Birol, Inanc
Kollector: transcript-informed, targeted de novo assembly of gene loci
title Kollector: transcript-informed, targeted de novo assembly of gene loci
title_full Kollector: transcript-informed, targeted de novo assembly of gene loci
title_fullStr Kollector: transcript-informed, targeted de novo assembly of gene loci
title_full_unstemmed Kollector: transcript-informed, targeted de novo assembly of gene loci
title_short Kollector: transcript-informed, targeted de novo assembly of gene loci
title_sort kollector: transcript-informed, targeted de novo assembly of gene loci
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5572715/
https://www.ncbi.nlm.nih.gov/pubmed/28186221
http://dx.doi.org/10.1093/bioinformatics/btx078
work_keys_str_mv AT kucukerdi kollectortranscriptinformedtargeteddenovoassemblyofgeneloci
AT chujustin kollectortranscriptinformedtargeteddenovoassemblyofgeneloci
AT vandervalkbenjaminp kollectortranscriptinformedtargeteddenovoassemblyofgeneloci
AT hammondsaustin kollectortranscriptinformedtargeteddenovoassemblyofgeneloci
AT warrenrenel kollectortranscriptinformedtargeteddenovoassemblyofgeneloci
AT birolinanc kollectortranscriptinformedtargeteddenovoassemblyofgeneloci