Informed kmer selection for de novo transcriptome assembly

Motivation: De novo transcriptome assembly is an integral part for many RNA-seq workflows. Common applications include sequencing of non-model organisms, cancer or meta transcriptomes. Most de novo transcriptome assemblers use the de Bruijn graph (DBG) as the underlying data structure. The quality o...

Bibliographic Details
Main Authors:	Durai, Dilip A., Schulz, Marcel H.
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2016
Subjects:	Hitseq Papers
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4892416/ https://www.ncbi.nlm.nih.gov/pubmed/27153653 http://dx.doi.org/10.1093/bioinformatics/btw217

_version_	1782435383375560704
author	Durai, Dilip A. Schulz, Marcel H.
author_facet	Durai, Dilip A. Schulz, Marcel H.
author_sort	Durai, Dilip A.
collection	PubMed
description	Motivation: De novo transcriptome assembly is an integral part for many RNA-seq workflows. Common applications include sequencing of non-model organisms, cancer or meta transcriptomes. Most de novo transcriptome assemblers use the de Bruijn graph (DBG) as the underlying data structure. The quality of the assemblies produced by such assemblers is highly influenced by the exact word length k. As such no single kmer value leads to optimal results. Instead, DBGs over different kmer values are built and the assemblies are merged to improve sensitivity. However, no studies have investigated thoroughly the problem of automatically learning at which kmer value to stop the assembly. Instead a suboptimal selection of kmer values is often used in practice. Results: Here we investigate the contribution of a single kmer value in a multi-kmer based assembly approach. We find that a comparative clustering of related assemblies can be used to estimate the importance of an additional kmer assembly. Using a model fit based algorithm we predict the kmer value at which no further assemblies are necessary. Our approach is tested with different de novo assemblers for datasets with different coverage values and read lengths. Further, we suggest a simple post processing step that significantly improves the quality of multi-kmer assemblies. Conclusion: We provide an automatic method for limiting the number of kmer values without a significant loss in assembly quality but with savings in assembly time. This is a step forward to making multi-kmer methods more reliable and easier to use. Availability and Implementation: A general implementation of our approach can be found under: https://github.com/SchulzLab/KREATION. Supplementary information: Supplementary data are available at Bioinformatics online. Contact: mschulz@mmci.uni-saarland.de
format	Online Article Text
id	pubmed-4892416
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-48924162016-06-07 Informed kmer selection for de novo transcriptome assembly Durai, Dilip A. Schulz, Marcel H. Bioinformatics Hitseq Papers Motivation: De novo transcriptome assembly is an integral part for many RNA-seq workflows. Common applications include sequencing of non-model organisms, cancer or meta transcriptomes. Most de novo transcriptome assemblers use the de Bruijn graph (DBG) as the underlying data structure. The quality of the assemblies produced by such assemblers is highly influenced by the exact word length k. As such no single kmer value leads to optimal results. Instead, DBGs over different kmer values are built and the assemblies are merged to improve sensitivity. However, no studies have investigated thoroughly the problem of automatically learning at which kmer value to stop the assembly. Instead a suboptimal selection of kmer values is often used in practice. Results: Here we investigate the contribution of a single kmer value in a multi-kmer based assembly approach. We find that a comparative clustering of related assemblies can be used to estimate the importance of an additional kmer assembly. Using a model fit based algorithm we predict the kmer value at which no further assemblies are necessary. Our approach is tested with different de novo assemblers for datasets with different coverage values and read lengths. Further, we suggest a simple post processing step that significantly improves the quality of multi-kmer assemblies. Conclusion: We provide an automatic method for limiting the number of kmer values without a significant loss in assembly quality but with savings in assembly time. This is a step forward to making multi-kmer methods more reliable and easier to use. Availability and Implementation: A general implementation of our approach can be found under: https://github.com/SchulzLab/KREATION. Supplementary information: Supplementary data are available at Bioinformatics online. 
Contact: mschulz@mmci.uni-saarland.de Oxford University Press 2016-06-01 2016-04-28 /pmc/articles/PMC4892416/ /pubmed/27153653 http://dx.doi.org/10.1093/bioinformatics/btw217 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle	Hitseq Papers Durai, Dilip A. Schulz, Marcel H. Informed kmer selection for de novo transcriptome assembly
title	Informed kmer selection for de novo transcriptome assembly
title_full	Informed kmer selection for de novo transcriptome assembly
title_fullStr	Informed kmer selection for de novo transcriptome assembly
title_full_unstemmed	Informed kmer selection for de novo transcriptome assembly
title_short	Informed kmer selection for de novo transcriptome assembly
title_sort	informed kmer selection for de novo transcriptome assembly
topic	Hitseq Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4892416/ https://www.ncbi.nlm.nih.gov/pubmed/27153653 http://dx.doi.org/10.1093/bioinformatics/btw217
work_keys_str_mv	AT duraidilipa informedkmerselectionfordenovotranscriptomeassembly AT schulzmarcelh informedkmerselectionfordenovotranscriptomeassembly
