
Improving in-silico normalization using read weights

Specialized de novo assemblers for diverse datatypes have been developed and are in widespread use for the analyses of single-cell genomics, metagenomics and RNA-seq data. However, assembly of large sequencing datasets produced by modern technologies is challenging and computationally intensive. In-...


Bibliographic Details
Main authors:	Durai, Dilip A., Schulz, Marcel H.
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2019
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6435659/ https://www.ncbi.nlm.nih.gov/pubmed/30914698 http://dx.doi.org/10.1038/s41598-019-41502-9

_version_	1783406681472892928
author	Durai, Dilip A. Schulz, Marcel H.
author_facet	Durai, Dilip A. Schulz, Marcel H.
author_sort	Durai, Dilip A.
collection	PubMed
description	Specialized de novo assemblers for diverse datatypes have been developed and are in widespread use for the analyses of single-cell genomics, metagenomics and RNA-seq data. However, assembly of large sequencing datasets produced by modern technologies is challenging and computationally intensive. In-silico read normalization has been suggested as a computational strategy to reduce redundancy in read datasets, which leads to significant speedups and memory savings of assembly pipelines. Previously, we presented a set multi-cover optimization based approach, ORNA, where reads are reduced without losing important k-mer connectivity information, as used in assembly graphs. Here we propose extensions to ORNA, named ORNA-Q and ORNA-K, which consider a weighted set multi-cover optimization formulation for the in-silico read normalization problem. These novel formulations make use of the base quality scores obtained from sequencers (ORNA-Q) or k-mer abundances of reads (ORNA-K) to improve normalization further. We devise efficient heuristic algorithms for solving both formulations. In applications to human RNA-seq data, ORNA-Q and ORNA-K are shown to assemble more or equally many full length transcripts compared to other normalization methods at similar or higher read reduction values. The algorithm is implemented under the latest version of ORNA (v2.0, https://github.com/SchulzLab/ORNA).
format	Online Article Text
id	pubmed-6435659
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-6435659 2019-04-02 Improving in-silico normalization using read weights Durai, Dilip A. Schulz, Marcel H. Sci Rep Article Nature Publishing Group UK 2019-03-26 /pmc/articles/PMC6435659/ /pubmed/30914698 http://dx.doi.org/10.1038/s41598-019-41502-9 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle	Article Durai, Dilip A. Schulz, Marcel H. Improving in-silico normalization using read weights
title	Improving in-silico normalization using read weights
title_full	Improving in-silico normalization using read weights
title_fullStr	Improving in-silico normalization using read weights
title_full_unstemmed	Improving in-silico normalization using read weights
title_short	Improving in-silico normalization using read weights
title_sort	improving in-silico normalization using read weights
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6435659/ https://www.ncbi.nlm.nih.gov/pubmed/30914698 http://dx.doi.org/10.1038/s41598-019-41502-9
work_keys_str_mv	AT duraidilipa improvinginsiliconormalizationusingreadweights AT schulzmarcelh improvinginsiliconormalizationusingreadweights
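
The description field summarizes read normalization as a weighted set multi-cover problem: keep a reduced set of reads that still covers every k-mer often enough, preferring reads with better weights. The following is a minimal sketch of that idea only, not the ORNA v2.0 implementation. The function names, the `(sequence, weight)` input layout, the logarithmic coverage requirement, and the rule of visiting reads from highest to lowest weight are all assumptions made for illustration.

```python
import math
from collections import Counter


def kmers(seq, k):
    """Return all overlapping k-mers of a read sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def normalize(reads, k=3, log_base=2.0):
    """Greedy sketch of weighted multi-cover read reduction.

    reads: list of (sequence, weight) pairs. A higher weight marks a more
    desirable read, for example a mean base quality (hypothetical layout).
    """
    # Abundance of every k-mer across the whole dataset.
    abundance = Counter(km for seq, _ in reads for km in kmers(seq, k))
    # Assumed coverage requirement: keep each k-mer roughly
    # log_base(abundance) times, and never fewer than once.
    required = {km: max(1, math.ceil(math.log(n, log_base)))
                for km, n in abundance.items()}
    covered = Counter()
    kept = []
    # Visit reads from highest to lowest weight so that, among redundant
    # reads, the better-weighted ones are the ones retained.
    for seq, weight in sorted(reads, key=lambda r: r[1], reverse=True):
        read_kmers = set(kmers(seq, k))
        # Keep the read only if some k-mer still needs more coverage.
        if any(covered[km] < required[km] for km in read_kmers):
            kept.append((seq, weight))
            for km in read_kmers:
                covered[km] += 1
    return kept


if __name__ == "__main__":
    toy_reads = [("ACGTA", 30.0), ("ACGTA", 10.0),
                 ("ACGTA", 20.0), ("TTGCA", 5.0)]
    for seq, weight in normalize(toy_reads, k=3):
        print(seq, weight)
```

In this toy input the three identical reads share k-mers of abundance 3, so under the assumed rule only two copies are needed and the lowest-weight copy is dropped, while the unique read is kept so that none of its k-mers is lost.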
