Guide for library design and bias correction for large-scale transcriptome studies using highly multiplexed RNAseq methods

Bibliographic Details
Main authors:	Katayama, Shintaro; Skoog, Tiina; Söderhäll, Cilla; Einarsdottir, Elisabet; Krjutškov, Kaarel; Kere, Juha
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2019
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6693229/ https://www.ncbi.nlm.nih.gov/pubmed/31409293 http://dx.doi.org/10.1186/s12859-019-3017-9

Description:
BACKGROUND: Standard RNAseq methods using bulk RNA and recent single-cell RNAseq methods use DNA barcodes to identify samples and cells, and the barcoded cDNAs are pooled into a library pool before high throughput sequencing. In cases of single-cell and low-input RNAseq methods, the library is further amplified by PCR after the pooling. Preparation of hundreds or more samples for a large study often requires multiple library pools. However, sometimes correlation between expression profiles among the libraries is low and batch effect biases make integration of data between library pools difficult.
RESULTS: We investigated 166 technical replicates in 14 RNAseq libraries made using the STRT method. The patterns of the library biases differed by genes, and uneven library yields were associated with library biases. The former bias was corrected using the NBGLM-LBC algorithm, which we present in the current study. The latter bias could not be corrected directly, but could be solved by omitting libraries with particularly low yields. A simulation experiment suggested that the library bias correction using NBGLM-LBC requires a consistent sample layout. The NBGLM-LBC correction method was applied to an expression profile for a cohort study of childhood acute respiratory illness, and the library biases were resolved.
CONCLUSIONS: The R source code for the library bias correction named NBGLM-LBC is available at https://shka.github.io/NBGLM-LBC and https://shka.bitbucket.io/NBGLM-LBC. This method is applicable to correct the library biases in various studies that use highly multiplexed sequencing-based profiling methods with a consistent sample layout with samples to be compared (e.g., “cases” and “controls”) equally distributed in each library.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-3017-9) contains supplementary material, which is available to authorized users.
Journal:	BMC Bioinformatics
Publication date:	2019-08-13
License:	© The Author(s). 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
