
A comparison of strategies for generating artificial replicates in RNA-seq experiments

Due to the overall high costs, technical replicates are usually omitted in RNA-seq experiments, but several methods exist to generate them artificially. Bootstrapping reads from FASTQ-files has recently been used in the context of other NGS analyses and can be used to generate artificial technical r...


Bibliographic Details
Main Authors: Saremi, Babak, Gusmag, Frederic, Distl, Ottmar, Schaarschmidt, Frank, Metzger, Julia, Becker, Stefanie, Jung, Klaus
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9065086/
https://www.ncbi.nlm.nih.gov/pubmed/35505053
http://dx.doi.org/10.1038/s41598-022-11302-9
author Saremi, Babak
Gusmag, Frederic
Distl, Ottmar
Schaarschmidt, Frank
Metzger, Julia
Becker, Stefanie
Jung, Klaus
author_sort Saremi, Babak
collection PubMed
description Due to the overall high costs, technical replicates are usually omitted in RNA-seq experiments, but several methods exist to generate them artificially. Bootstrapping reads from FASTQ-files has recently been used in the context of other NGS analyses and can be used to generate artificial technical replicates. Bootstrapping samples from the columns of the expression matrix has already been used for DNA microarray data and generates a new artificial replicate of the whole experiment. Mixing data of individual samples has been used for data augmentation in machine learning. The aim of this comparison is to evaluate which of these strategies is best suited to study the reproducibility of differential expression and gene-set enrichment analysis in an RNA-seq experiment. To study the approaches under controlled conditions, we performed a new RNA-seq experiment on gene expression changes upon virus infection compared to untreated control samples. In order to compare the approaches for artificial replicates, each of the samples was sequenced twice, i.e. as true technical replicates, and differential expression analysis and GO term enrichment analysis were conducted separately for the two resulting data sets. Although we observed a high correlation between the results from the two replicates, there are still many genes and GO terms that would be selected from one replicate but not from the other. Cluster analyses showed that artificial replicates generated by bootstrapping reads produce p values and fold changes that are close to those obtained from the true data sets. Results generated from artificial replicates with the approaches of column bootstrap or mixing observations were less similar to the results from the true replicates. Furthermore, the overlap of results among replicates generated by column bootstrap or mixing observations was much stronger than among the true replicates.
Artificial technical replicates generated by bootstrapping sequencing reads from FASTQ-files are better suited to study the reproducibility of results from differential expression and GO term enrichment analysis in RNA-seq experiments than column bootstrap or mixing observations. However, FASTQ-bootstrapping is computationally more expensive than the other two approaches. FASTQ-bootstrapping may also be applicable to other applications of high-throughput sequencing.
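The three strategies named in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the resampling of columns within each experimental group, and the weighted-average form of mixing are assumptions made for illustration, since the abstract does not specify these details. The sketch only shows the resampling idea behind each strategy on toy data.

```python
import random
from typing import Dict, List, Sequence, Tuple

# One FASTQ record: header, sequence, separator, quality string.
FastqRecord = Tuple[str, ...]


def parse_fastq(lines: Sequence[str]) -> List[FastqRecord]:
    """Group FASTQ lines into 4-line records (assumes unwrapped, well-formed input)."""
    stripped = [ln.rstrip("\n") for ln in lines]
    if len(stripped) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")
    return [tuple(stripped[i:i + 4]) for i in range(0, len(stripped), 4)]


def bootstrap_reads(records: List[FastqRecord], rng: random.Random) -> List[FastqRecord]:
    """FASTQ bootstrap: draw as many reads as the sample has, with replacement.

    The result stands in for one artificial technical replicate of one sample.
    """
    return rng.choices(records, k=len(records))


def column_bootstrap(
    counts: Dict[str, List[float]],
    groups: Dict[str, str],
    rng: random.Random,
) -> List[Tuple[str, str, List[float]]]:
    """Column bootstrap: resample whole samples (matrix columns) with replacement.

    Assumption: resampling is done separately within each group so that group
    sizes are preserved. Returns (group, drawn sample name, expression column).
    """
    by_group: Dict[str, List[str]] = {}
    for sample, group in groups.items():
        by_group.setdefault(group, []).append(sample)
    replicate = []
    for group, samples in by_group.items():
        for drawn in rng.choices(samples, k=len(samples)):
            replicate.append((group, drawn, counts[drawn]))
    return replicate


def mix_samples(a: Sequence[float], b: Sequence[float], weight: float) -> List[float]:
    """Mixing observations: weighted average of two samples' expression values.

    Assumption: a convex combination, as in mixup-style data augmentation.
    """
    return [weight * x + (1.0 - weight) * y for x, y in zip(a, b)]


if __name__ == "__main__":
    rng = random.Random(1)
    reads = parse_fastq(["@r1", "ACGT", "+", "IIII", "@r2", "GGCC", "+", "IIII"])
    print(bootstrap_reads(reads, rng))
    counts = {"c1": [1, 2], "c2": [3, 4], "v1": [5, 6], "v2": [7, 8]}
    groups = {"c1": "control", "c2": "control", "v1": "infected", "v2": "infected"}
    print(column_bootstrap(counts, groups, rng))
    print(mix_samples(counts["c1"], counts["c2"], 0.5))
```

The sketch also hints at why the description calls FASTQ-bootstrapping computationally more expensive: each resampled set of reads would still have to pass through alignment and quantification before an expression matrix exists, whereas the other two strategies operate directly on that matrix.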
format Online
Article
Text
id pubmed-9065086
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9065086 2022-05-04 A comparison of strategies for generating artificial replicates in RNA-seq experiments Saremi, Babak Gusmag, Frederic Distl, Ottmar Schaarschmidt, Frank Metzger, Julia Becker, Stefanie Jung, Klaus Sci Rep Article Nature Publishing Group UK 2022-05-03 /pmc/articles/PMC9065086/ /pubmed/35505053 http://dx.doi.org/10.1038/s41598-022-11302-9 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title A comparison of strategies for generating artificial replicates in RNA-seq experiments
topic Article