A comparison of strategies for generating artificial replicates in RNA-seq experiments
Due to the overall high costs, technical replicates are usually omitted in RNA-seq experiments, but several methods exist to generate them artificially. Bootstrapping reads from FASTQ-files has recently been used in the context of other NGS analyses and can be used to generate artificial technical r...
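This catalog record only names the strategies and does not include the authors' code. The sketch below illustrates the read-bootstrapping idea mentioned above, assuming uncompressed single-end FASTQ input with one sequence line per record. An artificial technical replicate is drawn by sampling as many reads as the library contains, uniformly and with replacement. The helper names `parse_fastq`, `bootstrap_reads` and `to_fastq` are hypothetical and belong to no existing tool.

```python
import random
from typing import List, Tuple

# One FASTQ record: (header, sequence, separator, quality).
Record = Tuple[str, str, str, str]


def parse_fastq(lines: List[str]) -> List[Record]:
    """Group FASTQ lines into 4-line records (assumes unwrapped sequences)."""
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of 4 lines")
    records: List[Record] = []
    for i in range(0, len(lines), 4):
        header, seq, sep, qual = lines[i:i + 4]
        if not header.startswith("@") or not sep.startswith("+") or len(seq) != len(qual):
            raise ValueError(f"malformed FASTQ record starting at line {i + 1}")
        records.append((header, seq, sep, qual))
    return records


def bootstrap_reads(records: List[Record], seed: int) -> List[Record]:
    """Draw len(records) reads uniformly with replacement (one artificial replicate)."""
    rng = random.Random(seed)
    return rng.choices(records, k=len(records))


def to_fastq(records: List[Record]) -> str:
    """Serialise records back into FASTQ text."""
    return "".join(f"{h}\n{s}\n{p}\n{q}\n" for h, s, p, q in records)


if __name__ == "__main__":
    example = [
        "@read1", "ACGT", "+", "IIII",
        "@read2", "GGCC", "+", "IIII",
        "@read3", "TTAA", "+", "IIII",
    ]
    # Different seeds give different artificial replicates of the same library.
    for seed in (1, 2):
        print(to_fastq(bootstrap_reads(parse_fastq(example), seed)))
```

Each resampled file would then go through the same alignment and counting steps as the original library. For paired-end data, one would presumably need to draw the same record indices for both mate files so that pairs stay intact. That extension is an assumption and is not shown here.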
Main authors: | Saremi, Babak; Gusmag, Frederic; Distl, Ottmar; Schaarschmidt, Frank; Metzger, Julia; Becker, Stefanie; Jung, Klaus |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9065086/ https://www.ncbi.nlm.nih.gov/pubmed/35505053 http://dx.doi.org/10.1038/s41598-022-11302-9 |
field | value |
---|---|
author | Saremi, Babak; Gusmag, Frederic; Distl, Ottmar; Schaarschmidt, Frank; Metzger, Julia; Becker, Stefanie; Jung, Klaus |
collection | PubMed |
description | Due to the overall high costs, technical replicates are usually omitted in RNA-seq experiments, but several methods exist to generate them artificially. Bootstrapping reads from FASTQ-files has recently been used in the context of other NGS analyses and can be used to generate artificial technical replicates. Bootstrapping samples from the columns of the expression matrix has already been used for DNA microarray data and generates a new artificial replicate of the whole experiment. Mixing data of individual samples has been used for data augmentation in machine learning. The aim of this comparison is to evaluate which of these strategies is best suited to study the reproducibility of differential expression and gene-set enrichment analysis in an RNA-seq experiment. To study the approaches under controlled conditions, we performed a new RNA-seq experiment on gene expression changes upon virus infection compared to untreated control samples. In order to compare the approaches for artificial replicates, each of the samples was sequenced twice, i.e. as true technical replicates, and differential expression analysis and GO term enrichment analysis were conducted separately for the two resulting data sets. Although we observed a high correlation between the results from the two replicates, there are still many genes and GO terms that would be selected from one replicate but not from the other. Cluster analyses showed that artificial replicates generated by bootstrapping reads produce p values and fold changes that are close to those obtained from the true data sets. Results generated from artificial replicates with the approaches of column bootstrap or mixing observations were less similar to the results from the true replicates. Furthermore, the overlap of results among replicates generated by column bootstrap or mixing observations was much stronger than among the true replicates.
Artificial technical replicates generated by bootstrapping sequencing reads from FASTQ-files are better suited to study the reproducibility of results from differential expression and GO term enrichment analysis in RNA-seq experiments than column bootstrap or mixing observations. However, FASTQ-bootstrapping is computationally more expensive than the other two approaches. FASTQ-bootstrapping may also be applicable to other high-throughput sequencing applications. |
format | Online Article Text |
id | pubmed-9065086 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9065086 2022-05-04; Sci Rep; Article; Nature Publishing Group UK, 2022-05-03; /pmc/articles/PMC9065086/ /pubmed/35505053 http://dx.doi.org/10.1038/s41598-022-11302-9; Text; en; © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | A comparison of strategies for generating artificial replicates in RNA-seq experiments |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9065086/ https://www.ncbi.nlm.nih.gov/pubmed/35505053 http://dx.doi.org/10.1038/s41598-022-11302-9 |
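The two matrix-level strategies from the description can be sketched on a genes-by-samples count matrix, held here as a dictionary of per-sample count columns. This is an illustrative sketch under stated assumptions, not the authors' implementation. Drawing columns within each treatment group, choosing mixing partners from the same group, the mixup-style Beta(alpha, alpha) weight with `alpha = 0.4`, and rounding mixed values back to integer counts are all choices made here. `column_bootstrap` and `mix_observations` are hypothetical names.

```python
import random
from typing import Dict, List

# Genes-by-samples count matrix stored column-wise: sample name -> per-gene counts.
Counts = Dict[str, List[int]]


def column_bootstrap(counts: Counts, groups: Dict[str, str], seed: int) -> Counts:
    """Resample whole sample columns with replacement, separately per group."""
    rng = random.Random(seed)
    by_group: Dict[str, List[str]] = {}
    for sample in counts:
        by_group.setdefault(groups[sample], []).append(sample)
    replicate: Counts = {}
    for group, samples in by_group.items():
        # Same number of columns per group as in the real experiment.
        for i, sample in enumerate(rng.choices(samples, k=len(samples)), start=1):
            replicate[f"{group}_boot{i}"] = list(counts[sample])
    return replicate


def mix_observations(counts: Counts, groups: Dict[str, str], seed: int,
                     alpha: float = 0.4) -> Counts:
    """Replace each sample by a weighted mix with another sample from its group."""
    rng = random.Random(seed)
    mixed: Counts = {}
    for sample, column in counts.items():
        partners = [s for s in counts if groups[s] == groups[sample] and s != sample]
        partner = rng.choice(partners or [sample])
        lam = rng.betavariate(alpha, alpha)  # mixup-style weight between 0 and 1
        mixed[f"{sample}_mix"] = [
            round(lam * a + (1 - lam) * b)  # rounded back to integer counts
            for a, b in zip(column, counts[partner])
        ]
    return mixed


if __name__ == "__main__":
    counts = {
        "ctrl1": [10, 0, 5], "ctrl2": [12, 1, 4],
        "inf1": [30, 2, 9], "inf2": [28, 3, 11],
    }
    groups = {"ctrl1": "control", "ctrl2": "control",
              "inf1": "infected", "inf2": "infected"}
    print(column_bootstrap(counts, groups, seed=7))
    print(mix_observations(counts, groups, seed=7))
```

Resampling within groups keeps the infected-versus-control contrast estimable in every artificial replicate, whereas a single draw over all columns could leave one group empty. Rounding keeps the output usable by count-based differential expression tools, at the price of a small loss of precision.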