
A comparative study of RNA-Seq and microarray data analysis on the two examples of rectal-cancer patients and Burkitt Lymphoma cells

BACKGROUND: Pipeline comparisons for gene expression data are highly valuable for applied real data analyses, as they enable the selection of suitable analysis strategies for the dataset at hand. Such pipelines for RNA-Seq data should include mapping of reads, counting and differential gene expressi...


Bibliographic Details
Main authors: Wolff, Alexander; Bayerlová, Michaela; Gaedcke, Jochen; Kube, Dieter; Beißbarth, Tim
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5955523/
https://www.ncbi.nlm.nih.gov/pubmed/29768462
http://dx.doi.org/10.1371/journal.pone.0197162
author Wolff, Alexander
Bayerlová, Michaela
Gaedcke, Jochen
Kube, Dieter
Beißbarth, Tim
collection PubMed
description BACKGROUND: Pipeline comparisons for gene expression data are highly valuable for applied real data analyses, as they enable the selection of suitable analysis strategies for the dataset at hand. Such pipelines should include mapping of reads, counting, and differential gene expression analysis for RNA-Seq data, or preprocessing, normalization, and differential gene expression analysis for microarray data, in order to give global insight into pipeline performance. METHODS: Four commonly used RNA-Seq pipelines (STAR/HTSeq-Count/edgeR, STAR/RSEM/edgeR, Sailfish/edgeR, TopHat2/Cufflinks/CuffDiff) were investigated on multiple levels (alignment and counting) and cross-compared with the microarray counterpart on the level of gene expression and gene ontology enrichment. For these comparisons we generated two matched microarray and RNA-Seq datasets: Burkitt Lymphoma cell line data and rectal cancer patient data. RESULTS: The overall mapping rate of STAR was 98.98% for the cell line dataset and 98.49% for the patient dataset. TopHat2's overall mapping rate was 97.02% and 96.73%, respectively, while Sailfish reached an overall mapping rate of only 84.81% and 54.44%. The correlation of gene expression between microarray and RNA-Seq data was moderately lower for the patient dataset (ρ = 0.67–0.69) than for the cell line dataset (ρ = 0.87–0.88). The exception was Cufflinks, whose correlations were substantially lower (ρ = 0.21–0.29 and 0.34–0.53). For both datasets we identified very low numbers of differentially expressed genes using the microarray platform. For RNA-Seq we checked the agreement of differentially expressed genes identified by the different pipelines and of GO-term enrichment results.
CONCLUSION: The combination of the STAR aligner with HTSeq-Count, followed by the STAR aligner with RSEM and then Sailfish, generated the differentially expressed genes best suited to the dataset at hand and in agreement with most of the other transcriptomics pipelines.
format Online
Article
Text
id pubmed-5955523
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5955523 2018-05-25 PLoS One Research Article Public Library of Science 2018-05-16 Text en © 2018 Wolff et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title A comparative study of RNA-Seq and microarray data analysis on the two examples of rectal-cancer patients and Burkitt Lymphoma cells
topic Research Article