
Gene length corrected trimmed mean of M-values (GeTMM) processing of RNA-seq data performs similarly in intersample analyses while improving intrasample comparisons

BACKGROUND: Current normalization methods for RNA-sequencing data allow either for intersample comparison to identify differentially expressed (DE) genes or for intrasample comparison for the discovery and validation of gene signatures. Most studies on optimization of normalization methods typically...


Bibliographic Details
Main Authors: Smid, Marcel, Coebergh van den Braak, Robert R. J., van de Werken, Harmen J. G., van Riet, Job, van Galen, Anne, de Weerd, Vanja, van der Vlugt-Daane, Michelle, Bril, Sandra I., Lalmahomed, Zarina S., Kloosterman, Wigard P., Wilting, Saskia M., Foekens, John A., IJzermans, Jan N. M., Martens, John W. M., Sieuwerts, Anieta M.
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6013957/
https://www.ncbi.nlm.nih.gov/pubmed/29929481
http://dx.doi.org/10.1186/s12859-018-2246-7
author Smid, Marcel
Coebergh van den Braak, Robert R. J.
van de Werken, Harmen J. G.
van Riet, Job
van Galen, Anne
de Weerd, Vanja
van der Vlugt-Daane, Michelle
Bril, Sandra I.
Lalmahomed, Zarina S.
Kloosterman, Wigard P.
Wilting, Saskia M.
Foekens, John A.
IJzermans, Jan N. M.
Martens, John W. M.
Sieuwerts, Anieta M.
author_sort Smid, Marcel
collection PubMed
description BACKGROUND: Current normalization methods for RNA-sequencing data allow either for intersample comparison to identify differentially expressed (DE) genes or for intrasample comparison for the discovery and validation of gene signatures. Most studies on optimization of normalization methods typically use simulated data to validate methodologies. We describe a new method, GeTMM, which allows for both inter- and intrasample analyses with the same normalized data set. We used actual (i.e. not simulated) RNA-seq data from 263 colon cancers (no biological replicates) and used the same read count data to compare GeTMM with the most commonly used normalization methods (i.e. TMM (used by edgeR), RLE (used by DESeq2) and TPM) with respect to distributions, effect of RNA quality, subtype-classification, recurrence score, recall of DE genes and correlation to RT-qPCR data. RESULTS: We observed a clear benefit for GeTMM and TPM with regard to intrasample comparison while GeTMM performed similar to TMM and RLE normalized data in intersample comparisons. Regarding DE genes, recall was found comparable among the normalization methods, while GeTMM showed the lowest number of false-positive DE genes. Remarkably, we observed limited detrimental effects in samples with low RNA quality. CONCLUSIONS: We show that GeTMM outperforms established methods with regard to intrasample comparison while performing equivalent with regard to intersample normalization using the same normalized data. These combined properties enhance the general usefulness of RNA-seq but also the comparability to the many array-based gene expression data in the public domain. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2246-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6013957
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-60139572018-07-05 Gene length corrected trimmed mean of M-values (GeTMM) processing of RNA-seq data performs similarly in intersample analyses while improving intrasample comparisons Smid, Marcel Coebergh van den Braak, Robert R. J. van de Werken, Harmen J. G. van Riet, Job van Galen, Anne de Weerd, Vanja van der Vlugt-Daane, Michelle Bril, Sandra I. Lalmahomed, Zarina S. Kloosterman, Wigard P. Wilting, Saskia M. Foekens, John A. IJzermans, Jan N. M. Martens, John W. M. Sieuwerts, Anieta M. BMC Bioinformatics Methodology Article BACKGROUND: Current normalization methods for RNA-sequencing data allow either for intersample comparison to identify differentially expressed (DE) genes or for intrasample comparison for the discovery and validation of gene signatures. Most studies on optimization of normalization methods typically use simulated data to validate methodologies. We describe a new method, GeTMM, which allows for both inter- and intrasample analyses with the same normalized data set. We used actual (i.e. not simulated) RNA-seq data from 263 colon cancers (no biological replicates) and used the same read count data to compare GeTMM with the most commonly used normalization methods (i.e. TMM (used by edgeR), RLE (used by DESeq2) and TPM) with respect to distributions, effect of RNA quality, subtype-classification, recurrence score, recall of DE genes and correlation to RT-qPCR data. RESULTS: We observed a clear benefit for GeTMM and TPM with regard to intrasample comparison while GeTMM performed similar to TMM and RLE normalized data in intersample comparisons. Regarding DE genes, recall was found comparable among the normalization methods, while GeTMM showed the lowest number of false-positive DE genes. Remarkably, we observed limited detrimental effects in samples with low RNA quality. 
CONCLUSIONS: We show that GeTMM outperforms established methods with regard to intrasample comparison while performing equivalent with regard to intersample normalization using the same normalized data. These combined properties enhance the general usefulness of RNA-seq but also the comparability to the many array-based gene expression data in the public domain. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2246-7) contains supplementary material, which is available to authorized users. BioMed Central 2018-06-22 /pmc/articles/PMC6013957/ /pubmed/29929481 http://dx.doi.org/10.1186/s12859-018-2246-7 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Gene length corrected trimmed mean of M-values (GeTMM) processing of RNA-seq data performs similarly in intersample analyses while improving intrasample comparisons
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6013957/
https://www.ncbi.nlm.nih.gov/pubmed/29929481
http://dx.doi.org/10.1186/s12859-018-2246-7