
Increased comparability between RNA-Seq and microarray data by utilization of gene sets

The field of transcriptomics measures mRNA as a proxy for gene expression. Two major platforms are currently in use for quantifying mRNA: microarray and RNA-Seq. Many comparative studies have shown that their results are not always consistent. In this study we aim to find a robust meth...


Bibliographic Details
Main Authors: van der Kloet, Frans M., Buurmans, Jeroen, Jonker, Martijs J., Smilde, Age K., Westerhuis, Johan A.
Format: Online Article Text
Language: English
Published: Public Library of Science 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7549825/
https://www.ncbi.nlm.nih.gov/pubmed/32997685
http://dx.doi.org/10.1371/journal.pcbi.1008295
author van der Kloet, Frans M.
Buurmans, Jeroen
Jonker, Martijs J.
Smilde, Age K.
Westerhuis, Johan A.
collection PubMed
description The field of transcriptomics measures mRNA as a proxy for gene expression. Two major platforms are currently in use for quantifying mRNA: microarray and RNA-Seq. Many comparative studies have shown that their results are not always consistent. In this study we aim to find a robust method that increases the comparability of the two platforms, enabling analysis of merged data from both. We transformed high-dimensional transcriptomics data from the two platforms into a lower-dimensional, biologically relevant dataset by calculating enrichment scores based on gene set collections for all samples. We compared the similarity between data from both platforms based on the raw data and on the enrichment scores. We show that the transformation converts the data in a biologically relevant way and filters out noise, which leads to increased platform concordance. We validate the procedure by using predictive models built with microarray-based enrichment scores to predict breast cancer subtypes from enrichment scores based on sequencing data. Although microarray and RNA-Seq expression levels might appear different, transforming them into biologically relevant gene set enrichment scores significantly increases their correlation, which is a step forward in data integration of the two platforms. The gene set collections were shown to contain biologically relevant gene sets. More in-depth investigation of the effect of the composition, size, and number of gene sets used for the transformation is suggested for future research.
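The transformation described in the abstract can be illustrated with a minimal sketch. The abstract does not name the enrichment method, so the code below swaps in a deliberately simple stand-in: each gene is standardized across samples within its own platform, and a gene set is scored per sample as the mean z-score of its member genes. The function names, gene names, gene set names, and all expression values are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
from statistics import mean, pstdev


def zscore_genes(expr):
    """Standardize each gene across samples. expr maps gene -> list of values."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # A gene with no variation carries no signal; score it as 0.
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def gene_set_scores(expr, gene_sets):
    """Score each gene set per sample as the mean z-score of its member genes.

    This is a simple stand-in for an enrichment score, not a published method.
    Gene sets with no measured members are skipped.
    """
    z = zscore_genes(expr)
    n_samples = len(next(iter(expr.values())))
    scores = {}
    for name, genes in gene_sets.items():
        members = [g for g in genes if g in z]
        if not members:
            continue
        scores[name] = [mean(z[g][s] for g in members) for s in range(n_samples)]
    return scores


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


# Hypothetical toy data: the same three samples measured on both platforms.
# Values are made up; the two platforms are on different scales on purpose.
microarray = {
    "gene_a": [8.1, 9.4, 6.2],
    "gene_b": [7.9, 9.0, 6.5],
    "gene_c": [6.0, 6.3, 9.8],
    "gene_d": [5.8, 6.1, 9.1],
}
rnaseq = {
    "gene_a": [3.2, 5.1, 0.9],
    "gene_b": [2.8, 4.0, 1.7],
    "gene_c": [1.1, 0.8, 6.0],
    "gene_d": [0.9, 1.6, 5.2],
}
gene_sets = {"set_1": ["gene_a", "gene_b"], "set_2": ["gene_c", "gene_d"]}

ma_scores = gene_set_scores(microarray, gene_sets)
rs_scores = gene_set_scores(rnaseq, gene_sets)

# Cross-platform concordance per gene set, computed across samples.
for name in gene_sets:
    print(name, round(pearson(ma_scores[name], rs_scores[name]), 3))
```

Because scores are computed within each platform before comparison, platform-specific scale differences drop out, which is the intuition behind comparing enrichment scores instead of raw expression values. A classifier trained on `ma_scores` could then be applied to `rs_scores`, mirroring the validation step the abstract describes.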
format Online
Article
Text
id pubmed-7549825
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-7549825 2020-10-20 Increased comparability between RNA-Seq and microarray data by utilization of gene sets van der Kloet, Frans M. Buurmans, Jeroen Jonker, Martijs J. Smilde, Age K. Westerhuis, Johan A. PLoS Comput Biol Research Article
Public Library of Science 2020-09-30 /pmc/articles/PMC7549825/ /pubmed/32997685 http://dx.doi.org/10.1371/journal.pcbi.1008295 Text en © 2020 van der Kloet et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Increased comparability between RNA-Seq and microarray data by utilization of gene sets
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7549825/
https://www.ncbi.nlm.nih.gov/pubmed/32997685
http://dx.doi.org/10.1371/journal.pcbi.1008295