On the impact of batch effect correction in TCGA isomiR expression data

MicroRNAs (miRNAs) are small non-coding RNAs with diverse functions in post-transcriptional regulation of gene expression. Sequence and length variants of miRNAs are called isomiRs and can exert different functions compared to their canonical counterparts. The Cancer Genome Atlas (TCGA) provides iso...

Bibliographic Details
Main authors: Ibing, Susanne, Michels, Birgitta E, Mosdzien, Moritz, Meyer, Helen R, Feuerbach, Lars, Körner, Cindy
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Cancer Computational Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8210273/
https://www.ncbi.nlm.nih.gov/pubmed/34316700
http://dx.doi.org/10.1093/narcan/zcab007
author Ibing, Susanne
Michels, Birgitta E
Mosdzien, Moritz
Meyer, Helen R
Feuerbach, Lars
Körner, Cindy
author_sort Ibing, Susanne
collection PubMed
description MicroRNAs (miRNAs) are small non-coding RNAs with diverse functions in post-transcriptional regulation of gene expression. Sequence and length variants of miRNAs are called isomiRs and can exert different functions compared to their canonical counterparts. The Cancer Genome Atlas (TCGA) provides isomiR-level expression data for patients of various cancer entities collected in a multi-center approach over several years. However, the impact of batch effects within individual cohorts has not been systematically investigated and corrected for before. Therefore, the aim of this study was to identify relevant cohort-specific batch variables and generate batch-corrected isomiR expression data for 16 TCGA cohorts. The main batch variables included sequencing platform, plate, sample purity and sequencing depth. Platform bias was related to certain length and sequence features of individual recurrently affected isomiRs. Furthermore, significant downregulation of reported tumor suppressive isomiRs in lung tumor tissue compared to normal samples was only observed after batch correction, highlighting the importance of working with corrected data. Batch-corrected datasets for all cohorts including quality control are provided as supplement. In summary, this study reveals that batch effects present in the TCGA dataset might mask biologically relevant effects and provides a valuable resource for research on isomiRs in cancer (accessible through GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE164767).
format Online
Article
Text
id pubmed-8210273
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-82102732021-07-26 On the impact of batch effect correction in TCGA isomiR expression data Ibing, Susanne Michels, Birgitta E Mosdzien, Moritz Meyer, Helen R Feuerbach, Lars Körner, Cindy NAR Cancer Cancer Computational Biology MicroRNAs (miRNAs) are small non-coding RNAs with diverse functions in post-transcriptional regulation of gene expression. Sequence and length variants of miRNAs are called isomiRs and can exert different functions compared to their canonical counterparts. The Cancer Genome Atlas (TCGA) provides isomiR-level expression data for patients of various cancer entities collected in a multi-center approach over several years. However, the impact of batch effects within individual cohorts has not been systematically investigated and corrected for before. Therefore, the aim of this study was to identify relevant cohort-specific batch variables and generate batch-corrected isomiR expression data for 16 TCGA cohorts. The main batch variables included sequencing platform, plate, sample purity and sequencing depth. Platform bias was related to certain length and sequence features of individual recurrently affected isomiRs. Furthermore, significant downregulation of reported tumor suppressive isomiRs in lung tumor tissue compared to normal samples was only observed after batch correction, highlighting the importance of working with corrected data. Batch-corrected datasets for all cohorts including quality control are provided as supplement. In summary, this study reveals that batch effects present in the TCGA dataset might mask biologically relevant effects and provides a valuable resource for research on isomiRs in cancer (accessible through GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE164767). Oxford University Press 2021-03-11 /pmc/articles/PMC8210273/ /pubmed/34316700 http://dx.doi.org/10.1093/narcan/zcab007 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Cancer. 
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title On the impact of batch effect correction in TCGA isomiR expression data
topic Cancer Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8210273/
https://www.ncbi.nlm.nih.gov/pubmed/34316700
http://dx.doi.org/10.1093/narcan/zcab007