On the impact of batch effect correction in TCGA isomiR expression data
MicroRNAs (miRNAs) are small non-coding RNAs with diverse functions in post-transcriptional regulation of gene expression. Sequence and length variants of miRNAs are called isomiRs and can exert different functions compared to their canonical counterparts. The Cancer Genome Atlas (TCGA) provides iso...
Main authors: | Ibing, Susanne; Michels, Birgitta E; Mosdzien, Moritz; Meyer, Helen R; Feuerbach, Lars; Körner, Cindy |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | Cancer Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8210273/ https://www.ncbi.nlm.nih.gov/pubmed/34316700 http://dx.doi.org/10.1093/narcan/zcab007 |
_version_ | 1783709276771975168 |
---|---|
author | Ibing, Susanne; Michels, Birgitta E; Mosdzien, Moritz; Meyer, Helen R; Feuerbach, Lars; Körner, Cindy |
author_facet | Ibing, Susanne; Michels, Birgitta E; Mosdzien, Moritz; Meyer, Helen R; Feuerbach, Lars; Körner, Cindy |
author_sort | Ibing, Susanne |
collection | PubMed |
description | MicroRNAs (miRNAs) are small non-coding RNAs with diverse functions in post-transcriptional regulation of gene expression. Sequence and length variants of miRNAs are called isomiRs and can exert different functions compared to their canonical counterparts. The Cancer Genome Atlas (TCGA) provides isomiR-level expression data for patients of various cancer entities collected in a multi-center approach over several years. However, the impact of batch effects within individual cohorts has not been systematically investigated and corrected for before. Therefore, the aim of this study was to identify relevant cohort-specific batch variables and generate batch-corrected isomiR expression data for 16 TCGA cohorts. The main batch variables included sequencing platform, plate, sample purity and sequencing depth. Platform bias was related to certain length and sequence features of individual recurrently affected isomiRs. Furthermore, significant downregulation of reported tumor suppressive isomiRs in lung tumor tissue compared to normal samples was only observed after batch correction, highlighting the importance of working with corrected data. Batch-corrected datasets for all cohorts including quality control are provided as supplement. In summary, this study reveals that batch effects present in the TCGA dataset might mask biologically relevant effects and provides a valuable resource for research on isomiRs in cancer (accessible through GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE164767). |
format | Online Article Text |
id | pubmed-8210273 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8210273 2021-07-26 On the impact of batch effect correction in TCGA isomiR expression data Ibing, Susanne Michels, Birgitta E Mosdzien, Moritz Meyer, Helen R Feuerbach, Lars Körner, Cindy NAR Cancer Cancer Computational Biology MicroRNAs (miRNAs) are small non-coding RNAs with diverse functions in post-transcriptional regulation of gene expression. Sequence and length variants of miRNAs are called isomiRs and can exert different functions compared to their canonical counterparts. The Cancer Genome Atlas (TCGA) provides isomiR-level expression data for patients of various cancer entities collected in a multi-center approach over several years. However, the impact of batch effects within individual cohorts has not been systematically investigated and corrected for before. Therefore, the aim of this study was to identify relevant cohort-specific batch variables and generate batch-corrected isomiR expression data for 16 TCGA cohorts. The main batch variables included sequencing platform, plate, sample purity and sequencing depth. Platform bias was related to certain length and sequence features of individual recurrently affected isomiRs. Furthermore, significant downregulation of reported tumor suppressive isomiRs in lung tumor tissue compared to normal samples was only observed after batch correction, highlighting the importance of working with corrected data. Batch-corrected datasets for all cohorts including quality control are provided as supplement. In summary, this study reveals that batch effects present in the TCGA dataset might mask biologically relevant effects and provides a valuable resource for research on isomiRs in cancer (accessible through GEO: https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE164767). Oxford University Press 2021-03-11 /pmc/articles/PMC8210273/ /pubmed/34316700 http://dx.doi.org/10.1093/narcan/zcab007 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Cancer. 
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Cancer Computational Biology Ibing, Susanne Michels, Birgitta E Mosdzien, Moritz Meyer, Helen R Feuerbach, Lars Körner, Cindy On the impact of batch effect correction in TCGA isomiR expression data |
title | On the impact of batch effect correction in TCGA isomiR expression data |
topic | Cancer Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8210273/ https://www.ncbi.nlm.nih.gov/pubmed/34316700 http://dx.doi.org/10.1093/narcan/zcab007 |