On data normalization and batch-effect correction for tumor subtyping with microRNA data

The discovery of new tumor subtypes has been aided by transcriptomics profiling. However, some new subtypes can be irreproducible due to data artifacts that arise from disparate experimental handling. To deal with these artifacts, methods for data normalization and batch-effect correction have been...

Bibliographic Details
Main authors: Wu, Yilin; Yuen, Becky Wing-Yan; Wei, Yingying; Qin, Li-Xuan
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9830544/
https://www.ncbi.nlm.nih.gov/pubmed/36632610
http://dx.doi.org/10.1093/nargab/lqac100
author Wu, Yilin
Yuen, Becky Wing-Yan
Wei, Yingying
Qin, Li-Xuan
collection PubMed
description The discovery of new tumor subtypes has been aided by transcriptomic profiling. However, some new subtypes can be irreproducible because of data artifacts that arise from disparate experimental handling. To deal with these artifacts, data normalization and batch-effect correction methods have been applied before sample clustering for disease subtyping, even though these methods were developed primarily for group comparison. Whether they are effective for sample clustering remains to be elucidated. We examined this question with a resampling-based simulation study that leverages a pair of microRNA microarray data sets. Our study showed that (i) normalization generally benefited the discovery of sample clusters, with quantile normalization tending to perform best; (ii) batch-effect correction was harmful when data artifacts were confounded with biological signals; and (iii) their performance can be influenced by the choice of clustering method, with the Partitioning Around Medoids method based on Pearson correlation consistently among the best performers. Our study provides important insights into the use of data normalization and batch-effect correction in connection with the design of array-to-sample assignment and the choice of clustering method, to facilitate accurate and reproducible discovery of tumor subtypes with microRNAs.
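The abstract names two concrete steps: quantile normalization of the arrays, then medoid-based clustering (PAM) of samples using Pearson correlation. A minimal pure-Python sketch of that pipeline follows. It is not the authors' code or any library's implementation. Assumptions: each sample is an equal-length list of microRNA expression values; ties in quantile normalization are broken by sort order; the distance is 1 minus Pearson correlation; and the PAM shown is a simplified greedy BUILD plus first-improvement SWAP. The function names and toy data are hypothetical, and batch-effect correction is not shown.

```python
import math
from statistics import mean


def quantile_normalize(samples):
    """Give every sample the same value distribution: the mean of all sorted
    samples at each rank. Ties are broken by sort order (a simplification)."""
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [mean(s[r] for s in sorted_samples) for r in range(n)]
    out = []
    for s in samples:
        order = sorted(range(n), key=lambda j: s[j])  # feature indices by value
        norm = [0.0] * n
        for rank, j in enumerate(order):
            norm[j] = rank_means[rank]  # replace value by its rank's mean
        out.append(norm)
    return out


def pearson_distance(x, y):
    """1 - Pearson correlation; assumes neither vector is constant."""
    mx, my = mean(x), mean(y)
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return 1.0 - num / den


def pam(dist, k):
    """Simplified Partitioning Around Medoids on a precomputed distance matrix."""
    n = len(dist)

    def cost(medoids):
        # Total distance from each point to its nearest medoid.
        return sum(min(dist[i][m] for m in medoids) for i in range(n))

    # BUILD: greedily add the point that most lowers the total cost.
    medoids = []
    for _ in range(k):
        best = min((c for c in range(n) if c not in medoids),
                   key=lambda c: cost(medoids + [c]))
        medoids.append(best)

    # SWAP: replace a medoid by a non-medoid whenever that lowers the cost.
    current = cost(medoids)
    improved = True
    while improved:
        improved = False
        for mi in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                cand = medoids[:mi] + [c] + medoids[mi + 1:]
                cand_cost = cost(cand)
                if cand_cost < current - 1e-12:  # ignore float noise
                    medoids, current, improved = cand, cand_cost, True

    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Hypothetical toy data: 6 samples x 5 microRNAs with two opposite patterns.
samples = [
    [1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [1.1, 2.0, 3.2, 3.9, 5.1],
    [5, 4, 3, 2, 1], [10, 8, 6, 4, 2], [5.2, 3.9, 3.1, 2.0, 0.9],
]
norm = quantile_normalize(samples)
dist = [[pearson_distance(a, b) for b in norm] for a in norm]
medoids, labels = pam(dist, k=2)
print(labels)  # first three samples share one label, last three the other
```

Pearson distance compares the shape of expression profiles rather than their absolute level, so it is insensitive to a per-sample scale shift. The toy data only illustrate the mechanics and make no claim about the study's results.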
format Online
Article
Text
id pubmed-9830544
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9830544 2023-01-10 On data normalization and batch-effect correction for tumor subtyping with microRNA data. Wu, Yilin; Yuen, Becky Wing-Yan; Wei, Yingying; Qin, Li-Xuan. NAR Genom Bioinform. Standard Article. Oxford University Press, 2023-01-10. /pmc/articles/PMC9830544/ /pubmed/36632610 http://dx.doi.org/10.1093/nargab/lqac100 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title On data normalization and batch-effect correction for tumor subtyping with microRNA data
topic Standard Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9830544/
https://www.ncbi.nlm.nih.gov/pubmed/36632610
http://dx.doi.org/10.1093/nargab/lqac100