On data normalization and batch-effect correction for tumor subtyping with microRNA data
The discovery of new tumor subtypes has been aided by transcriptomics profiling. However, some new subtypes can be irreproducible due to data artifacts that arise from disparate experimental handling. To deal with these artifacts, methods for data normalization and batch-effect correction have been...
Main authors: | Wu, Yilin; Yuen, Becky Wing-Yan; Wei, Yingying; Qin, Li-Xuan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2023 |
Subjects: | Standard Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9830544/ https://www.ncbi.nlm.nih.gov/pubmed/36632610 http://dx.doi.org/10.1093/nargab/lqac100 |
_version_ | 1784867694041890816 |
---|---|
author | Wu, Yilin; Yuen, Becky Wing-Yan; Wei, Yingying; Qin, Li-Xuan |
author_facet | Wu, Yilin; Yuen, Becky Wing-Yan; Wei, Yingying; Qin, Li-Xuan |
author_sort | Wu, Yilin |
collection | PubMed |
description | The discovery of new tumor subtypes has been aided by transcriptomics profiling. However, some new subtypes can be irreproducible due to data artifacts that arise from disparate experimental handling. To deal with these artifacts, methods for data normalization and batch-effect correction have been applied before sample clustering for disease subtyping, even though these methods were primarily developed for group comparison. It remains to be elucidated whether they are effective for sample clustering. We examined this issue with a re-sampling-based simulation study that leverages a pair of microRNA microarray data sets. Our study showed that (i) normalization generally benefited the discovery of sample clusters, with quantile normalization tending to be the best performer, (ii) batch-effect correction was harmful when data artifacts were confounded with biological signals, and (iii) the performance of both can be influenced by the choice of clustering method, with the Partitioning Around Medoids (PAM) method based on Pearson correlation being consistently among the best performers. Our study provides important insights into the use of data normalization and batch-effect correction in connection with the design of array-to-sample assignment and the choice of clustering method for facilitating accurate and reproducible discovery of tumor subtypes with microRNAs. |
format | Online Article Text |
id | pubmed-9830544 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9830544 2023-01-10 On data normalization and batch-effect correction for tumor subtyping with microRNA data Wu, Yilin; Yuen, Becky Wing-Yan; Wei, Yingying; Qin, Li-Xuan NAR Genom Bioinform Standard Article The discovery of new tumor subtypes has been aided by transcriptomics profiling. However, some new subtypes can be irreproducible due to data artifacts that arise from disparate experimental handling. To deal with these artifacts, methods for data normalization and batch-effect correction have been applied before sample clustering for disease subtyping, even though these methods were primarily developed for group comparison. It remains to be elucidated whether they are effective for sample clustering. We examined this issue with a re-sampling-based simulation study that leverages a pair of microRNA microarray data sets. Our study showed that (i) normalization generally benefited the discovery of sample clusters, with quantile normalization tending to be the best performer, (ii) batch-effect correction was harmful when data artifacts were confounded with biological signals, and (iii) the performance of both can be influenced by the choice of clustering method, with the Partitioning Around Medoids (PAM) method based on Pearson correlation being consistently among the best performers. Our study provides important insights into the use of data normalization and batch-effect correction in connection with the design of array-to-sample assignment and the choice of clustering method for facilitating accurate and reproducible discovery of tumor subtypes with microRNAs. Oxford University Press 2023-01-10 /pmc/articles/PMC9830544/ /pubmed/36632610 http://dx.doi.org/10.1093/nargab/lqac100 Text en © The Author(s) 2023. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Standard Article Wu, Yilin Yuen, Becky Wing-Yan Wei, Yingying Qin, Li-Xuan On data normalization and batch-effect correction for tumor subtyping with microRNA data |
title | On data normalization and batch-effect correction for tumor subtyping with microRNA data |
title_full | On data normalization and batch-effect correction for tumor subtyping with microRNA data |
title_fullStr | On data normalization and batch-effect correction for tumor subtyping with microRNA data |
title_full_unstemmed | On data normalization and batch-effect correction for tumor subtyping with microRNA data |
title_short | On data normalization and batch-effect correction for tumor subtyping with microRNA data |
title_sort | on data normalization and batch-effect correction for tumor subtyping with microrna data |
topic | Standard Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9830544/ https://www.ncbi.nlm.nih.gov/pubmed/36632610 http://dx.doi.org/10.1093/nargab/lqac100 |
work_keys_str_mv | AT wuyilin ondatanormalizationandbatcheffectcorrectionfortumorsubtypingwithmicrornadata AT yuenbeckywingyan ondatanormalizationandbatcheffectcorrectionfortumorsubtypingwithmicrornadata AT weiyingying ondatanormalizationandbatcheffectcorrectionfortumorsubtypingwithmicrornadata AT qinlixuan ondatanormalizationandbatcheffectcorrectionfortumorsubtypingwithmicrornadata |
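
Illustrative note: the description in this record names quantile normalization and clustering based on Pearson correlation without spelling either out. The sketch below is only a generic illustration of those two ideas, not the authors' code or pipeline. The function names (`quantile_normalize`, `pearson_dissimilarity`) and the toy matrix are hypothetical, and the layout (rows are features such as microRNAs, columns are samples) is an assumption. Ties are broken by position, which is one common simplification.

```python
import numpy as np


def quantile_normalize(x):
    """Give every column (sample) the same empirical distribution.

    x: 2-D array, rows = features, columns = samples (assumed layout).
    """
    x = np.asarray(x, dtype=float)
    # Rank of each value within its own column (0 = smallest).
    order = np.argsort(x, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")
    # Reference distribution: mean of the sorted columns at each rank.
    reference = np.sort(x, axis=0).mean(axis=1)
    # Replace each value by the reference value at its rank.
    return reference[ranks]


def pearson_dissimilarity(x):
    """Sample-by-sample dissimilarity, 1 - Pearson correlation.

    A matrix like this could be passed to a medoid-based clustering
    routine; the clustering step itself is not shown here.
    """
    return 1.0 - np.corrcoef(x, rowvar=False)


if __name__ == "__main__":
    # Hypothetical toy data: 4 features measured on 3 samples.
    toy = np.array([[5.0, 4.0, 3.0],
                    [2.0, 1.0, 4.0],
                    [3.0, 4.0, 6.0],
                    [4.0, 2.0, 8.0]])
    normalized = quantile_normalize(toy)
    print(normalized)
    print(pearson_dissimilarity(normalized))
```

After normalization every column contains the same set of values, so the remaining differences between samples lie only in how features are ranked within each sample.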