CuBlock: a cross-platform normalization method for gene-expression microarrays

MOTIVATION: Cross-(multi)platform normalization of gene-expression microarray data remains an unresolved issue. Despite the existence of several algorithms, they are either constrained by the need to normalize all samples of all platforms together, compromising scalability and reuse, by adherence to...

Bibliographic Details
Main Authors: Junet, Valentin; Farrés, Judith; Mas, José M; Daura, Xavier
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8388031/
https://www.ncbi.nlm.nih.gov/pubmed/33609102
http://dx.doi.org/10.1093/bioinformatics/btab105
author Junet, Valentin
Farrés, Judith
Mas, José M
Daura, Xavier
collection PubMed
description MOTIVATION: Cross-(multi)platform normalization of gene-expression microarray data remains an unresolved issue. Despite the existence of several algorithms, they are either constrained by the need to normalize all samples of all platforms together, compromising scalability and reuse, by adherence to the platforms of a specific provider, or simply by poor performance. In addition, many of the methods presented in the literature have not been specifically tested against multi-platform data and/or other methods applicable in this context. Thus, we set out to develop a normalization algorithm appropriate for gene-expression studies based on multiple, potentially large microarray sets collected along multiple platforms and at different times, applicable in systematic studies aimed at extracting knowledge from the wealth of microarray data available in public repositories; for example, for the extraction of Real-World Data to complement data from Randomized Controlled Trials. Our main focus or criterion for performance was on the capacity of the algorithm to properly separate samples from different biological groups. RESULTS: We present CuBlock, an algorithm addressing this objective, together with a strategy to validate cross-platform normalization methods. To validate the algorithm and benchmark it against existing methods, we used two distinct datasets, one specifically generated for testing and standardization purposes and one from an actual experimental study. Using these datasets, we benchmarked CuBlock against ComBat (Johnson et al., 2007), UPC (Piccolo et al., 2013), YuGene (Lê Cao et al., 2014), DBNorm (Meng et al., 2017), Shambhala (Borisov et al., 2019) and a simple log2 transform as reference. We note that many other popular normalization methods are not applicable in this context. CuBlock was the only algorithm in this group that could always and clearly differentiate the underlying biological groups after mixing the data, from up to six different platforms in this study. AVAILABILITY AND IMPLEMENTATION: CuBlock can be downloaded from https://www.mathworks.com/matlabcentral/fileexchange/77882-cublock. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8388031
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8388031 2021-08-26 CuBlock: a cross-platform normalization method for gene-expression microarrays Junet, Valentin Farrés, Judith Mas, José M Daura, Xavier Bioinformatics Original Papers Oxford University Press 2021-02-20 /pmc/articles/PMC8388031/ /pubmed/33609102 http://dx.doi.org/10.1093/bioinformatics/btab105 Text en © The Author(s) 2021. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title CuBlock: a cross-platform normalization method for gene-expression microarrays
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8388031/
https://www.ncbi.nlm.nih.gov/pubmed/33609102
http://dx.doi.org/10.1093/bioinformatics/btab105