Uniformly shaped harmonization combines human transcriptomic data from different platforms while retaining their biological properties and differential gene expression patterns

Introduction: Co-normalization of RNA profiles obtained using different experimental platforms and protocols opens an avenue for comprehensive comparison of relevant features such as differentially expressed genes associated with disease. Currently, most bioinformatic tools enable normalization in a fl...

Bibliographic Details
Main Authors: Borisov, Nicolas, Tkachev, Victor, Simonov, Alexander, Sorokin, Maxim, Kim, Ella, Kuzmin, Denis, Karademir-Yilmaz, Betul, Buzdin, Anton
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10511763/
https://www.ncbi.nlm.nih.gov/pubmed/37745690
http://dx.doi.org/10.3389/fmolb.2023.1237129
_version_ 1785108213798010880
author Borisov, Nicolas
Tkachev, Victor
Simonov, Alexander
Sorokin, Maxim
Kim, Ella
Kuzmin, Denis
Karademir-Yilmaz, Betul
Buzdin, Anton
author_sort Borisov, Nicolas
collection PubMed
description Introduction: Co-normalization of RNA profiles obtained using different experimental platforms and protocols opens an avenue for comprehensive comparison of relevant features such as differentially expressed genes associated with disease. Currently, most bioinformatic tools perform normalization in a flexible format that depends on the individual datasets under analysis. Thus, the outputs of such normalizations are poorly compatible with each other. We recently proposed a new approach to gene expression data normalization, termed Shambhala, which returns harmonized data in a uniform shape: every expression profile is transformed into a pre-defined universal format. We previously showed that, following shambhalization of human RNA profiles, overall tissue-specific clustering features are strongly retained while platform-specific clustering is dramatically reduced. Methods: Here, we tested Shambhala's performance in retaining fold-change gene expression features and other functional characteristics of gene clusters, such as pathway activation levels and predicted cancer drug activity scores. Results: Using 6,793 cancer and 11,135 normal tissue gene expression profiles from the literature and experimental datasets, we applied twelve performance criteria to different versions of Shambhala and to other transcriptomic harmonization methods with a flexible output data format. These criteria covered biological type classifiers, hierarchical clustering, correlation/regression properties, stability of drug efficiency scores, and data quality for machine learning classifiers. Discussion: The Shambhala-2 harmonizer demonstrated the best results, with correlation and linear regression coefficients close to 1 for the comparison of training vs. validation datasets and more than twofold lower instability in the calculation of drug efficiency scores compared to other methods.
format Online
Article
Text
id pubmed-10511763
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10511763 2023-09-22 Front Mol Biosci Molecular Biosciences Frontiers Media S.A. 2023-09-06 /pmc/articles/PMC10511763/ /pubmed/37745690 http://dx.doi.org/10.3389/fmolb.2023.1237129 Text en Copyright © 2023 Borisov, Tkachev, Simonov, Sorokin, Kim, Kuzmin, Karademir-Yilmaz and Buzdin. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Uniformly shaped harmonization combines human transcriptomic data from different platforms while retaining their biological properties and differential gene expression patterns
topic Molecular Biosciences