
robustica: customizable robust independent component analysis

BACKGROUND: Independent Component Analysis (ICA) allows the dissection of omic datasets into modules that help to interpret global molecular signatures. The inherent randomness of this algorithm can be overcome by clustering many iterations of ICA together to obtain robust components. Existing algor...


Bibliographic Details
Main Authors: Anglada-Girotto, Miquel; Miravet-Verde, Samuel; Serrano, Luis; Head, Sarah A.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9721028/
https://www.ncbi.nlm.nih.gov/pubmed/36471244
http://dx.doi.org/10.1186/s12859-022-05043-9
author Anglada-Girotto, Miquel
Miravet-Verde, Samuel
Serrano, Luis
Head, Sarah A.
collection PubMed
description BACKGROUND: Independent Component Analysis (ICA) allows the dissection of omic datasets into modules that help to interpret global molecular signatures. The inherent randomness of this algorithm can be overcome by clustering many iterations of ICA together to obtain robust components. Existing algorithms for robust ICA are dependent on the choice of clustering method and on computing a potentially biased and large Pearson distance matrix.
RESULTS: We present robustica, a Python-based package to compute robust independent components with a fully customizable clustering algorithm and distance metric. Here, we exploited its customizability to revisit and optimize robust ICA systematically. Of the 6 popular clustering algorithms considered, DBSCAN performed the best at clustering independent components across ICA iterations. To enable using Euclidean distances, we created a subroutine that infers and corrects the components’ signs across ICA iterations. Our subroutine increased the resolution, robustness, and computational efficiency of the algorithm. Finally, we show the applicability of robustica by dissecting over 500 tumor samples from low-grade glioma (LGG) patients, where we define two new gene expression modules with key modulators of tumor progression upon IDH1 and TP53 mutagenesis.
CONCLUSION: robustica brings precise, efficient, and customizable robust ICA into the Python toolbox. Through its customizability, we explored how different clustering algorithms and distance metrics can further optimize robust ICA. Then, we showcased how robustica can be used to discover gene modules associated with combinations of features of biological interest. Taken together, given the broad applicability of ICA for omic data analysis, we envision robustica will facilitate the seamless computation and integration of robust independent components in large pipelines.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-05043-9.
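The sign ambiguity that the description refers to can be illustrated with a minimal sketch. ICA recovers each component only up to its sign, so two runs may return the same component mirrored, which makes their Euclidean distance large. The example below assumes one simple heuristic (orient every component so that its largest-magnitude weight is positive); the record does not spell out the authors' subroutine, so this is not presented as their method. The function names and the toy vectors are hypothetical.

```python
import math


def correct_sign(component):
    """Flip a component so that its largest-magnitude weight is positive.

    Assumed heuristic for illustration only, not the robustica API.
    """
    extreme = max(component, key=abs)
    sign = -1.0 if extreme < 0 else 1.0
    return [sign * weight for weight in component]


def euclidean(a, b):
    """Plain Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Two hypothetical ICA runs that recovered the same component with opposite signs.
run_a = [0.1, -0.2, 3.0, 0.05]
run_b = [-0.1, 0.2, -3.0, -0.05]

# Before correction the mirrored components look far apart.
raw_distance = euclidean(run_a, run_b)

# After correction both point the same way, so they coincide.
corrected_distance = euclidean(correct_sign(run_a), correct_sign(run_b))

print(raw_distance, corrected_distance)
```

Once signs agree across runs, matching components sit close together under a Euclidean metric, so a clustering algorithm such as DBSCAN can group them without a sign-invariant distance matrix.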
format Online
Article
Text
id pubmed-9721028
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9721028
2022-12-06
robustica: customizable robust independent component analysis
Anglada-Girotto, Miquel
Miravet-Verde, Samuel
Serrano, Luis
Head, Sarah A.
BMC Bioinformatics
Software
BioMed Central 2022-12-05
/pmc/articles/PMC9721028/
/pubmed/36471244
http://dx.doi.org/10.1186/s12859-022-05043-9
Text en
© The Author(s) 2022
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title robustica: customizable robust independent component analysis
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9721028/
https://www.ncbi.nlm.nih.gov/pubmed/36471244
http://dx.doi.org/10.1186/s12859-022-05043-9