Fast and interpretable genomic data analysis using multiple approximate kernel learning

MOTIVATION: Dataset sizes in computational biology have increased drastically thanks to improved data collection tools and growing patient cohorts. Earlier kernel-based machine learning algorithms proposed for increased interpretability started to fail at large sample sizes...

Bibliographic Details
Main authors: Bektaş, Ayyüce Begüm; Ak, Çiğdem; Gönen, Mehmet
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9235505/
https://www.ncbi.nlm.nih.gov/pubmed/35758810
http://dx.doi.org/10.1093/bioinformatics/btac241
author Bektaş, Ayyüce Begüm
Ak, Çiğdem
Gönen, Mehmet
collection PubMed
description MOTIVATION: Dataset sizes in computational biology have increased drastically thanks to improved data collection tools and growing patient cohorts. Earlier kernel-based machine learning algorithms proposed for increased interpretability started to fail at large sample sizes owing to their lack of scalability. To overcome this problem, we proposed a fast and efficient multiple kernel learning (MKL) algorithm, intended particularly for large-scale data, that integrates kernel approximation and group Lasso formulations into a conjoint model. Our method extracts significant and meaningful information from genomic data while conjointly learning a model for out-of-sample prediction. It scales with increasing sample size by approximating, rather than computing, the distinct kernel matrices. RESULTS: To test our computational framework, Multiple Approximate Kernel Learning (MAKL), we ran experiments on three cancer datasets and showed that MAKL can outperform the baseline algorithm while using only a small fraction of the input features. We also reported the selection frequencies of the approximated kernel matrices associated with feature subsets (i.e. gene sets/pathways), which helps to assess their relevance for the given classification task. Our fast and interpretable MKL algorithm, which produces sparse solutions, is promising for computational biology applications given its scalability and the highly correlated structure of genomic datasets, and it can be used to discover new biomarkers and new therapeutic guidelines. AVAILABILITY AND IMPLEMENTATION: MAKL is available at https://github.com/begumbektas/makl together with the scripts that replicate the reported experiments. MAKL is also available as an R package at https://cran.r-project.org/web/packages/MAKL. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9235505
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9235505 2022-06-29 Fast and interpretable genomic data analysis using multiple approximate kernel learning Bektaş, Ayyüce Begüm; Ak, Çiğdem; Gönen, Mehmet Bioinformatics ISCB/Ismb 2022 Oxford University Press 2022-06-27 /pmc/articles/PMC9235505/ /pubmed/35758810 http://dx.doi.org/10.1093/bioinformatics/btac241 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Fast and interpretable genomic data analysis using multiple approximate kernel learning
topic ISCB/Ismb 2022
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9235505/
https://www.ncbi.nlm.nih.gov/pubmed/35758810
http://dx.doi.org/10.1093/bioinformatics/btac241
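The description above combines two ingredients: approximating one kernel per feature subset (gene set/pathway) instead of computing full kernel matrices, and a group Lasso penalty that switches whole subsets on or off. The following is a minimal Python sketch of that general idea, not the MAKL implementation or the API of the MAKL R package. Everything specific in it is an assumption made for illustration: random Fourier features as the approximation, a Gaussian kernel with bandwidth `sigma`, a logistic loss, proximal gradient descent as the solver, and all function names and the synthetic data.

```python
import numpy as np


def random_fourier_features(X, n_components, sigma, rng):
    """Explicit features whose inner products approximate a Gaussian kernel.

    Assumed approximation: z(x) = sqrt(2/D) * cos(W^T x + b), W ~ N(0, 1/sigma^2).
    """
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, n_components))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_components)
    return np.sqrt(2.0 / n_components) * np.cos(X @ W + b)


def approximate_group_features(X, groups, n_components, sigma, rng):
    """Build one approximate-kernel feature block per group of input columns."""
    maps, blocks, start = [], [], 0
    for feature_idx in groups:
        maps.append(random_fourier_features(X[:, feature_idx], n_components, sigma, rng))
        blocks.append(np.arange(start, start + n_components))
        start += n_components
    return np.hstack(maps), blocks


def group_soft_threshold(w, threshold):
    """Proximal operator of threshold * ||w||_2: shrinks or zeroes a whole block."""
    norm = np.linalg.norm(w)
    if norm <= threshold:
        return np.zeros_like(w)
    return (1.0 - threshold / norm) * w


def fit_group_lasso_logistic(Z, y, blocks, lam, n_iter=500):
    """Proximal gradient descent on mean logistic loss + lam * sum_g ||w_g||_2.

    y holds labels in {-1, +1}; blocks lists the column indices of each group.
    """
    n, p = Z.shape
    # Step size 1/L, where L = ||Z||_2^2 / (4n) bounds the gradient's Lipschitz constant.
    step = 4.0 * n / np.linalg.norm(Z, 2) ** 2
    w = np.zeros(p)
    for _ in range(n_iter):
        margins = y * (Z @ w)
        # Numerically stable 1 / (1 + exp(margins)).
        neg_sigmoid = 0.5 * (1.0 - np.tanh(0.5 * margins))
        grad = -(Z.T @ (y * neg_sigmoid)) / n
        w = w - step * grad
        for cols in blocks:
            w[cols] = group_soft_threshold(w[cols], step * lam)
    return w


if __name__ == "__main__":
    # Synthetic stand-in for genomic data: 15 features in 3 hypothetical "gene sets";
    # the label depends nonlinearly on the first set only.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 15))
    groups = [np.arange(0, 5), np.arange(5, 10), np.arange(10, 15)]
    r2 = (X[:, :5] ** 2).sum(axis=1)
    y = np.where(r2 > np.median(r2), 1.0, -1.0)

    Z, blocks = approximate_group_features(X, groups, 100, sigma=2.0, rng=rng)
    w = fit_group_lasso_logistic(Z, y, blocks, lam=0.01)
    for g, cols in enumerate(blocks):
        print(f"group {g}: weight norm = {np.linalg.norm(w[cols]):.4f}")
```

In this sketch the cost of the feature map grows linearly with the number of samples, whereas an exact kernel matrix grows quadratically. A group whose weight block is driven exactly to zero is dropped from the model, and counting how often a group survives across repeated runs would give a selection frequency of the kind the description mentions.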