
Identifying homogeneous subgroups of patients and important features: a topological machine learning approach


Bibliographic details
Main authors: Carr, Ewan; Carrière, Mathieu; Michel, Bertrand; Chazal, Frédéric; Iniesta, Raquel
Format: Online article, text
Language: English
Published: BioMed Central, 2021
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8451168/
https://www.ncbi.nlm.nih.gov/pubmed/34544357
http://dx.doi.org/10.1186/s12859-021-04360-9
Collection: PubMed
Description:
BACKGROUND: This paper exploits recent developments in topological data analysis to present a pipeline for clustering based on Mapper, an algorithm that reduces complex data into a one-dimensional graph.
RESULTS: We present a pipeline to identify and summarise clusters based on statistically significant topological features from a point cloud using Mapper.
CONCLUSIONS: Key strengths of this pipeline include the integration of prior knowledge to inform the clustering process and the selection of optimal clusters; the use of the bootstrap to restrict the search to robust topological features; the use of machine learning to inspect clusters; and the ability to incorporate mixed data types. Our pipeline can be downloaded under the GNU GPLv3 license at https://github.com/kcl-bhi/mapper-pipeline.
Record ID: pubmed-8451168
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics
Article type: Software
Publication date: 2021-09-20
Copyright: © The Author(s) 2021
Licence: Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
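The abstract characterises Mapper only as an algorithm that reduces complex data into a one-dimensional graph. As a rough illustration of that idea, here is a minimal Python sketch of the generic Mapper construction. It is not the authors' pipeline (distributed at https://github.com/kcl-bhi/mapper-pipeline) and omits its bootstrap and cluster-selection steps. Everything in it is an assumption made for illustration: the hypothetical helpers `interval_cover`, `single_linkage` and `mapper`, a filter given as one number per point, a cover of equal-length overlapping intervals, and single-linkage clustering with a fixed Euclidean threshold `eps`.

```python
import math
from itertools import combinations

# Hypothetical, minimal sketch of generic Mapper; NOT the kcl-bhi/mapper-pipeline code.


def interval_cover(lo, hi, n_intervals, overlap):
    """Cover [lo, hi] with n_intervals equal intervals, padded on both sides so that
    neighbouring intervals share a band of width overlap * (hi - lo) / n_intervals."""
    length = (hi - lo) / n_intervals
    pad = length * overlap / 2
    return [(lo + i * length - pad, lo + (i + 1) * length + pad) for i in range(n_intervals)]


def single_linkage(indices, points, eps):
    """Split the given point indices into connected components of the graph that
    joins two points whenever their Euclidean distance is at most eps."""
    clusters, unvisited = [], set(indices)
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            i = frontier.pop()
            near = {j for j in unvisited if math.dist(points[i], points[j]) <= eps}
            unvisited -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper(points, filter_values, n_intervals=3, overlap=0.5, eps=1.5):
    """Return (nodes, edges): one node per cluster found in the preimage of each
    cover interval, and an edge between any two nodes that share a point."""
    lo, hi = min(filter_values), max(filter_values)
    nodes = []
    for a, b in interval_cover(lo, hi, n_intervals, overlap):
        preimage = [i for i, f in enumerate(filter_values) if a <= f <= b]
        nodes.extend(single_linkage(preimage, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2) if nodes[u] & nodes[v]}
    return nodes, edges


# Usage: ten evenly spaced points on a line, filtered by their x coordinate.
points = [(float(x), 0.0) for x in range(10)]
nodes, edges = mapper(points, [p[0] for p in points])
print([sorted(n) for n in nodes])  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
print(sorted(edges))               # [(0, 1), (1, 2)]
```

The points in each interval are clustered separately, and only the overlap between intervals lets two clusters share points and become joined by an edge. With a single filter value per point the output is a one-dimensional graph; in the usage example it is a three-node path.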