
Patient similarity by joint matrix trifactorization to identify subgroups in acute myeloid leukemia

OBJECTIVE: Computing patients’ similarity is of great interest in precision oncology since it supports clustering and subgroup identification, eventually leading to tailored therapies. The availability of large amounts of biomedical data, characterized by large feature sets and sparse content, motiv...


Bibliographic Details
Main authors:	Vitali, F, Marini, S, Pala, D, Demartini, A, Montoli, S, Zambelli, A, Bellazzi, R
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2018
Subjects:	Research and Applications
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6951984/ https://www.ncbi.nlm.nih.gov/pubmed/31984320 http://dx.doi.org/10.1093/jamiaopen/ooy008

_version_	1783486369719386112
author	Vitali, F Marini, S Pala, D Demartini, A Montoli, S Zambelli, A Bellazzi, R
author_facet	Vitali, F Marini, S Pala, D Demartini, A Montoli, S Zambelli, A Bellazzi, R
author_sort	Vitali, F
collection	PubMed
description	OBJECTIVE: Computing patients’ similarity is of great interest in precision oncology since it supports clustering and subgroup identification, eventually leading to tailored therapies. The availability of large amounts of biomedical data, characterized by large feature sets and sparse content, motivates the development of new methods to compute patient similarities able to fuse heterogeneous data sources with the available knowledge. MATERIALS AND METHODS: In this work, we developed a data integration approach based on matrix trifactorization to compute patient similarities by integrating several sources of data and knowledge. We assess the accuracy of the proposed method: (1) on several synthetic data sets whose similarity structures are affected by increasing levels of noise and data sparsity, and (2) on a real data set coming from an acute myeloid leukemia (AML) study. The results obtained are finally compared with those of traditional similarity calculation methods. RESULTS: In the analysis of the synthetic data sets, where the ground truth is known, we measured the capability of reconstructing the correct clusters, while in the AML study we evaluated the Kaplan-Meier curves obtained with the different clusters and measured their statistical difference by means of the log-rank test. In the presence of noise and sparse data, our data integration method outperforms other techniques, both in the synthetic and in the AML data. DISCUSSION: In the case of multiple heterogeneous data sources, a matrix trifactorization technique can successfully fuse all the information in a joint model. We demonstrated how this approach can be efficiently applied to discover meaningful patient similarities and therefore may be considered a reliable data-driven strategy for the definition of new research hypotheses for precision oncology. CONCLUSION: The better performance of the proposed approach presents an advantage over previous methods to provide accurate patient similarities supporting precision medicine.
format	Online Article Text
id	pubmed-6951984
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6951984 2020-01-24 Patient similarity by joint matrix trifactorization to identify subgroups in acute myeloid leukemia Vitali, F Marini, S Pala, D Demartini, A Montoli, S Zambelli, A Bellazzi, R JAMIA Open Research and Applications OBJECTIVE: Computing patients’ similarity is of great interest in precision oncology since it supports clustering and subgroup identification, eventually leading to tailored therapies. The availability of large amounts of biomedical data, characterized by large feature sets and sparse content, motivates the development of new methods to compute patient similarities able to fuse heterogeneous data sources with the available knowledge. MATERIALS AND METHODS: In this work, we developed a data integration approach based on matrix trifactorization to compute patient similarities by integrating several sources of data and knowledge. We assess the accuracy of the proposed method: (1) on several synthetic data sets whose similarity structures are affected by increasing levels of noise and data sparsity, and (2) on a real data set coming from an acute myeloid leukemia (AML) study. The results obtained are finally compared with those of traditional similarity calculation methods. RESULTS: In the analysis of the synthetic data sets, where the ground truth is known, we measured the capability of reconstructing the correct clusters, while in the AML study we evaluated the Kaplan-Meier curves obtained with the different clusters and measured their statistical difference by means of the log-rank test. In the presence of noise and sparse data, our data integration method outperforms other techniques, both in the synthetic and in the AML data. DISCUSSION: In the case of multiple heterogeneous data sources, a matrix trifactorization technique can successfully fuse all the information in a joint model. We demonstrated how this approach can be efficiently applied to discover meaningful patient similarities and therefore may be considered a reliable data-driven strategy for the definition of new research hypotheses for precision oncology. CONCLUSION: The better performance of the proposed approach presents an advantage over previous methods to provide accurate patient similarities supporting precision medicine. Oxford University Press 2018-05-14 /pmc/articles/PMC6951984/ /pubmed/31984320 http://dx.doi.org/10.1093/jamiaopen/ooy008 Text en © The Author(s) 2018. Published by Oxford University Press on behalf of the American Medical Informatics Association. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Patient similarity by joint matrix trifactorization to identify subgroups in acute myeloid leukemia
topic	Research and Applications
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6951984/ https://www.ncbi.nlm.nih.gov/pubmed/31984320 http://dx.doi.org/10.1093/jamiaopen/ooy008
