Discovering Subgroups of Patients from DNA Copy Number Data Using NMF on Compacted Matrices

In the study of complex genetic diseases, identifying subgroups of patients who share similar genetic characteristics is a challenging task, undertaken for example to improve treatment decisions. One type of genetic lesion, frequently investigated in such disorders, is the change of the DNA cop...

Bibliographic Details
Main authors: de Campos, Cassio P., Rancoita, Paola M. V., Kwee, Ivo, Zucca, Emanuele, Zaffalon, Marco, Bertoni, Francesco
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3835832/
https://www.ncbi.nlm.nih.gov/pubmed/24278162
http://dx.doi.org/10.1371/journal.pone.0079720
_version_ 1782292214879092736
author de Campos, Cassio P.
Rancoita, Paola M. V.
Kwee, Ivo
Zucca, Emanuele
Zaffalon, Marco
Bertoni, Francesco
author_sort de Campos, Cassio P.
collection PubMed
description In the study of complex genetic diseases, identifying subgroups of patients who share similar genetic characteristics is a challenging task, undertaken for example to improve treatment decisions. One type of genetic lesion, frequently investigated in such disorders, is the change of the DNA copy number (CN) at specific genomic traits. Non-negative Matrix Factorization (NMF) is a standard technique to reduce the dimensionality of a data set and to cluster data samples, while keeping the most relevant information in meaningful components. Thus, it can be used to discover subgroups of patients from CN profiles. It is, however, computationally impractical for very high-dimensional data, such as CN microarray data. Deciding the most suitable number of subgroups is also a challenging problem. The aim of this work is to derive a procedure to compact high-dimensional data, in order to improve the applicability of NMF without compromising the quality of the clustering. This is particularly important for analyzing high-resolution microarray data. Many commonly used quality measures, as well as our own measures, are employed to decide the number of subgroups and to assess the quality of the results. Our measures are based on the idea of identifying robust subgroups, inspired by biological/clinical relevance rather than simply aiming at well-separated clusters. We evaluate our procedure using four real, independent data sets. In these data sets, our method was able to find accurate subgroups with individual molecular and clinical features, and it outperformed standard NMF in terms of accuracy in the factorization fitness function. Hence, it can be useful for the discovery of subgroups of patients with similar CN profiles in the study of heterogeneous diseases.
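The description above relies on NMF as the clustering step. As a rough illustration only, the following sketch runs plain NMF with the widely used multiplicative update rules on a small toy matrix and assigns each sample (column) to the component with the largest coefficient. It is not the authors' compaction procedure, which this record does not detail. The function names `nmf` and `assign_clusters`, the toy matrix `V`, and all parameter values are assumptions made for this example.

```python
import numpy as np


def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Plain NMF via multiplicative updates (Frobenius objective).

    V is a non-negative (features x samples) matrix; returns W (features x k)
    and H (k x samples) such that V is approximately W @ H.
    Illustrative sketch only, not the method of the paper.
    """
    rng = np.random.default_rng(seed)
    n_features, n_samples = V.shape
    # Random strictly positive initialisation.
    W = rng.random((n_features, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def assign_clusters(H):
    """Assign each sample (column of H) to its dominant component."""
    return H.argmax(axis=0)


# Hypothetical toy data: 4 genomic regions (rows) x 6 patients (columns),
# with two clearly separated groups of patients.
V = np.array([
    [5.0, 5.0, 5.0, 0.0, 0.0, 0.0],
    [5.0, 5.0, 5.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 4.0, 4.0, 4.0],
    [0.0, 0.0, 0.0, 4.0, 4.0, 4.0],
])

W, H = nmf(V, k=2)
labels = assign_clusters(H)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print(labels, rel_error)
```

The component indices are arbitrary, so only the grouping of the columns is meaningful, not which group is labelled 0 or 1. Real CN data would first need to be encoded as a non-negative matrix; how that is done is outside the scope of this sketch.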
format Online
Article
Text
id pubmed-3835832
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3835832 2013-11-25 Discovering Subgroups of Patients from DNA Copy Number Data Using NMF on Compacted Matrices de Campos, Cassio P. Rancoita, Paola M. V. Kwee, Ivo Zucca, Emanuele Zaffalon, Marco Bertoni, Francesco PLoS One Research Article Public Library of Science 2013-11-20 /pmc/articles/PMC3835832/ /pubmed/24278162 http://dx.doi.org/10.1371/journal.pone.0079720 Text en © 2013 de Campos et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Discovering Subgroups of Patients from DNA Copy Number Data Using NMF on Compacted Matrices
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3835832/
https://www.ncbi.nlm.nih.gov/pubmed/24278162
http://dx.doi.org/10.1371/journal.pone.0079720