Exploratory data analysis of a clinical study group: Development of a procedure for exploring multidimensional data
Thorough knowledge of the structure of analyzed data makes it possible to form detailed scientific hypotheses and research questions. The structure of data can be revealed with methods for exploratory data analysis. Due to the multitude of available methods, selecting those which will work together well and facili...
Main authors: | Konopka, Bogumil M.; Lwow, Felicja; Owczarz, Magdalena; Łaczmański, Łukasz |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2018 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6107146/ https://www.ncbi.nlm.nih.gov/pubmed/30138442 http://dx.doi.org/10.1371/journal.pone.0201950 |
_version_ | 1783349919809011712 |
---|---|
author | Konopka, Bogumil M. Lwow, Felicja Owczarz, Magdalena Łaczmański, Łukasz |
author_sort | Konopka, Bogumil M. |
collection | PubMed |
description | Thorough knowledge of the structure of analyzed data makes it possible to form detailed scientific hypotheses and research questions. The structure of data can be revealed with methods for exploratory data analysis. Due to the multitude of available methods, selecting those that work well together and facilitate data interpretation is not an easy task. In this work we present a well-fitted set of tools for a complete exploratory analysis of a clinical dataset and perform a case study analysis on a set of 515 patients. The proposed procedure comprises several steps: 1) robust data normalization, 2) outlier detection with Mahalanobis (MD) and robust Mahalanobis distances (rMD), 3) hierarchical clustering with Ward’s algorithm, 4) Principal Component Analysis with biplot vectors. The analyzed set comprised elderly patients who participated in the PolSenior project. Each patient was characterized by over 40 biochemical and socio-geographical attributes. Introductory analysis showed that the case-study dataset comprises two clusters separated along the axis of sex hormone attributes. Further analysis was carried out separately for male and female patients. The optimal partitioning of the male set resulted in five subgroups. Two of them were related to diseased patients: 1) diabetes and 2) hypogonadism patients. Analysis of the female set suggested that it was more homogeneous than the male dataset. No evidence of pathological patient subgroups was found in it. In the study we showed that outlier detection with MD and rMD not only identifies outliers but can also be used to assess the heterogeneity of a dataset. The case study showed that our procedure is well suited for identification and visualization of biologically meaningful patient subgroups. |
format | Online Article Text |
id | pubmed-6107146 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6107146 2018-08-30 Exploratory data analysis of a clinical study group: Development of a procedure for exploring multidimensional data Konopka, Bogumil M. Lwow, Felicja Owczarz, Magdalena Łaczmański, Łukasz PLoS One Research Article Public Library of Science 2018-08-23 /pmc/articles/PMC6107146/ /pubmed/30138442 http://dx.doi.org/10.1371/journal.pone.0201950 Text en © 2018 Konopka et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Exploratory data analysis of a clinical study group: Development of a procedure for exploring multidimensional data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6107146/ https://www.ncbi.nlm.nih.gov/pubmed/30138442 http://dx.doi.org/10.1371/journal.pone.0201950 |
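
The description field lists a four-step procedure: robust normalization, outlier detection with Mahalanobis distances, Ward's hierarchical clustering, and PCA with biplot vectors. The following is a minimal sketch of such a pipeline, not the authors' implementation. It assumes median/MAD scaling as the robust normalization, uses only the classical Mahalanobis distance (the robust variant would need a robust covariance estimate, which is omitted here), and runs on synthetic data instead of the PolSenior dataset. The function names are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster


def robust_normalize(X):
    """Center each column on its median and scale by its MAD.

    Median/MAD scaling is an assumed choice; the record does not say
    which robust normalization the authors used.
    """
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    return (X - med) / (1.4826 * mad)


def mahalanobis_distances(X):
    """Classical Mahalanobis distance of each row from the sample mean."""
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    # Quadratic form diff_i^T * inv_cov * diff_i for every row i.
    return np.sqrt(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))


def pca(X, n_components=2):
    """PCA via SVD: returns sample scores and attribute loadings.

    The loadings are the directions drawn as vectors in a biplot.
    """
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings


# Synthetic stand-in data: two groups of 50 "patients", 4 attributes each.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(6, 1, (50, 4))])

Z = robust_normalize(X)                                   # step 1
md = mahalanobis_distances(Z)                             # step 2 (classical MD only)
labels = fcluster(linkage(Z, method="ward"), t=2,         # step 3
                  criterion="maxclust")
scores, loadings = pca(Z)                                 # step 4
```

Plotting `scores` colored by `labels`, with the rows of `loadings` overlaid as arrows, would give a biplot-style view of the clusters; large values in `md` flag candidate outliers.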