
Exploratory data analysis of a clinical study group: Development of a procedure for exploring multidimensional data

Thorough knowledge of the structure of the analyzed data allows one to form detailed scientific hypotheses and research questions. The structure of data can be revealed with methods for exploratory data analysis. Due to the multitude of available methods, selecting those which will work together well and facili...


Bibliographic Details
Main Authors: Konopka, Bogumil M., Lwow, Felicja, Owczarz, Magdalena, Łaczmański, Łukasz
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6107146/
https://www.ncbi.nlm.nih.gov/pubmed/30138442
http://dx.doi.org/10.1371/journal.pone.0201950
_version_ 1783349919809011712
author Konopka, Bogumil M.
Lwow, Felicja
Owczarz, Magdalena
Łaczmański, Łukasz
collection PubMed
description Thorough knowledge of the structure of the analyzed data allows one to form detailed scientific hypotheses and research questions. The structure of data can be revealed with methods for exploratory data analysis. Due to the multitude of available methods, selecting those which will work together well and facilitate data interpretation is not an easy task. In this work we present a well-fitted set of tools for a complete exploratory analysis of a clinical dataset and perform a case study analysis on a set of 515 patients. The proposed procedure comprises several steps: 1) robust data normalization, 2) outlier detection with Mahalanobis (MD) and robust Mahalanobis distances (rMD), 3) hierarchical clustering with Ward’s algorithm, 4) Principal Component Analysis with biplot vectors. The analyzed set comprised elderly patients who participated in the PolSenior project. Each patient was characterized by over 40 biochemical and socio-geographical attributes. Introductory analysis showed that the case-study dataset comprises two clusters separated along the axis of sex hormone attributes. Further analysis was carried out separately for male and female patients. The optimal partitioning of the male set resulted in five subgroups. Two of them were related to diseased patients: 1) diabetes and 2) hypogonadism patients. Analysis of the female set suggested that it was more homogeneous than the male dataset. No evidence of pathological patient subgroups was found. In the study we showed that outlier detection with MD and rMD not only identifies outliers but can also be used to assess the heterogeneity of a dataset. The case study proved that our procedure is well suited for identification and visualization of biologically meaningful patient subgroups.
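The four steps listed in the description can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the median/MAD scaling, the chi-square cutoff at 0.975, the number of clusters, and all function and variable names are assumptions made for this example. Only the classical Mahalanobis distance is shown; a robust variant (rMD) would replace the sample mean and covariance with robust estimates, for instance a minimum covariance determinant estimate, which the description does not specify.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import chi2


def robust_scale(X):
    """Center each column on its median and scale by its MAD (assumed scheme)."""
    med = np.median(X, axis=0)
    mad = np.median(np.abs(X - med), axis=0)
    return (X - med) / (1.4826 * mad)


def mahalanobis_sq(X):
    """Squared classical Mahalanobis distance of each row to the sample mean."""
    diff = X - X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)


def pca_biplot(X, n_components=2):
    """Return sample scores and attribute loadings (biplot vectors) via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    loadings = Vt[:n_components].T
    return scores, loadings


# Synthetic stand-in data: two groups of 100 "patients" with 4 attributes.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(100, 4))
group_b = rng.normal(8.0, 1.0, size=(100, 4))
X = np.vstack([group_a, group_b])
X[0] = [0.0, 30.0, 0.0, 0.0]  # one planted outlier

# 1) Robust normalization.
Z = robust_scale(X)

# 2) Outlier detection: flag rows whose squared distance exceeds a
#    chi-square quantile (the 0.975 level is an assumed convention).
d2 = mahalanobis_sq(Z)
is_outlier = d2 > chi2.ppf(0.975, df=Z.shape[1])
clean = Z[~is_outlier]

# 3) Hierarchical clustering with Ward linkage, cut into two clusters.
labels = fcluster(linkage(clean, method="ward"), t=2, criterion="maxclust")

# 4) PCA: scores place patients, loadings give the biplot vectors.
scores, loadings = pca_biplot(clean)
```

In a biplot, the rows of `scores` would be drawn as points (colored by `labels`) and the rows of `loadings` as arrows, so that attributes pointing along the direction separating two clusters can be read off directly.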
format Online
Article
Text
id pubmed-6107146
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6107146 2018-08-30 Exploratory data analysis of a clinical study group: Development of a procedure for exploring multidimensional data Konopka, Bogumil M. Lwow, Felicja Owczarz, Magdalena Łaczmański, Łukasz PLoS One Research Article Public Library of Science 2018-08-23 /pmc/articles/PMC6107146/ /pubmed/30138442 http://dx.doi.org/10.1371/journal.pone.0201950 Text en © 2018 Konopka et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Exploratory data analysis of a clinical study group: Development of a procedure for exploring multidimensional data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6107146/
https://www.ncbi.nlm.nih.gov/pubmed/30138442
http://dx.doi.org/10.1371/journal.pone.0201950