Visually guided preprocessing of bioanalytical laboratory data using an interactive R notebook (pguIMP)

The evaluation of pharmacological data using machine learning requires high data quality. Therefore, data preprocessing, that is, cleaning analytical laboratory errors, replacing missing values or outliers, and transforming data adequately before actual data analysis, is crucial. Because current too...

Bibliographic Details
Main authors: Malkusch, Sebastian, Hahnefeld, Lisa, Gurke, Robert, Lötsch, Jörn
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8592507/
https://www.ncbi.nlm.nih.gov/pubmed/34598320
http://dx.doi.org/10.1002/psp4.12704
author Malkusch, Sebastian
Hahnefeld, Lisa
Gurke, Robert
Lötsch, Jörn
collection PubMed
description The evaluation of pharmacological data using machine learning requires high data quality. Therefore, data preprocessing, that is, cleaning analytical laboratory errors, replacing missing values or outliers, and transforming data adequately before actual data analysis, is crucial. Because current tools available for this purpose often require programming skills, preprocessing tools with graphical user interfaces that can be used interactively are needed. In collaboration between data scientists and experts in bioanalytical diagnostics, a graphical software package for data preprocessing called pguIMP is proposed, which contains a fixed sequence of preprocessing steps to enable reproducible interactive data preprocessing. As an R-based package, it also allows direct integration into this data science environment without requiring any programming knowledge. The implementation of contemporary data processing methods, including machine-learning-based imputation techniques, ensures the generation of corrected and cleaned bioanalytical data sets that preserve data structures such as clusters better than is possible with classical methods. This was evaluated on bioanalytical data sets from lipidomics and drug research using k-nearest-neighbors-based imputation followed by k-means clustering and density-based spatial clustering of applications with noise. The R package provides a Shiny-based web interface designed to be easy to use for non–data analysis experts. It is demonstrated that the spectrum of methods provided is suitable as a standard pipeline for preprocessing bioanalytical data in biomedical research domains. The R package pguIMP is freely available at the Comprehensive R Archive Network (CRAN) (https://cran.r-project.org/web/packages/pguIMP/index.html).
format Online
Article
Text
id pubmed-8592507
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-8592507 2021-11-22 Visually guided preprocessing of bioanalytical laboratory data using an interactive R notebook (pguIMP) Malkusch, Sebastian Hahnefeld, Lisa Gurke, Robert Lötsch, Jörn CPT Pharmacometrics Syst Pharmacol Research John Wiley and Sons Inc. 2021-10-01 2021-11 /pmc/articles/PMC8592507/ /pubmed/34598320 http://dx.doi.org/10.1002/psp4.12704 Text en
© 2021 The Authors. CPT: Pharmacometrics & Systems Pharmacology published by Wiley Periodicals LLC on behalf of American Society for Clinical Pharmacology and Therapeutics. This is an open access article under the terms of the Creative Commons BY-NC-ND 4.0 License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
title Visually guided preprocessing of bioanalytical laboratory data using an interactive R notebook (pguIMP)
topic Research