-Omics biomarker identification pipeline for translational medicine

BACKGROUND: Translational medicine (TM) is an emerging domain that aims to facilitate medical or biological advances efficiently from the scientist to the clinician. Central to the TM vision is to narrow the gap between basic science and applied science in terms of time, cost and early diagnosis of...

Bibliographic Details
Main authors: Bravo-Merodio, Laura; Williams, John A.; Gkoutos, Georgios V.; Acharjee, Animesh
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6518609/
https://www.ncbi.nlm.nih.gov/pubmed/31088492
http://dx.doi.org/10.1186/s12967-019-1912-5
_version_ 1783418487070261248
author Bravo-Merodio, Laura
Williams, John A.
Gkoutos, Georgios V.
Acharjee, Animesh
author_facet Bravo-Merodio, Laura
Williams, John A.
Gkoutos, Georgios V.
Acharjee, Animesh
author_sort Bravo-Merodio, Laura
collection PubMed
description BACKGROUND: Translational medicine (TM) is an emerging domain that aims to facilitate medical or biological advances efficiently from the scientist to the clinician. Central to the TM vision is to narrow the gap between basic science and applied science in terms of time, cost and early diagnosis of the disease state. Biomarker identification is one of the main challenges within TM. The identification of disease biomarkers from -omics data will not only help the stratification of diverse patient cohorts but will also provide early diagnostic information which could improve patient management and potentially prevent adverse outcomes. However, biomarker identification needs to be robust and reproducible. Hence a robust unbiased computational framework that can help clinicians identify those biomarkers is necessary. METHODS: We developed a pipeline (workflow) that includes two different supervised classification techniques based on regularization methods to identify biomarkers from -omics or other high dimension clinical datasets. The pipeline includes several important steps such as quality control and stability of selected biomarkers. The process takes input files (outcome and independent variables or -omics data) and pre-processes (normalization, missing values) them. After a random division of samples into training and test sets, Least Absolute Shrinkage and Selection Operator and Elastic Net feature selection methods are applied to identify the most important features representing potential biomarker candidates. The penalization parameters are optimised using 10-fold cross validation and the process undergoes 100 iterations and a combinatorial analysis to select the best performing multivariate model. 
An empirical unbiased assessment of their quality as biomarkers for clinical use is performed through a Receiver Operating Characteristic curve and its Area Under the Curve analysis on both permuted and real data for 1000 different randomized training and test sets. We validated this pipeline against previously published biomarkers. RESULTS: We applied this pipeline to three different datasets with previously published biomarkers: lipidomics data by Acharjee et al. (Metabolomics 13:25, 2017) and transcriptomics data by Rajamani and Bhasin (Genome Med 8:38, 2016) and Mills et al. (Blood 114:1063–1072, 2009). Our results demonstrate that our method was able to identify both previously published biomarkers as well as new variables that add value to the published results. CONCLUSIONS: We developed a robust pipeline to identify clinically relevant biomarkers that can be applied to different -omics datasets. Such identification reveals potentially novel drug targets and can be used as a part of a machine-learning based patient stratification framework in the translational medicine settings. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12967-019-1912-5) contains supplementary material, which is available to authorized users.
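The permuted-versus-real AUC assessment described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: a simple class-centroid scorer stands in for the LASSO and Elastic Net models, the split is stratified with a 30% test fraction, and the data are synthetic. All function names and parameters here (`auc`, `centroid_scorer`, `stratified_split`, `assess`, `make_data`, `test_frac`, `shift`) are hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic; tied pairs count 0.5."""
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def centroid_scorer(X, y):
    """Stand-in model (assumption): project onto the difference of class means."""
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    w = [mean(cp) - mean(cn) for cp, cn in zip(zip(*pos), zip(*neg))]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x))


def stratified_split(y, rng, test_frac=0.3):
    """Random train/test split that keeps both classes in each part."""
    train, test = [], []
    for cls in (0, 1):
        idx = [i for i, t in enumerate(y) if t == cls]
        rng.shuffle(idx)
        k = max(1, int(len(idx) * test_frac))
        test += idx[:k]
        train += idx[k:]
    return train, test


def split_auc(X, y, rng):
    """Fit on a random training set and return the AUC on the held-out set."""
    train, test = stratified_split(y, rng)
    score = centroid_scorer([X[i] for i in train], [y[i] for i in train])
    return auc([score(X[i]) for i in test], [y[i] for i in test])


def assess(X, y, n_splits=1000, seed=0):
    """Mean test AUC over random splits, for real and for permuted labels."""
    rng = random.Random(seed)
    real, permuted = [], []
    for _ in range(n_splits):
        real.append(split_auc(X, y, rng))
        y_perm = y[:]
        rng.shuffle(y_perm)  # break any link between features and outcome
        permuted.append(split_auc(X, y_perm, rng))
    return mean(real), mean(permuted)


def make_data(n_per_class=40, n_features=5, shift=1.5, seed=1):
    """Synthetic data (assumption): only feature 0 differs between classes."""
    rng = random.Random(seed)
    X, y = [], []
    for cls in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0, 1) for _ in range(n_features)]
            row[0] += shift * cls
            X.append(row)
            y.append(cls)
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    real_auc, permuted_auc = assess(X, y)
    print(f"mean AUC, real labels:     {real_auc:.3f}")
    print(f"mean AUC, permuted labels: {permuted_auc:.3f}")
```

A candidate marker set is only credible if the AUC on real labels sits clearly above the AUC obtained after permuting the labels, which should hover near 0.5. In a fuller version the stand-in scorer would be replaced by the regularized models the abstract names.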
format Online
Article
Text
id pubmed-6518609
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6518609 2019-05-21 -Omics biomarker identification pipeline for translational medicine Bravo-Merodio, Laura Williams, John A. Gkoutos, Georgios V. Acharjee, Animesh J Transl Med Research BioMed Central 2019-05-14 /pmc/articles/PMC6518609/ /pubmed/31088492 http://dx.doi.org/10.1186/s12967-019-1912-5 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Bravo-Merodio, Laura
Williams, John A.
Gkoutos, Georgios V.
Acharjee, Animesh
-Omics biomarker identification pipeline for translational medicine
title -Omics biomarker identification pipeline for translational medicine
title_full -Omics biomarker identification pipeline for translational medicine
title_fullStr -Omics biomarker identification pipeline for translational medicine
title_full_unstemmed -Omics biomarker identification pipeline for translational medicine
title_short -Omics biomarker identification pipeline for translational medicine
title_sort -omics biomarker identification pipeline for translational medicine
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6518609/
https://www.ncbi.nlm.nih.gov/pubmed/31088492
http://dx.doi.org/10.1186/s12967-019-1912-5
work_keys_str_mv AT bravomerodiolaura omicsbiomarkeridentificationpipelinefortranslationalmedicine
AT williamsjohna omicsbiomarkeridentificationpipelinefortranslationalmedicine
AT gkoutosgeorgiosv omicsbiomarkeridentificationpipelinefortranslationalmedicine
AT acharjeeanimesh omicsbiomarkeridentificationpipelinefortranslationalmedicine