
DNA methylation biomarker selected by an ensemble machine learning approach predicts mortality risk in an HIV-positive veteran population


Bibliographic Details
Main Authors: Shu, Chang, Justice, Amy C., Zhang, Xinyu, Marconi, Vincent C., Hancock, Dana B., Johnson, Eric O., Xu, Ke
Format: Online Article Text
Language: English
Published: Taylor & Francis 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8216205/
https://www.ncbi.nlm.nih.gov/pubmed/33092459
http://dx.doi.org/10.1080/15592294.2020.1824097
author Shu, Chang
Justice, Amy C.
Zhang, Xinyu
Marconi, Vincent C.
Hancock, Dana B.
Johnson, Eric O.
Xu, Ke
collection PubMed
description Background: With the improved life expectancy of people living with HIV (PLWH), identifying vulnerable subpopulations at high mortality risk is important. Evidence has shown that DNA methylation (DNAm) is associated with mortality in non-HIV populations. Here, we established a panel of DNAm biomarkers that can predict mortality risk among PLWH.
Methods: 1,081 HIV-positive participants from the Veterans Ageing Cohort Study (VACS) were divided into training (N = 460), validation (N = 114), and testing (N = 507) sets. The VACS index was used as a measure of mortality risk among PLWH. Model training and fine-tuning were conducted using the ensemble method in the training and validation sets, and prediction performance was assessed in the testing set. Survival analysis comparing the predicted high and low mortality risk groups and Gene Ontology enrichment analysis of the predictive CpG sites were performed.
Results: We selected a panel of 393 CpGs for the ensemble prediction model that showed excellent performance in predicting high mortality risk, with an auROC of 0.809 (95% CI: 0.767, 0.851) and a balanced accuracy of 0.653 (95% CI: 0.611, 0.693) in the testing set. The high mortality risk group was significantly associated with 10-year mortality (hazard ratio = 1.79, p = 4E-05) compared with the low-risk group. These 393 CpGs were located in 280 genes enriched in immune and inflammation response pathways.
Conclusions: We identified a panel of DNAm features associated with mortality risk in PLWH. These DNAm features may serve as predictive biomarkers for mortality risk among PLWH.
Abbreviations: AUC: Area Under Curve; CI: Confidence interval; DMR: differentially methylated region; DNA: Deoxyribonucleic acid; DNAm: DNA methylation; DAVID: Database for Annotation, Visualization, and Integrated Discovery; EWA: epigenome-wide association; FDR: False discovery rate; FWER: Family-wise error rate; GLMNET: elastic-net-regularized generalized linear models; GO: Gene ontology; HIV: Human immunodeficiency virus; HM450K: Human Methylation 450 K BeadChip; k-NN: k-nearest neighbours; NK: Natural killer; PC: Principal component; PLWH: people living with HIV; QC: Quality control; SVM: Support Vector Machines; VACS: Veterans Ageing Cohort Study; XGBoost: Extreme Gradient Boosting Tree
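The two performance measures named in the description, auROC and balanced accuracy, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the record: the function names, the plain averaging of per-model scores as the ensemble step, the 0.5 decision threshold, and the six-sample toy data. The record does not state how the authors combined models or chose a cutoff.

```python
from statistics import mean


def ensemble_scores(per_model_scores):
    """Average predicted probabilities across models (simple soft voting).

    Assumption: plain averaging; the source does not describe the combiner.
    """
    return [mean(column) for column in zip(*per_model_scores)]


def au_roc(labels, scores):
    """auROC as the probability that a random positive outranks a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def balanced_accuracy(labels, scores, threshold=0.5):
    """Mean of sensitivity and specificity at a fixed score threshold."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return (tp / n_pos + tn / n_neg) / 2


# Toy data only: 1 = high mortality risk, 0 = low mortality risk.
labels = [1, 1, 1, 0, 0, 0]
model_a = [0.9, 0.6, 0.4, 0.5, 0.2, 0.1]  # hypothetical model scores
model_b = [0.7, 0.8, 0.2, 0.3, 0.4, 0.1]  # hypothetical model scores

ensemble = ensemble_scores([model_a, model_b])
print("auROC:", round(au_roc(labels, ensemble), 3))
print("balanced accuracy:", round(balanced_accuracy(labels, ensemble), 3))
```

Balanced accuracy weights sensitivity and specificity equally, so it is less flattered by class imbalance than raw accuracy, while auROC summarizes ranking quality across all thresholds. That is one reason the two figures reported for the same model can differ noticeably.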
format Online
Article
Text
id pubmed-8216205
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Taylor & Francis
record_format MEDLINE/PubMed
spelling pubmed-8216205 2021-07-06 Epigenetics Research Paper Taylor & Francis 2020-10-22 /pmc/articles/PMC8216205/ /pubmed/33092459 http://dx.doi.org/10.1080/15592294.2020.1824097 Text en
© 2020 Yale University. Published by Informa UK Limited, trading as Taylor & Francis Group. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.
title DNA methylation biomarker selected by an ensemble machine learning approach predicts mortality risk in an HIV-positive veteran population
topic Research Paper