eClock: An ensemble-based method to accurately predict ages with a biased distribution from DNA methylation data

Bibliographic Details
Main author: Liu, Yu
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9075636/
https://www.ncbi.nlm.nih.gov/pubmed/35522643
http://dx.doi.org/10.1371/journal.pone.0267349
author Liu, Yu
collection PubMed
description DNA methylation is closely related to senescence, so it has been used to develop statistical models, called clock models, that predict chronological age accurately. However, because training data always have a biased age distribution, model performance degrades for samples from sparsely populated age ranges. To solve this problem, we developed the R package eClock, which uses a bagging-SMOTE method to adjust the biased distribution and predict age with an ensemble model. It also provides a bootstrapped model based on bagging alone and a traditional clock model. On three datasets, the bagging-SMOTE model significantly improved age prediction for rare samples. Beyond model construction, the package provides functions for data visualization and methylation feature conversion to facilitate research in related areas.
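The abstract names the technique but not its implementation details, so the following is only a minimal Python sketch of the general bagging-SMOTE idea, not eClock's R code. It assumes: ages are grouped into hypothetical bins and each sparse bin is oversampled to the size of the largest one; synthetic samples interpolate both the methylation profile and the age between same-bin neighbours (a SMOTE variant for regression); a closed-form ridge regression stands in for whatever penalized regression a real clock uses; and the ensemble prediction is the mean over bags. The function names `smote_balance`, `fit_ridge`, `bagging_smote_clock` and `predict` are hypothetical.

```python
import numpy as np


def smote_balance(X, y, bins, k=5, rng=None):
    """SMOTE-style oversampling of sparse age bins (illustrative assumption).

    Ages are grouped with np.digitize(y, bins); every bin holding at least two
    samples is topped up to the size of the largest bin. Each synthetic sample
    interpolates both the feature vector and the age between a sample and one
    of its k nearest neighbours from the same bin.
    """
    rng = np.random.default_rng(rng)
    labels = np.digitize(y, bins)
    groups = [np.flatnonzero(labels == b) for b in np.unique(labels)]
    target = max(len(g) for g in groups)
    X_parts, y_parts = [X], [y]
    for idx in groups:
        n = len(idx)
        if n < 2 or n == target:
            continue  # interpolation needs two samples; full bins need nothing
        Xb, yb = X[idx], y[idx]
        # Pairwise Euclidean distances inside the bin, excluding self-matches.
        d = np.linalg.norm(Xb[:, None, :] - Xb[None, :, :], axis=2)
        np.fill_diagonal(d, np.inf)
        nn = np.argsort(d, axis=1)[:, : min(k, n - 1)]
        n_new = target - n
        base = rng.integers(0, n, size=n_new)
        nbr = nn[base, rng.integers(0, nn.shape[1], size=n_new)]
        gap = rng.random(n_new)
        X_parts.append(Xb[base] + gap[:, None] * (Xb[nbr] - Xb[base]))
        y_parts.append(yb[base] + gap * (yb[nbr] - yb[base]))
    return np.vstack(X_parts), np.concatenate(y_parts)


def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge regression; the intercept is recovered by centering."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    w = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ (y - y_mean))
    return w, y_mean - x_mean @ w


def bagging_smote_clock(X, y, bins, n_bags=25, lam=1.0, seed=0):
    """Fit one ridge clock per bootstrap bag after SMOTE rebalancing."""
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(n_bags):
        boot = rng.integers(0, len(y), size=len(y))  # bootstrap resample
        Xs, ys = smote_balance(X[boot], y[boot], bins, rng=rng)
        models.append(fit_ridge(Xs, ys, lam))
    return models


def predict(models, X):
    """Ensemble prediction: average the ages predicted by every bag."""
    return np.mean([X @ w + b for w, b in models], axis=0)


# Usage on synthetic data (not real methylation values): 180 samples aged
# 20-60 and only 20 aged 60-90, so the older age bins are sparse.
rng = np.random.default_rng(1)
ages = np.concatenate([rng.uniform(20, 60, 180), rng.uniform(60, 90, 20)])
coef = rng.normal(0, 0.005, 30)
X = 0.5 + np.outer(ages, coef) + rng.normal(0, 0.02, (200, 30))
models = bagging_smote_clock(X, ages, bins=[30, 40, 50, 60, 70, 80])
pred = predict(models, X)
```

Interpolating only within a bin keeps each synthetic age inside the range of the real ages it was drawn from; dropping the SMOTE step and keeping only the bootstrap loop would give a plain bagged ensemble, roughly analogous to the bagging-only model the abstract mentions.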
format Online
Article
Text
id pubmed-9075636
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
journal PLoS One (Research Article), published 2022-05-06
rights This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title eClock: An ensemble-based method to accurately predict ages with a biased distribution from DNA methylation data
topic Research Article