DNA Methylation Biomarkers-Based Human Age Prediction Using Machine Learning
PURPOSE: Age can be an important clue in uncovering the identity of persons that left biological evidence at crime scenes. With the availability of DNA methylation data, several age prediction models are developed by using statistical and machine learning methods. From epigenetic studies, it has bee...
Main authors: | Zaguia, Atef; Pandey, Deepak; Painuly, Sandeep; Pal, Saurabh Kumar; Garg, Vivek Kumar; Goel, Neelam |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi 2022 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8803417/ https://www.ncbi.nlm.nih.gov/pubmed/35111213 http://dx.doi.org/10.1155/2022/8393498 |
_version_ | 1784642863498264576 |
---|---|
author | Zaguia, Atef; Pandey, Deepak; Painuly, Sandeep; Pal, Saurabh Kumar; Garg, Vivek Kumar; Goel, Neelam |
author_sort | Zaguia, Atef |
collection | PubMed |
description | PURPOSE: Age can be an important clue to the identity of persons who left biological evidence at crime scenes. With the availability of DNA methylation data, several age prediction models have been developed using statistical and machine learning methods. Epigenetic studies have demonstrated a close association between aging and DNA methylation. Most existing studies focused on healthy samples, whereas diseases may have a significant impact on human age. Therefore, this article proposes an age prediction model using DNA methylation biomarkers for healthy and diseased samples. METHODS: The dataset contains 454 healthy samples and 400 diseased samples from publicly available sources, with ages from 1 to 89 years. Six CpG sites highly correlated with age are identified from these data using Pearson's correlation coefficient. The age prediction model is developed using four machine learning techniques: Multiple Linear Regression, Support Vector Regression, Gradient Boosting Regression, and Random Forest Regression. Separate models are designed for healthy and diseased data. The data are split randomly in an 80:20 ratio for training and testing, respectively. RESULTS: Among all the techniques, the model built with Random Forest Regression shows the best performance, and Gradient Boosting Regression is second best. For healthy samples, the model achieved a MAD of 2.51 years on training data and 4.85 years on testing data. For diseased samples, it achieved a MAD of 3.83 years on training data and 9.53 years on testing data. CONCLUSION: These results show that the proposed model can predict age for healthy and diseased samples. |
format | Online Article Text |
id | pubmed-8803417 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-8803417 2022-02-01 DNA Methylation Biomarkers-Based Human Age Prediction Using Machine Learning Zaguia, Atef; Pandey, Deepak; Painuly, Sandeep; Pal, Saurabh Kumar; Garg, Vivek Kumar; Goel, Neelam Comput Intell Neurosci Research Article Hindawi 2022-01-24 /pmc/articles/PMC8803417/ /pubmed/35111213 http://dx.doi.org/10.1155/2022/8393498 Text en Copyright © 2022 Atef Zaguia et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | DNA Methylation Biomarkers-Based Human Age Prediction Using Machine Learning |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8803417/ https://www.ncbi.nlm.nih.gov/pubmed/35111213 http://dx.doi.org/10.1155/2022/8393498 |
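The METHODS summarized in the description field (ranking CpG sites by Pearson correlation with age, a random 80:20 split, and scoring by MAD) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the data are synthetic, all function names are hypothetical, MAD is assumed to mean the mean absolute deviation between predicted and chronological age, and a one-variable least-squares line stands in for the four regressors named in the record.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def top_sites(rows, ages, k=6):
    """Indices of the k columns with the largest |r| against age."""
    n_sites = len(rows[0])
    scores = [abs(pearson_r([r[j] for r in rows], ages)) for j in range(n_sites)]
    return sorted(range(n_sites), key=lambda j: scores[j], reverse=True)[:k]


def split_80_20(n, seed=0):
    """Random 80:20 split of sample indices into (train, test)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = n * 4 // 5
    return idx[:cut], idx[cut:]


def mad(pred, true):
    """Mean absolute deviation between predictions and true ages (assumed meaning of MAD)."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; a stand-in, not the record's models."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


# Synthetic data (an assumption): 200 samples, 20 sites, ages 1 to 89.
# Columns 0-5 track age with noise; columns 6-19 are unrelated to age.
rng = random.Random(42)
ages = [rng.uniform(1, 89) for _ in range(200)]
rows = []
for a in ages:
    informative = [a / 100 + rng.gauss(0, 0.05) for _ in range(6)]
    unrelated = [rng.random() for _ in range(14)]
    rows.append(informative + unrelated)

selected = top_sites(rows, ages, k=6)
train, test = split_80_20(len(ages))

# Fit on the single best-ranked site using training samples only.
best = selected[0]
slope, intercept = fit_line([rows[i][best] for i in train], [ages[i] for i in train])
pred_test = [slope * rows[i][best] + intercept for i in test]
test_mad = mad(pred_test, [ages[i] for i in test])

print("selected sites:", sorted(selected))
print("train/test sizes:", len(train), len(test))
print("test MAD (years):", round(test_mad, 2))
```

In this sketch, site ranking uses all samples before the split, which is how the description reads; ranking on the training portion only would avoid leaking test information into feature selection. Swapping `fit_line` for a multi-feature regressor over all six selected sites would bring the sketch closer to the models the record lists.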