DNA Methylation Biomarkers-Based Human Age Prediction Using Machine Learning
Main authors: | , , , , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8803417/, https://www.ncbi.nlm.nih.gov/pubmed/35111213, http://dx.doi.org/10.1155/2022/8393498 |
Summary: | PURPOSE: Age can be an important clue in identifying persons who left biological evidence at crime scenes. With the availability of DNA methylation data, several age prediction models have been developed using statistical and machine learning methods. Epigenetic studies have demonstrated a close association between aging and DNA methylation. Most existing studies have focused on healthy samples, whereas diseases may have a significant impact on human age. Therefore, this article proposes an age prediction model based on DNA methylation biomarkers for both healthy and diseased samples. METHODS: The dataset contains 454 healthy and 400 diseased samples from publicly available sources, with ages ranging from 1 to 89 years. Using Pearson's correlation coefficient, six CpG sites highly correlated with age were identified from these data. The age prediction model was developed with four machine learning techniques: Multiple Linear Regression, Support Vector Regression, Gradient Boosting Regression, and Random Forest Regression. Separate models were built for the healthy and diseased data. The data were randomly split 80:20 into training and testing sets. RESULTS: Among all the techniques, Random Forest Regression performed best, followed by Gradient Boosting Regression. For healthy samples, the model achieved a MAD of 2.51 years on the training data and 4.85 years on the testing data. For diseased samples, the MAD was 3.83 years for training and 9.53 years for testing. CONCLUSION: These results show that the proposed model can predict age for both healthy and diseased samples. |
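The workflow described in the summary (Pearson-based CpG selection, a random 80:20 split, a regression model, and MAD evaluation) can be sketched in dependency-free Python. This is a minimal illustration under stated assumptions, not the authors' code. It implements only Multiple Linear Regression (ordinary least squares via the normal equations) and omits the Support Vector, Gradient Boosting, and Random Forest regressors. It treats MAD as the mean absolute deviation between predicted and chronological age, and it runs on synthetic data with hypothetical CpG names (`cpg_up`, `cpg_down`, `cpg_noise1`, `cpg_noise2`). All function names are likewise hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def select_cpgs(beta, ages, k=6):
    """Return the k CpG names whose values have the largest |r| with age.

    beta maps a CpG name to its methylation values, one per sample.
    """
    return sorted(beta, key=lambda cpg: abs(pearson(beta[cpg], ages)), reverse=True)[:k]


def train_test_split(n, test_fraction=0.2, seed=0):
    """Shuffle sample indices and split them into (train, test), 80:20 by default."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]


def fit_linear(X, y):
    """Multiple linear regression: solve (A^T A) w = A^T y, A = X plus an intercept column."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    # Gaussian elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        b[c], b[piv] = b[piv], b[c]
        for r in range(c + 1, p):
            f = A[r][c] / A[c][c]
            for j in range(c, p):
                A[r][j] -= f * A[c][j]
            b[r] -= f * b[c]
    # Back substitution; w[0] is the intercept.
    w = [0.0] * p
    for i in reversed(range(p)):
        w[i] = (b[i] - sum(A[i][j] * w[j] for j in range(i + 1, p))) / A[i][i]
    return w


def predict(w, X):
    """Apply the fitted intercept and coefficients to each row of X."""
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]


def mad(pred, true):
    """Mean absolute deviation between predicted and true ages (assumed meaning of MAD)."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(true)


# Synthetic demo data: hypothetical CpG names, ages drawn from 1 to 89 years.
rng = random.Random(42)
ages = [rng.uniform(1, 89) for _ in range(200)]
beta = {
    "cpg_up": [0.2 + 0.006 * a + rng.gauss(0, 0.02) for a in ages],
    "cpg_down": [0.9 - 0.005 * a + rng.gauss(0, 0.02) for a in ages],
    "cpg_noise1": [rng.random() for _ in ages],
    "cpg_noise2": [rng.random() for _ in ages],
}

train, test = train_test_split(len(ages))
train_ages = [ages[i] for i in train]
# Rank CpGs on the training samples only, so the test set stays unseen.
chosen = select_cpgs({c: [v[i] for i in train] for c, v in beta.items()}, train_ages, k=2)

X = [[beta[c][i] for c in chosen] for i in range(len(ages))]
w = fit_linear([X[i] for i in train], train_ages)
test_mad = mad(predict(w, [X[i] for i in test]), [ages[i] for i in test])
print("selected:", chosen, "test MAD (years):", round(test_mad, 2))
```

In this sketch the CpG ranking uses only the training portion, so the test samples cannot influence feature selection. The summary does not say which portion the authors used for selection. With real data, `k=6` would match the six sites reported, and the same steps would be repeated separately for the healthy and diseased cohorts.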