Novel feature selection methods for construction of accurate epigenetic clocks
Epigenetic clocks allow us to accurately predict the age and future health of individuals based on the methylation status of specific CpG sites in the genome and are a powerful tool to measure the effectiveness of longevity interventions. There is a growing need for methods to efficiently construct...
Main authors: | Li, Adam; Mueller, Amber; English, Brad; Arena, Anthony; Vera, Daniel; Kane, Alice E.; Sinclair, David A. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9432708/ https://www.ncbi.nlm.nih.gov/pubmed/35984867 http://dx.doi.org/10.1371/journal.pcbi.1009938 |
_version_ | 1784780445321265152 |
---|---|
author | Li, Adam; Mueller, Amber; English, Brad; Arena, Anthony; Vera, Daniel; Kane, Alice E.; Sinclair, David A. |
author_facet | Li, Adam; Mueller, Amber; English, Brad; Arena, Anthony; Vera, Daniel; Kane, Alice E.; Sinclair, David A. |
author_sort | Li, Adam |
collection | PubMed |
description | Epigenetic clocks allow us to accurately predict the age and future health of individuals based on the methylation status of specific CpG sites in the genome and are a powerful tool to measure the effectiveness of longevity interventions. There is a growing need for methods to efficiently construct epigenetic clocks. The most common approach is to create clocks using elastic net regression modelling of all measured CpG sites, without first identifying specific features or CpGs of interest. The addition of feature selection approaches provides the opportunity to optimise the identification of predictive CpG sites. Here, we apply novel feature selection methods and combinatorial approaches including newly adapted neural networks, genetic algorithms, and ‘chained’ combinations. Human whole blood methylation data of ~470,000 CpGs was used to develop clocks that predict age with R² correlation scores of greater than 0.73, the most predictive of which uses 35 CpG sites for an R² correlation score of 0.87. The five most frequent sites across all clocks were modelled to build a clock with an R² correlation score of 0.83. These two clocks are validated on two external datasets where they maintain excellent predictive accuracy. When compared with three published epigenetic clocks (Hannum, Horvath, Weidner) also applied to these validation datasets, our clocks outperformed all three models. We identified gene regulatory regions associated with selected CpGs as possible targets for future aging studies. Thus, our feature selection algorithms build accurate, generalizable clocks with a low number of CpG sites, providing important tools for the field. |
format | Online Article Text |
id | pubmed-9432708 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-9432708 2022-09-01 Novel feature selection methods for construction of accurate epigenetic clocks Li, Adam; Mueller, Amber; English, Brad; Arena, Anthony; Vera, Daniel; Kane, Alice E.; Sinclair, David A. PLoS Comput Biol Research Article Epigenetic clocks allow us to accurately predict the age and future health of individuals based on the methylation status of specific CpG sites in the genome and are a powerful tool to measure the effectiveness of longevity interventions. There is a growing need for methods to efficiently construct epigenetic clocks. The most common approach is to create clocks using elastic net regression modelling of all measured CpG sites, without first identifying specific features or CpGs of interest. The addition of feature selection approaches provides the opportunity to optimise the identification of predictive CpG sites. Here, we apply novel feature selection methods and combinatorial approaches including newly adapted neural networks, genetic algorithms, and ‘chained’ combinations. Human whole blood methylation data of ~470,000 CpGs was used to develop clocks that predict age with R² correlation scores of greater than 0.73, the most predictive of which uses 35 CpG sites for an R² correlation score of 0.87. The five most frequent sites across all clocks were modelled to build a clock with an R² correlation score of 0.83. These two clocks are validated on two external datasets where they maintain excellent predictive accuracy. When compared with three published epigenetic clocks (Hannum, Horvath, Weidner) also applied to these validation datasets, our clocks outperformed all three models. We identified gene regulatory regions associated with selected CpGs as possible targets for future aging studies. Thus, our feature selection algorithms build accurate, generalizable clocks with a low number of CpG sites, providing important tools for the field. Public Library of Science 2022-08-19 /pmc/articles/PMC9432708/ /pubmed/35984867 http://dx.doi.org/10.1371/journal.pcbi.1009938 Text en © 2022 Li et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Li, Adam Mueller, Amber English, Brad Arena, Anthony Vera, Daniel Kane, Alice E. Sinclair, David A. Novel feature selection methods for construction of accurate epigenetic clocks |
title | Novel feature selection methods for construction of accurate epigenetic clocks |
title_full | Novel feature selection methods for construction of accurate epigenetic clocks |
title_fullStr | Novel feature selection methods for construction of accurate epigenetic clocks |
title_full_unstemmed | Novel feature selection methods for construction of accurate epigenetic clocks |
title_short | Novel feature selection methods for construction of accurate epigenetic clocks |
title_sort | novel feature selection methods for construction of accurate epigenetic clocks |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9432708/ https://www.ncbi.nlm.nih.gov/pubmed/35984867 http://dx.doi.org/10.1371/journal.pcbi.1009938 |