Genetic analysis of coronary artery disease using tree-based automated machine learning informed by biology-based feature selection
Machine Learning (ML) approaches are increasingly being used in biomedical applications. Important challenges of ML include choosing the right algorithm and tuning the parameters for optimal performance. Automated ML (AutoML) methods, such as Tree-based Pipeline Optimization Tool (TPOT), have been d...
Main Authors: | Manduchi, Elisabetta; Le, Trang T.; Fu, Weixuan; Moore, Jason H. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9291719/ https://www.ncbi.nlm.nih.gov/pubmed/34310318 http://dx.doi.org/10.1109/TCBB.2021.3099068 |
_version_ | 1784749197664190464 |
---|---|
author | Manduchi, Elisabetta; Le, Trang T.; Fu, Weixuan; Moore, Jason H. |
author_facet | Manduchi, Elisabetta; Le, Trang T.; Fu, Weixuan; Moore, Jason H. |
author_sort | Manduchi, Elisabetta |
collection | PubMed |
description | Machine Learning (ML) approaches are increasingly being used in biomedical applications. Important challenges of ML include choosing the right algorithm and tuning the parameters for optimal performance. Automated ML (AutoML) methods, such as the Tree-based Pipeline Optimization Tool (TPOT), have been developed to take some of the guesswork out of ML, thus making this technology available to users from more diverse backgrounds. The goals of this study were to assess the applicability of TPOT to genomics and to identify combinations of single nucleotide polymorphisms (SNPs) associated with coronary artery disease (CAD), with a focus on genes with a high likelihood of being good CAD drug targets. We leveraged public functional genomic resources to group SNPs into biologically meaningful sets to be selected by TPOT. We applied this strategy to data from the UK Biobank, detecting a strikingly recurrent signal stemming from a group of 28 SNPs. Importance analysis of these SNPs uncovered functional relevance of the top SNPs to genes whose association with CAD is supported in the literature and other resources. Furthermore, we employed game-theory-based metrics to study SNP contributions to individual-level TPOT predictions and to discover distinct clusters of well-predicted CAD cases. The latter indicates a promising approach towards precision medicine. |
format | Online Article Text |
id | pubmed-9291719 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
record_format | MEDLINE/PubMed |
spelling | pubmed-9291719 2022-07-18 Genetic analysis of coronary artery disease using tree-based automated machine learning informed by biology-based feature selection Manduchi, Elisabetta; Le, Trang T.; Fu, Weixuan; Moore, Jason H. IEEE/ACM Trans Comput Biol Bioinform Article Machine Learning (ML) approaches are increasingly being used in biomedical applications. Important challenges of ML include choosing the right algorithm and tuning the parameters for optimal performance. Automated ML (AutoML) methods, such as the Tree-based Pipeline Optimization Tool (TPOT), have been developed to take some of the guesswork out of ML, thus making this technology available to users from more diverse backgrounds. The goals of this study were to assess the applicability of TPOT to genomics and to identify combinations of single nucleotide polymorphisms (SNPs) associated with coronary artery disease (CAD), with a focus on genes with a high likelihood of being good CAD drug targets. We leveraged public functional genomic resources to group SNPs into biologically meaningful sets to be selected by TPOT. We applied this strategy to data from the UK Biobank, detecting a strikingly recurrent signal stemming from a group of 28 SNPs. Importance analysis of these SNPs uncovered functional relevance of the top SNPs to genes whose association with CAD is supported in the literature and other resources. Furthermore, we employed game-theory-based metrics to study SNP contributions to individual-level TPOT predictions and to discover distinct clusters of well-predicted CAD cases. The latter indicates a promising approach towards precision medicine. 2022 2022-06-03 /pmc/articles/PMC9291719/ /pubmed/34310318 http://dx.doi.org/10.1109/TCBB.2021.3099068 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ |
spellingShingle | Article Manduchi, Elisabetta Le, Trang T. Fu, Weixuan Moore, Jason H. Genetic analysis of coronary artery disease using tree-based automated machine learning informed by biology-based feature selection |
title | Genetic analysis of coronary artery disease using tree-based automated machine learning informed by biology-based feature selection |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9291719/ https://www.ncbi.nlm.nih.gov/pubmed/34310318 http://dx.doi.org/10.1109/TCBB.2021.3099068 |