
Genetic analysis of coronary artery disease using tree-based automated machine learning informed by biology-based feature selection

Bibliographic Details
Main Authors: Manduchi, Elisabetta; Le, Trang T.; Fu, Weixuan; Moore, Jason H.
Format: Online Article Text
Language: English
Published: 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9291719/
https://www.ncbi.nlm.nih.gov/pubmed/34310318
http://dx.doi.org/10.1109/TCBB.2021.3099068
collection PubMed
description Machine Learning (ML) approaches are increasingly being used in biomedical applications. Important challenges of ML include choosing the right algorithm and tuning the parameters for optimal performance. Automated ML (AutoML) methods, such as the Tree-based Pipeline Optimization Tool (TPOT), have been developed to take some of the guesswork out of ML, thus making this technology available to users from more diverse backgrounds. The goals of this study were to assess the applicability of TPOT to genomics and to identify combinations of single nucleotide polymorphisms (SNPs) associated with coronary artery disease (CAD), with a focus on genes with a high likelihood of being good CAD drug targets. We leveraged public functional genomic resources to group SNPs into biologically meaningful sets to be selected by TPOT. We applied this strategy to data from the UK Biobank, detecting a strikingly recurrent signal stemming from a group of 28 SNPs. Importance analysis of these SNPs uncovered functional relevance of the top SNPs to genes whose association with CAD is supported in the literature and other resources. Furthermore, we employed game-theory-based metrics to study SNP contributions to individual-level TPOT predictions and to discover distinct clusters of well-predicted CAD cases. The latter indicates a promising approach towards precision medicine.
id pubmed-9291719
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal IEEE/ACM Trans Comput Biol Bioinform
dates 2022-06-03; 2022-07-18
license This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
topic Article