Reaching the End-Game for GWAS: Machine Learning Approaches for the Prioritization of Complex Disease Loci

Bibliographic Details
Main Authors: Nicholls, Hannah L., John, Christopher R., Watson, David S., Munroe, Patricia B., Barnes, Michael R., Cabrera, Claudia P.
Format: Online, Article, Text
Language: English
Published: Frontiers Media S.A., 2020
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7174742/
https://www.ncbi.nlm.nih.gov/pubmed/32351543
http://dx.doi.org/10.3389/fgene.2020.00350
Journal: Frontiers in Genetics (Front Genet)
Published online: 2020-04-15

Description
Genome-wide association studies (GWAS) have revealed thousands of genetic loci that underpin the complex biology of many human traits. However, the strength of GWAS (the ability to detect genetic association by linkage disequilibrium, LD) is also its limitation. Whilst the ever-increasing study size and improved design have augmented the power of GWAS to detect effects, differentiation of causal variants or genes from other highly correlated genes associated by LD remains the real challenge. This has severely hindered the biological insights and clinical translation of GWAS findings. Although thousands of disease susceptibility loci have been reported, causal genes at these loci remain elusive. Machine learning (ML) techniques offer an opportunity to dissect the heterogeneity of variant and gene signals in the post-GWAS analysis phase. ML models for GWAS prioritization vary greatly in their complexity, ranging from relatively simple logistic regression approaches to more complex ensemble models such as random forests and gradient boosting, as well as deep learning models, i.e., neural networks. Paired with functional validation, these methods show important promise for clinical translation, providing a strong evidence-based approach to direct post-GWAS research. However, as ML approaches continue to evolve to meet the challenge of causal gene identification, a critical assessment of the underlying methodologies and their applicability to the GWAS prioritization problem is needed. This review investigates the landscape of ML applications in three parts: selected models, input features, and output model performance, with a focus on the prioritization of complex disease-associated loci. Overall, we explore the contributions ML has made towards reaching the GWAS end-game with consequent wide-ranging translational impact.

Copyright © 2020 Nicholls, John, Watson, Munroe, Barnes and Cabrera. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.