A machine learning-based SNP-set analysis approach for identifying disease-associated susceptibility loci

Identifying disease-associated susceptibility loci is one of the most pressing and crucial challenges in modeling complex diseases. Existing approaches to biomarker discovery are subject to several limitations, including underpowered detection, neglect of variant interactions, and restrictive depend...

Bibliographic Details
Main authors: Silva, Princess P., Gaudillo, Joverlyn D., Vilela, Julianne A., Roxas-Villanueva, Ranzivelle Marianne L., Tiangco, Beatrice J., Domingo, Mario R., Albia, Jason R.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9499949/
https://www.ncbi.nlm.nih.gov/pubmed/36138111
http://dx.doi.org/10.1038/s41598-022-19708-1
author Silva, Princess P.
Gaudillo, Joverlyn D.
Vilela, Julianne A.
Roxas-Villanueva, Ranzivelle Marianne L.
Tiangco, Beatrice J.
Domingo, Mario R.
Albia, Jason R.
author_sort Silva, Princess P.
collection PubMed
description Identifying disease-associated susceptibility loci is one of the most pressing and crucial challenges in modeling complex diseases. Existing approaches to biomarker discovery are subject to several limitations, including underpowered detection, neglect of variant interactions, and restrictive dependence on prior biological knowledge. Addressing these challenges necessitates more ingenious ways of approaching the “missing heritability” problem. This study aims to discover disease-associated susceptibility loci by augmenting a previous genome-wide association study (GWAS) with an integration of random forest and cluster analysis. The proposed integrated framework is applied to GWAS data on hepatitis B virus surface antigen (HBsAg) seroclearance. Multiple cluster analyses were performed on (1) single nucleotide polymorphisms (SNPs) considered significant by GWAS and (2) SNPs with the highest feature importance scores obtained using random forest. The resulting SNP-sets from the cluster analyses were subsequently tested for trait association. Three susceptibility loci possibly associated with HBsAg seroclearance were identified: (1) SNP rs2399971, (2) gene LINC00578, and (3) locus 11p15. SNP rs2399971 is a biomarker reported in the literature to be significantly associated with HBsAg seroclearance in patients who had received antiviral treatment. The latter two loci are linked with diseases influenced by the presence of hepatitis B virus infection. These findings demonstrate the potential of the proposed integrated framework in identifying disease-associated susceptibility loci. With further validation, the results herein could aid in better understanding complex disease etiologies and provide inputs for more advanced disease risk assessment for patients.
format Online
Article
Text
id pubmed-9499949
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9499949 2022-09-24 Sci Rep Article Nature Publishing Group UK 2022-09-22 /pmc/articles/PMC9499949/ /pubmed/36138111 http://dx.doi.org/10.1038/s41598-022-19708-1 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title A machine learning-based SNP-set analysis approach for identifying disease-associated susceptibility loci
topic Article