LLM-PBC: Logic Learning Machine-Based Explainable Rules Accurately Stratify the Genetic Risk of Primary Biliary Cholangitis

Bibliographic Details
Main authors: Gerussi, Alessio; Verda, Damiano; Cappadona, Claudio; Cristoferi, Laura; Bernasconi, Davide Paolo; Bottaro, Sandro; Carbone, Marco; Muselli, Marco; Invernizzi, Pietro; Asselta, Rosanna
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9604872/
https://www.ncbi.nlm.nih.gov/pubmed/36294727
http://dx.doi.org/10.3390/jpm12101587
author Gerussi, Alessio
Verda, Damiano
Cappadona, Claudio
Cristoferi, Laura
Bernasconi, Davide Paolo
Bottaro, Sandro
Carbone, Marco
Muselli, Marco
Invernizzi, Pietro
Asselta, Rosanna
collection PubMed
description Background: The application of Machine Learning (ML) to genetic individual-level data represents a foreseeable advancement for the field, which is still in its infancy. Here, we aimed to evaluate the feasibility and accuracy of an ML-based model for disease risk prediction applied to Primary Biliary Cholangitis (PBC). Methods: Genome-wide significant variants identified in subjects of European ancestry in the recently released second international meta-analysis of GWAS in PBC were used as input data. Quality-checked, individual genomic data from two Italian cohorts were used. The ML included the following steps: import of genotype and phenotype data, genetic variant selection, supervised classification of PBC by genotype, generation of “if-then” rules for disease prediction by logic learning machine (LLM), and model validation in a different cohort. Results: The training cohort included 1345 individuals: 444 were PBC cases and 901 were healthy controls. After pre-processing, 41,899 variants entered the analysis. Several configurations of parameters related to feature selection were simulated. The best LLM model reached an Accuracy of 71.7%, a Matthews correlation coefficient of 0.29, a Youden’s value of 0.21, a Sensitivity of 0.28, a Specificity of 0.93, a Positive Predictive Value of 0.66, and a Negative Predictive Value of 0.72. Thirty-eight rules were generated. The rule with the highest covering (19.14) included the following genes: RIN3, KANSL1, TIMMDC1, TNPO3. The validation cohort included 834 individuals: 255 cases and 579 controls. By applying the ruleset derived in the training cohort, the Area under the Curve of the model was 0.73. Conclusions: This study represents the first illustration of an ML model applied to common variants associated with PBC. Our approach is computationally feasible, leverages individual-level data to generate intelligible rules, and can be used for disease prediction in at-risk individuals.
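The summary statistics named in the abstract (Accuracy, Matthews correlation coefficient, Youden's value, Sensitivity, Specificity, Positive and Negative Predictive Value) all derive from a 2x2 confusion matrix. The following is a minimal sketch of those standard definitions, not the authors' code. The function name `binary_metrics` and the toy counts are assumptions made for illustration; the record does not report the study's confusion matrix. The only figures taken from the abstract are the reported Sensitivity (0.28) and Specificity (0.93), used to check the reported Youden's value.

```python
from math import sqrt


def binary_metrics(tp, fp, tn, fn):
    """Standard metrics for a binary classifier from confusion-matrix counts.

    Hypothetical helper for illustration. Assumes every margin is nonzero:
    at least one true case, one true control, one predicted positive and
    one predicted negative.
    """
    sensitivity = tp / (tp + fn)   # fraction of cases correctly flagged
    specificity = tn / (tn + fp)   # fraction of controls correctly cleared
    ppv = tp / (tp + fp)           # positive predictive value
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    youden = sensitivity + specificity - 1
    mcc = (tp * tn - fp * fn) / sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "accuracy": accuracy,
        "mcc": mcc,
        "youden": youden,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
    }


# Toy counts chosen for easy arithmetic; NOT the study's data.
toy = binary_metrics(tp=30, fp=10, tn=90, fn=70)

# Youden's value recomputed from the Sensitivity and Specificity
# reported in the abstract: 0.28 + 0.93 - 1.
reported_youden = round(0.28 + 0.93 - 1, 2)

if __name__ == "__main__":
    for name, value in toy.items():
        print(f"{name}: {value:.4f}")
    print("Youden's value from reported figures:", reported_youden)
```

The recomputed Youden's value matches the 0.21 given in the abstract. The sketch also shows why a model with high Specificity and low Sensitivity can reach a moderate Accuracy when controls outnumber cases, which is the pattern the abstract reports for the training cohort.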
format Online
Article
Text
id pubmed-9604872
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9604872 2022-10-27 LLM-PBC: Logic Learning Machine-Based Explainable Rules Accurately Stratify the Genetic Risk of Primary Biliary Cholangitis Gerussi, Alessio; Verda, Damiano; Cappadona, Claudio; Cristoferi, Laura; Bernasconi, Davide Paolo; Bottaro, Sandro; Carbone, Marco; Muselli, Marco; Invernizzi, Pietro; Asselta, Rosanna J Pers Med Article MDPI 2022-09-26 /pmc/articles/PMC9604872/ /pubmed/36294727 http://dx.doi.org/10.3390/jpm12101587 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title LLM-PBC: Logic Learning Machine-Based Explainable Rules Accurately Stratify the Genetic Risk of Primary Biliary Cholangitis
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9604872/
https://www.ncbi.nlm.nih.gov/pubmed/36294727
http://dx.doi.org/10.3390/jpm12101587