LLM-PBC: Logic Learning Machine-Based Explainable Rules Accurately Stratify the Genetic Risk of Primary Biliary Cholangitis
Background: The application of Machine Learning (ML) to genetic individual-level data represents a foreseeable advancement for the field, which is still in its infancy. Here, we aimed to evaluate the feasibility and accuracy of an ML-based model for disease risk prediction applied to Primary Biliary...
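Main authors: | Gerussi, Alessio; Verda, Damiano; Cappadona, Claudio; Cristoferi, Laura; Bernasconi, Davide Paolo; Bottaro, Sandro; Carbone, Marco; Muselli, Marco; Invernizzi, Pietro; Asselta, Rosanna |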
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9604872/ https://www.ncbi.nlm.nih.gov/pubmed/36294727 http://dx.doi.org/10.3390/jpm12101587 |
_version_ | 1784817923773169664 |
---|---|
author | Gerussi, Alessio; Verda, Damiano; Cappadona, Claudio; Cristoferi, Laura; Bernasconi, Davide Paolo; Bottaro, Sandro; Carbone, Marco; Muselli, Marco; Invernizzi, Pietro; Asselta, Rosanna |
author_sort | Gerussi, Alessio |
collection | PubMed |
description | Background: The application of Machine Learning (ML) to genetic individual-level data represents a foreseeable advancement for the field, which is still in its infancy. Here, we aimed to evaluate the feasibility and accuracy of an ML-based model for disease risk prediction applied to Primary Biliary Cholangitis (PBC). Methods: Genome-wide significant variants identified in subjects of European ancestry in the recently released second international meta-analysis of GWAS in PBC were used as input data. Quality-checked, individual genomic data from two Italian cohorts were used. The ML included the following steps: import of genotype and phenotype data, genetic variant selection, supervised classification of PBC by genotype, generation of “if-then” rules for disease prediction by logic learning machine (LLM), and model validation in a different cohort. Results: The training cohort included 1345 individuals: 444 were PBC cases and 901 were healthy controls. After pre-processing, 41,899 variants entered the analysis. Several configurations of parameters related to feature selection were simulated. The best LLM model reached an Accuracy of 71.7%, a Matthews correlation coefficient of 0.29, a Youden’s value of 0.21, a Sensitivity of 0.28, a Specificity of 0.93, a Positive Predictive Value of 0.66, and a Negative Predictive Value of 0.72. Thirty-eight rules were generated. The rule with the highest covering (19.14) included the following genes: RIN3, KANSL1, TIMMDC1, TNPO3. The validation cohort included 834 individuals: 255 cases and 579 controls. By applying the ruleset derived in the training cohort, the Area under the Curve of the model was 0.73. Conclusions: This study represents the first illustration of an ML model applied to common variants associated with PBC. Our approach is computationally feasible, leverages individual-level data to generate intelligible rules, and can be used for disease prediction in at-risk individuals. |
format | Online Article Text |
id | pubmed-9604872 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9604872 2022-10-27 J Pers Med Article MDPI 2022-09-26 /pmc/articles/PMC9604872/ /pubmed/36294727 http://dx.doi.org/10.3390/jpm12101587 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | LLM-PBC: Logic Learning Machine-Based Explainable Rules Accurately Stratify the Genetic Risk of Primary Biliary Cholangitis |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9604872/ https://www.ncbi.nlm.nih.gov/pubmed/36294727 http://dx.doi.org/10.3390/jpm12101587 |
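
The description above reports Accuracy, Matthews correlation coefficient, Youden's value, Sensitivity, Specificity, and Positive and Negative Predictive Values for the classifier. As a minimal sketch of how these quantities relate to one another, the following Python snippet derives all of them from the four cells of a binary confusion matrix. The names `ConfusionCounts` and `classification_metrics` are hypothetical, and the counts in the example are invented for illustration; they are not the study's data, and this is not the authors' code.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (cases = positive class)."""
    tp: int  # cases predicted as cases
    fp: int  # controls predicted as cases
    tn: int  # controls predicted as controls
    fn: int  # cases predicted as controls


def _safe_div(numerator: float, denominator: float) -> float:
    # Return 0.0 instead of raising when a denominator is empty.
    return numerator / denominator if denominator else 0.0


def classification_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the standard metrics named in the abstract from raw counts."""
    sensitivity = _safe_div(c.tp, c.tp + c.fn)
    specificity = _safe_div(c.tn, c.tn + c.fp)
    mcc_denominator = sqrt(
        (c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn)
    )
    return {
        "accuracy": _safe_div(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": _safe_div(c.tp, c.tp + c.fp),
        "npv": _safe_div(c.tn, c.tn + c.fn),
        # Youden's J is sensitivity + specificity - 1.
        "youden_j": sensitivity + specificity - 1.0,
        "mcc": _safe_div(c.tp * c.tn - c.fp * c.fn, mcc_denominator),
    }


# Hypothetical counts, chosen only to exercise the function.
example = ConfusionCounts(tp=30, fp=10, tn=190, fn=70)
for name, value in classification_metrics(example).items():
    print(f"{name}: {value:.3f}")
```

The definition of Youden's value as sensitivity plus specificity minus one agrees with the figures in the description: 0.28 + 0.93 - 1 = 0.21.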