
LLM-PBC: Logic Learning Machine-Based Explainable Rules Accurately Stratify the Genetic Risk of Primary Biliary Cholangitis

Background: The application of Machine Learning (ML) to genetic individual-level data represents a foreseeable advancement for the field, which is still in its infancy. Here, we aimed to evaluate the feasibility and accuracy of an ML-based model for disease risk prediction applied to Primary Biliary...


Bibliographic Details
Main Authors:	Gerussi, Alessio; Verda, Damiano; Cappadona, Claudio; Cristoferi, Laura; Bernasconi, Davide Paolo; Bottaro, Sandro; Carbone, Marco; Muselli, Marco; Invernizzi, Pietro; Asselta, Rosanna
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9604872/ https://www.ncbi.nlm.nih.gov/pubmed/36294727 http://dx.doi.org/10.3390/jpm12101587

_version_	1784817923773169664
author	Gerussi, Alessio; Verda, Damiano; Cappadona, Claudio; Cristoferi, Laura; Bernasconi, Davide Paolo; Bottaro, Sandro; Carbone, Marco; Muselli, Marco; Invernizzi, Pietro; Asselta, Rosanna
author_sort	Gerussi, Alessio
collection	PubMed
description	Background: The application of Machine Learning (ML) to genetic individual-level data represents a foreseeable advancement for the field, which is still in its infancy. Here, we aimed to evaluate the feasibility and accuracy of an ML-based model for disease risk prediction applied to Primary Biliary Cholangitis (PBC). Methods: Genome-wide significant variants identified in subjects of European ancestry in the recently released second international meta-analysis of GWAS in PBC were used as input data. Quality-checked, individual genomic data from two Italian cohorts were used. The ML included the following steps: import of genotype and phenotype data, genetic variant selection, supervised classification of PBC by genotype, generation of “if-then” rules for disease prediction by logic learning machine (LLM), and model validation in a different cohort. Results: The training cohort included 1345 individuals: 444 were PBC cases and 901 were healthy controls. After pre-processing, 41,899 variants entered the analysis. Several configurations of parameters related to feature selection were simulated. The best LLM model reached an Accuracy of 71.7%, a Matthews correlation coefficient of 0.29, a Youden’s value of 0.21, a Sensitivity of 0.28, a Specificity of 0.93, a Positive Predictive Value of 0.66, and a Negative Predictive Value of 0.72. Thirty-eight rules were generated. The rule with the highest covering (19.14) included the following genes: RIN3, KANSL1, TIMMDC1, TNPO3. The validation cohort included 834 individuals: 255 cases and 579 controls. By applying the ruleset derived in the training cohort, the Area under the Curve of the model was 0.73. Conclusions: This study represents the first illustration of an ML model applied to common variants associated with PBC. Our approach is computationally feasible, leverages individual-level data to generate intelligible rules, and can be used for disease prediction in at-risk individuals.
format	Online Article Text
id	pubmed-9604872
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9604872 2022-10-27 LLM-PBC: Logic Learning Machine-Based Explainable Rules Accurately Stratify the Genetic Risk of Primary Biliary Cholangitis Gerussi, Alessio Verda, Damiano Cappadona, Claudio Cristoferi, Laura Bernasconi, Davide Paolo Bottaro, Sandro Carbone, Marco Muselli, Marco Invernizzi, Pietro Asselta, Rosanna J Pers Med Article Background: The application of Machine Learning (ML) to genetic individual-level data represents a foreseeable advancement for the field, which is still in its infancy. Here, we aimed to evaluate the feasibility and accuracy of an ML-based model for disease risk prediction applied to Primary Biliary Cholangitis (PBC). Methods: Genome-wide significant variants identified in subjects of European ancestry in the recently released second international meta-analysis of GWAS in PBC were used as input data. Quality-checked, individual genomic data from two Italian cohorts were used. The ML included the following steps: import of genotype and phenotype data, genetic variant selection, supervised classification of PBC by genotype, generation of “if-then” rules for disease prediction by logic learning machine (LLM), and model validation in a different cohort. Results: The training cohort included 1345 individuals: 444 were PBC cases and 901 were healthy controls. After pre-processing, 41,899 variants entered the analysis. Several configurations of parameters related to feature selection were simulated. The best LLM model reached an Accuracy of 71.7%, a Matthews correlation coefficient of 0.29, a Youden’s value of 0.21, a Sensitivity of 0.28, a Specificity of 0.93, a Positive Predictive Value of 0.66, and a Negative Predictive Value of 0.72. Thirty-eight rules were generated. The rule with the highest covering (19.14) included the following genes: RIN3, KANSL1, TIMMDC1, TNPO3. The validation cohort included 834 individuals: 255 cases and 579 controls. By applying the ruleset derived in the training cohort, the Area under the Curve of the model was 0.73. Conclusions: This study represents the first illustration of an ML model applied to common variants associated with PBC. Our approach is computationally feasible, leverages individual-level data to generate intelligible rules, and can be used for disease prediction in at-risk individuals. MDPI 2022-09-26 /pmc/articles/PMC9604872/ /pubmed/36294727 http://dx.doi.org/10.3390/jpm12101587 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Gerussi, Alessio Verda, Damiano Cappadona, Claudio Cristoferi, Laura Bernasconi, Davide Paolo Bottaro, Sandro Carbone, Marco Muselli, Marco Invernizzi, Pietro Asselta, Rosanna LLM-PBC: Logic Learning Machine-Based Explainable Rules Accurately Stratify the Genetic Risk of Primary Biliary Cholangitis
title	LLM-PBC: Logic Learning Machine-Based Explainable Rules Accurately Stratify the Genetic Risk of Primary Biliary Cholangitis
title_sort	llm-pbc: logic learning machine-based explainable rules accurately stratify the genetic risk of primary biliary cholangitis
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9604872/ https://www.ncbi.nlm.nih.gov/pubmed/36294727 http://dx.doi.org/10.3390/jpm12101587
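
The performance figures quoted in the description field are linked by simple arithmetic, which can be checked with a short sketch. The function name `expected_metrics` is hypothetical and does not come from the article. The sketch also assumes that the rounded Sensitivity (0.28) and Specificity (0.93) apply to the full training-cohort composition (444 cases, 901 controls); the record does not state which subset the metrics were computed on, so the derived counts are expected values only.

```python
from math import sqrt


def expected_metrics(sensitivity, specificity, n_cases, n_controls):
    """Derive confusion-matrix metrics from sensitivity, specificity and class sizes.

    Counts are expected (possibly fractional) values, not observed ones.
    """
    tp = sensitivity * n_cases        # expected true positives
    fn = n_cases - tp                 # expected false negatives
    tn = specificity * n_controls     # expected true negatives
    fp = n_controls - tn              # expected false positives

    mcc_den = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "youden": sensitivity + specificity - 1,
        "accuracy": (tp + tn) / (n_cases + n_controls),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "mcc": (tp * tn - fp * fn) / mcc_den,
    }


# Assumption: rounded values from the abstract, applied to the
# training-cohort composition (444 cases, 901 controls).
metrics = expected_metrics(0.28, 0.93, n_cases=444, n_controls=901)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions, the Youden's value (0.21), Positive Predictive Value (about 0.66), Negative Predictive Value (about 0.72) and Matthews correlation coefficient (roughly 0.29) agree with the reported figures. Accuracy comes out near 71.5% rather than 71.7%; the gap may reflect the rounded inputs or a different evaluation subset, which this record does not specify.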
