An interpretable low-complexity machine learning framework for robust exome-based in-silico diagnosis of Crohn’s disease patients

Whole exome sequencing (WES) data are allowing researchers to pinpoint the causes of many Mendelian disorders. In time, sequencing data will be crucial to solve the genome interpretation puzzle, which aims at uncovering the genotype-to-phenotype relationship, but for the moment many conceptual and t...

Bibliographic Details
Main authors: Raimondi, Daniele, Simm, Jaak, Arany, Adam, Fariselli, Piero, Cleynen, Isabelle, Moreau, Yves
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671306/
https://www.ncbi.nlm.nih.gov/pubmed/33575557
http://dx.doi.org/10.1093/nargab/lqaa011
author Raimondi, Daniele
Simm, Jaak
Arany, Adam
Fariselli, Piero
Cleynen, Isabelle
Moreau, Yves
author_sort Raimondi, Daniele
collection PubMed
description Whole exome sequencing (WES) data are allowing researchers to pinpoint the causes of many Mendelian disorders. In time, sequencing data will be crucial to solve the genome interpretation puzzle, which aims at uncovering the genotype-to-phenotype relationship, but for the moment many conceptual and technical problems need to be addressed. In particular, very few attempts at the in-silico diagnosis of oligo-to-polygenic disorders have been made so far, due to the complexity of the challenge, the relative scarcity of the data and issues such as batch effects and data heterogeneity, which are confounder factors for machine learning (ML) methods. Here, we propose a method for the exome-based in-silico diagnosis of Crohn’s disease (CD) patients which addresses many of the current methodological issues. First, we devise a rational ML-friendly feature representation for WES data based on the gene mutational burden concept, which is suitable for small sample sizes datasets. Second, we propose a Neural Network (NN) with parameter tying and heavy regularization, in order to limit its complexity and thus the risk of over-fitting. We trained and tested our NN on 3 CD case-controls datasets, comparing the performance with the participants of previous CAGI challenges. We show that, notwithstanding the limited NN complexity, it outperforms the previous approaches. Moreover, we interpret the NN predictions by analyzing the learned patterns at the variant and gene level and investigating the decision process leading to each prediction.
format Online
Article
Text
id pubmed-7671306
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7671306 2021-02-10 An interpretable low-complexity machine learning framework for robust exome-based in-silico diagnosis of Crohn’s disease patients Raimondi, Daniele Simm, Jaak Arany, Adam Fariselli, Piero Cleynen, Isabelle Moreau, Yves NAR Genom Bioinform Methods Article Oxford University Press 2020-02-21 /pmc/articles/PMC7671306/ /pubmed/33575557 http://dx.doi.org/10.1093/nargab/lqaa011 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title An interpretable low-complexity machine learning framework for robust exome-based in-silico diagnosis of Crohn’s disease patients
title_sort interpretable low-complexity machine learning framework for robust exome-based in-silico diagnosis of crohn’s disease patients
topic Methods Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671306/
https://www.ncbi.nlm.nih.gov/pubmed/33575557
http://dx.doi.org/10.1093/nargab/lqaa011