An interpretable low-complexity machine learning framework for robust exome-based in-silico diagnosis of Crohn’s disease patients
Whole exome sequencing (WES) data are allowing researchers to pinpoint the causes of many Mendelian disorders. In time, sequencing data will be crucial to solve the genome interpretation puzzle, which aims at uncovering the genotype-to-phenotype relationship, but for the moment many conceptual and t...
Main authors: | Raimondi, Daniele; Simm, Jaak; Arany, Adam; Fariselli, Piero; Cleynen, Isabelle; Moreau, Yves |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Methods Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671306/ https://www.ncbi.nlm.nih.gov/pubmed/33575557 http://dx.doi.org/10.1093/nargab/lqaa011 |
_version_ | 1783610905014042624 |
---|---|
author | Raimondi, Daniele Simm, Jaak Arany, Adam Fariselli, Piero Cleynen, Isabelle Moreau, Yves |
author_sort | Raimondi, Daniele |
collection | PubMed |
description | Whole exome sequencing (WES) data are allowing researchers to pinpoint the causes of many Mendelian disorders. In time, sequencing data will be crucial to solving the genome interpretation puzzle, which aims at uncovering the genotype-to-phenotype relationship, but for the moment many conceptual and technical problems need to be addressed. In particular, very few attempts at the in-silico diagnosis of oligo-to-polygenic disorders have been made so far, owing to the complexity of the challenge, the relative scarcity of data, and issues such as batch effects and data heterogeneity, which are confounding factors for machine learning (ML) methods. Here, we propose a method for the exome-based in-silico diagnosis of Crohn’s disease (CD) patients that addresses many of the current methodological issues. First, we devise a rational ML-friendly feature representation for WES data based on the gene mutational burden concept, which is suitable for datasets with small sample sizes. Second, we propose a Neural Network (NN) with parameter tying and heavy regularization, in order to limit its complexity and thus the risk of over-fitting. We trained and tested our NN on three CD case-control datasets, comparing its performance with that of the participants in previous CAGI challenges. We show that, notwithstanding its limited complexity, the NN outperforms the previous approaches. Moreover, we interpret the NN predictions by analyzing the learned patterns at the variant and gene level and investigating the decision process leading to each prediction. |
format | Online Article Text |
id | pubmed-7671306 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7671306 2021-02-10 NAR Genom Bioinform Methods Article Oxford University Press 2020-02-21 /pmc/articles/PMC7671306/ /pubmed/33575557 http://dx.doi.org/10.1093/nargab/lqaa011 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | An interpretable low-complexity machine learning framework for robust exome-based in-silico diagnosis of Crohn’s disease patients |
topic | Methods Article |
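
The description field mentions a feature representation for WES data based on the gene mutational burden concept. This record does not give the details of that representation, so the following is only a minimal sketch of the general idea: counting, for each sample, how many variants fall in each gene. The function name `gene_burden_matrix`, the `(gene, variant_type)` tuple layout, the gene list and the toy samples are all assumptions made for illustration and are not taken from the article.

```python
from collections import Counter


def gene_burden_matrix(sample_variants, genes):
    """Build one row per sample holding the number of variants per gene.

    sample_variants: list of samples, each a list of (gene, variant_type)
                     tuples (hypothetical layout, for illustration only).
    genes:           ordered list of gene symbols defining the columns.
    """
    gene_set = set(genes)
    rows = []
    for variants in sample_variants:
        # Count variants per gene, ignoring genes outside the chosen panel.
        counts = Counter(gene for gene, _ in variants if gene in gene_set)
        rows.append([counts.get(gene, 0) for gene in genes])
    return rows


# Toy input: an illustrative gene panel and three made-up samples.
genes = ["NOD2", "ATG16L1", "IL23R"]
samples = [
    [("NOD2", "missense"), ("NOD2", "frameshift"), ("IL23R", "missense")],
    [("ATG16L1", "missense")],
    [],
]

X = gene_burden_matrix(samples, genes)
for row in X:
    print(row)
```

Each sample is reduced to a fixed-length vector regardless of how many variants it carries, which is one reason a burden-style encoding can suit small cohorts; the article itself may weight or group variants differently.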