From genotype to phenotype in Arabidopsis thaliana: in-silico genome interpretation predicts 288 phenotypes from sequencing data

Bibliographic Details
Main Authors: Raimondi, Daniele; Corso, Massimiliano; Fariselli, Piero; Moreau, Yves
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Methods Online
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8860592/
https://www.ncbi.nlm.nih.gov/pubmed/34792168
http://dx.doi.org/10.1093/nar/gkab1099
author Raimondi, Daniele
Corso, Massimiliano
Fariselli, Piero
Moreau, Yves
collection PubMed
description In many cases, the unprecedented availability of data provided by high-throughput sequencing has shifted the bottleneck from data availability to data interpretation, thus delaying the promised breakthroughs in genetics and precision medicine (for human genetics) and in phenotype prediction to improve plant adaptation to climate change and resistance to bioaggressors (for plant sciences). In this paper, we propose a novel Genome Interpretation paradigm, which aims at directly modeling the genotype-to-phenotype relationship, and we focus on A. thaliana since it is the best-studied model organism in plant genetics. Our model, called Galiana, is the first end-to-end Neural Network (NN) approach following the genomes in/phenotypes out paradigm, and it is trained to predict 288 real-valued Arabidopsis thaliana phenotypes from Whole Genome sequencing data. We show that 75 of these phenotypes are predicted with a Pearson correlation ≥0.4, and are mostly related to flowering traits. We show that our end-to-end NN approach achieves better performance and larger phenotype coverage than models predicting single phenotypes from the GWAS-derived known associated genes. Galiana is also fully interpretable, thanks to the Saliency Maps gradient-based approaches. We followed this interpretation approach to identify 36 novel genes that are likely to be associated with flowering traits, finding evidence for 6 of them in the existing literature.
format Online
Article
Text
id pubmed-8860592
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8860592 2022-02-22 Nucleic Acids Res Methods Online Oxford University Press 2021-11-18 /pmc/articles/PMC8860592/ /pubmed/34792168 http://dx.doi.org/10.1093/nar/gkab1099 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title From genotype to phenotype in Arabidopsis thaliana: in-silico genome interpretation predicts 288 phenotypes from sequencing data
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8860592/
https://www.ncbi.nlm.nih.gov/pubmed/34792168
http://dx.doi.org/10.1093/nar/gkab1099