Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings

Deep learning methods have recently become the state-of-the-art in a variety of regulatory genomic tasks(1–6) including the prediction of gene expression from genomic DNA. As such, these methods promise to serve as important tools in interpreting the full spectrum of genetic variation observed in pe...

Bibliographic Details
Main authors: Sasse, Alexander, Ng, Bernard, Spiro, Anna E., Tasaki, Shinya, Bennett, David A., Gaiteri, Christopher, De Jager, Philip L., Chikina, Maria, Mostafavi, Sara
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10055057/
https://www.ncbi.nlm.nih.gov/pubmed/36993652
http://dx.doi.org/10.1101/2023.03.16.532969
author Sasse, Alexander
Ng, Bernard
Spiro, Anna E.
Tasaki, Shinya
Bennett, David A.
Gaiteri, Christopher
De Jager, Philip L.
Chikina, Maria
Mostafavi, Sara
author_sort Sasse, Alexander
collection PubMed
description Deep learning methods have recently become the state of the art in a variety of regulatory genomic tasks (1–6), including the prediction of gene expression from genomic DNA. As such, these methods promise to serve as important tools in interpreting the full spectrum of genetic variation observed in personal genomes. Previous evaluation strategies have assessed their predictions of gene expression across genomic regions; however, systematic benchmarking of their predictions across individuals, which would directly evaluate their utility as personal DNA interpreters, is lacking. We used paired whole-genome sequencing and gene expression data from 839 individuals in the ROSMAP study (7) to evaluate the ability of current methods to predict gene expression variation across individuals at varied loci. Our approach identifies a limitation of current methods in correctly predicting the direction of variant effects. We show that this limitation stems from insufficiently learnt sequence motif grammar, and suggest new model training strategies to improve performance.
format Online
Article
Text
id pubmed-10055057
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10055057 2023-03-30 Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings Sasse, Alexander Ng, Bernard Spiro, Anna E. Tasaki, Shinya Bennett, David A. Gaiteri, Christopher De Jager, Philip L. Chikina, Maria Mostafavi, Sara bioRxiv Article Deep learning methods have recently become the state of the art in a variety of regulatory genomic tasks (1–6), including the prediction of gene expression from genomic DNA. As such, these methods promise to serve as important tools in interpreting the full spectrum of genetic variation observed in personal genomes. Previous evaluation strategies have assessed their predictions of gene expression across genomic regions; however, systematic benchmarking of their predictions across individuals, which would directly evaluate their utility as personal DNA interpreters, is lacking. We used paired whole-genome sequencing and gene expression data from 839 individuals in the ROSMAP study (7) to evaluate the ability of current methods to predict gene expression variation across individuals at varied loci. Our approach identifies a limitation of current methods in correctly predicting the direction of variant effects. We show that this limitation stems from insufficiently learnt sequence motif grammar, and suggest new model training strategies to improve performance. Cold Spring Harbor Laboratory 2023-09-28 /pmc/articles/PMC10055057/ /pubmed/36993652 http://dx.doi.org/10.1101/2023.03.16.532969 Text en https://creativecommons.org/licenses/by-nd/4.0/ This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use.
title Benchmarking of deep neural networks for predicting personal gene expression from DNA sequence highlights shortcomings
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10055057/
https://www.ncbi.nlm.nih.gov/pubmed/36993652
http://dx.doi.org/10.1101/2023.03.16.532969