Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development

Comprehensive information on the timing and location of gene expression is fundamental to our understanding of embryonic development and tissue formation. While high-throughput in situ hybridization projects provide invaluable information about developmental gene expression patterns for model organi...

Bibliographic Details
Main authors: Zhou, Jian; Schor, Ignacio E.; Yao, Victoria; Theesfeld, Chandra L.; Marco-Ferreres, Raquel; Tadych, Alicja; Furlong, Eileen E. M.; Troyanskaya, Olga G.
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6779412/
https://www.ncbi.nlm.nih.gov/pubmed/31553718
http://dx.doi.org/10.1371/journal.pgen.1008382
author Zhou, Jian
Schor, Ignacio E.
Yao, Victoria
Theesfeld, Chandra L.
Marco-Ferreres, Raquel
Tadych, Alicja
Furlong, Eileen E. M.
Troyanskaya, Olga G.
author_sort Zhou, Jian
collection PubMed
description Comprehensive information on the timing and location of gene expression is fundamental to our understanding of embryonic development and tissue formation. While high-throughput in situ hybridization projects provide invaluable information about developmental gene expression patterns for model organisms like Drosophila, the output of these experiments is primarily qualitative, and a high proportion of protein-coding genes and most non-coding genes lack any annotation. Accurate data-centric predictions of spatio-temporal gene expression will therefore complement current in situ hybridization efforts. Here, we applied a machine learning approach by training models on all public gene expression and chromatin data, even from whole-organism experiments, to provide genome-wide, quantitative spatio-temporal predictions for all genes. We developed structured in silico nano-dissection, a computational approach that predicts gene expression in >200 tissue-developmental stages. The algorithm integrates expression signals from a compendium of 6,378 genome-wide expression and chromatin profiling experiments in a cell lineage-aware fashion. We systematically evaluated our performance via cross-validation and experimentally confirmed 22 new predictions for four different embryonic tissues. The model also predicts complex, multi-tissue expression and developmental regulation with high accuracy. We further show the potential of applying these genome-wide predictions to extract tissue specificity signals from non-tissue-dissected experiments, and to prioritize tissues and stages for disease modeling. This resource, together with the exploratory tools, is freely available at our webserver http://find.princeton.edu, which provides a valuable tool for a range of applications, from predicting spatio-temporal expression patterns to recognizing tissue signatures from differential gene expression profiles.
format Online
Article
Text
id pubmed-6779412
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6779412 2019-10-18 Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development Zhou, Jian; Schor, Ignacio E.; Yao, Victoria; Theesfeld, Chandra L.; Marco-Ferreres, Raquel; Tadych, Alicja; Furlong, Eileen E. M.; Troyanskaya, Olga G. PLoS Genet Research Article Public Library of Science 2019-09-25 /pmc/articles/PMC6779412/ /pubmed/31553718 http://dx.doi.org/10.1371/journal.pgen.1008382 Text en © 2019 Zhou et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Accurate genome-wide predictions of spatio-temporal gene expression during embryonic development
topic Research Article