
Distributed gene expression modelling for exploring variability in epigenetic function

BACKGROUND: Predictive gene expression modelling is an important tool in computational biology due to the volume of high-throughput sequencing data generated by recent consortia. However, the scope of previous studies has been restricted to a small set of cell-lines or experimental conditions due to an...


Bibliographic Details
Main Authors: Budden, David M., Crampin, Edmund J.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5097851/
https://www.ncbi.nlm.nih.gov/pubmed/27816056
http://dx.doi.org/10.1186/s12859-016-1313-1
author Budden, David M.
Crampin, Edmund J.
collection PubMed
description BACKGROUND: Predictive gene expression modelling is an important tool in computational biology due to the volume of high-throughput sequencing data generated by recent consortia. However, the scope of previous studies has been restricted to a small set of cell-lines or experimental conditions due to an inability to leverage distributed processing architectures for large, sharded data-sets. RESULTS: We present a distributed implementation of gene expression modelling using the MapReduce paradigm and prove that performance improves as a linear function of available processor cores. We then leverage the computational efficiency of this framework to explore the variability of epigenetic function across fifty histone modification data-sets from a variety of cancerous and non-cancerous cell-lines. CONCLUSIONS: We demonstrate that the genome-wide relationships between histone modifications and mRNA transcription are lineage, tissue and karyotype-invariant, and that models trained on matched -omics data from non-cancerous cell-lines are able to predict cancerous expression with equivalent genome-wide fidelity. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1313-1) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5097851
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5097851 2016-11-08 Distributed gene expression modelling for exploring variability in epigenetic function Budden, David M. Crampin, Edmund J. BMC Bioinformatics Research Article
BioMed Central 2016-11-05 /pmc/articles/PMC5097851/ /pubmed/27816056 http://dx.doi.org/10.1186/s12859-016-1313-1 Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Distributed gene expression modelling for exploring variability in epigenetic function
topic Research Article