
Prioritizing transcriptomic and epigenomic experiments using an optimization strategy that leverages imputed data

MOTIVATION: Successful science often involves not only performing experiments well, but also choosing well among many possible experiments. In a hypothesis generation setting, choosing an experiment well means choosing an experiment whose results are interesting or novel. In this work, we formalize...


Bibliographic details
Main authors:	Schreiber, Jacob, Bilmes, Jeffrey, Noble, William Stafford
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8088321/ https://www.ncbi.nlm.nih.gov/pubmed/32966546 http://dx.doi.org/10.1093/bioinformatics/btaa830

author	Schreiber, Jacob Bilmes, Jeffrey Noble, William Stafford
collection	PubMed
description	MOTIVATION: Successful science often involves not only performing experiments well, but also choosing well among many possible experiments. In a hypothesis generation setting, choosing an experiment well means choosing an experiment whose results are interesting or novel. In this work, we formalize this selection procedure in the context of genomics and epigenomics data generation. Specifically, we consider the task faced by a scientific consortium such as the National Institutes of Health ENCODE Consortium, whose goal is to characterize all of the functional elements in the human genome. Given a list of possible cell types or tissue types (‘biosamples’) and a list of possible high-throughput sequencing assays, where at least one experiment has been performed in each biosample and for each assay, we ask ‘Which experiments should ENCODE perform next?’ RESULTS: We demonstrate how to represent this task as a submodular optimization problem, where the goal is to choose a panel of experiments that maximize the facility location function. A key aspect of our approach is that we use imputed data, rather than experimental data, to directly answer the posed question. We find that, across several evaluations, our method chooses a panel of experiments that span a diversity of biochemical activity. Finally, we propose two modifications of the facility location function, including a novel submodular–supermodular function, that allow incorporation of domain knowledge or constraints into the optimization procedure. AVAILABILITY AND IMPLEMENTATION: Our method is available as a Python package at https://github.com/jmschrei/kiwano and can be installed using the command pip install kiwano. The source code used here and the similarity matrix can be found at http://doi.org/10.5281/zenodo.3708538. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
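The RESULTS paragraph frames experiment selection as choosing a panel S that maximizes the facility location function, f(S) = Σ_{v∈V} max_{s∈S} sim(v, s), where V is the set of candidate experiments. A minimal sketch of the standard greedy maximizer for that function follows, in plain Python. It is not the kiwano package's API: the names `facility_location` and `greedy_select` are hypothetical, the code assumes a precomputed, square, nonnegative similarity matrix between candidate experiments (the abstract does not say how the authors derive similarity from imputed tracks), and the 4 x 4 matrix is a toy example rather than ENCODE data.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def facility_location(similarity: Matrix, selected: Sequence[int]) -> float:
    """f(S) = sum over every item v of max_{s in S} similarity[v][s]; f(empty set) = 0."""
    if not selected:
        return 0.0
    return sum(max(row[s] for s in selected) for row in similarity)


def greedy_select(similarity: Matrix, k: int) -> List[int]:
    """Pick up to k items, each time adding the one with the largest marginal gain.

    Assumes similarity is square and nonnegative, so that f is monotone submodular.
    """
    n = len(similarity)
    selected: List[int] = []
    # best[v] holds max_{s in S} similarity[v][s] for the current panel S (0 while S is empty).
    best = [0.0] * n
    for _ in range(min(k, n)):
        best_gain, best_idx = -1.0, -1
        for c in range(n):
            if c in selected:
                continue
            # Marginal gain f(S + {c}) - f(S): only items that c covers better than S contribute.
            gain = sum(max(similarity[v][c] - best[v], 0.0) for v in range(n))
            if gain > best_gain:
                best_gain, best_idx = gain, c
        selected.append(best_idx)
        # Update each item's coverage now that best_idx is in the panel.
        best = [max(best[v], similarity[v][best_idx]) for v in range(n)]
    return selected


# Hypothetical toy matrix: experiments {0, 1} and {2, 3} are near-duplicates of each other.
sim = [
    [1.0, 0.75, 0.0, 0.0],
    [0.75, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
panel = greedy_select(sim, 2)
print(panel, facility_location(sim, panel))  # prints "[0, 2] 3.25"
```

On this toy matrix the greedy panel `[0, 2]` takes one experiment from each cluster and scores 3.25, whereas the redundant panel `[0, 1]` scores only 2.0, which is the diversity-seeking behavior the abstract describes. For a monotone submodular function under a cardinality constraint, this greedy procedure is guaranteed to reach at least a (1 - 1/e) fraction of the optimal value.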
format	Online Article Text
id	pubmed-8088321
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-8088321 2021-05-05 Bioinformatics Original Papers Oxford University Press 2020-09-23 /pmc/articles/PMC8088321/ /pubmed/32966546 http://dx.doi.org/10.1093/bioinformatics/btaa830 Text en © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Prioritizing transcriptomic and epigenomic experiments using an optimization strategy that leverages imputed data
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8088321/ https://www.ncbi.nlm.nih.gov/pubmed/32966546 http://dx.doi.org/10.1093/bioinformatics/btaa830


Similar items