A cloud-based workflow to quantify transcript-expression levels in public cancer compendia

Public compendia of sequencing data are now measured in petabytes. Accordingly, it is infeasible for researchers to transfer these data to local computers. Recently, the National Cancer Institute began exploring opportunities to work with molecular data in cloud-computing environments. With this app...

Bibliographic Details
Main Authors: Tatlow, PJ, Piccolo, Stephen R.
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5159871/
https://www.ncbi.nlm.nih.gov/pubmed/27982081
http://dx.doi.org/10.1038/srep39259
author Tatlow, PJ
Piccolo, Stephen R.
collection PubMed
description Public compendia of sequencing data are now measured in petabytes. Accordingly, it is infeasible for researchers to transfer these data to local computers. Recently, the National Cancer Institute began exploring opportunities to work with molecular data in cloud-computing environments. With this approach, it becomes possible for scientists to take their tools to the data and thereby avoid large data transfers. It also becomes feasible to scale computing resources to the needs of a given analysis. We quantified transcript-expression levels for 12,307 RNA-Sequencing samples from the Cancer Cell Line Encyclopedia and The Cancer Genome Atlas. We used two cloud-based configurations and examined the performance and cost profiles of each configuration. Using preemptible virtual machines, we processed the samples for as little as $0.09 (USD) per sample. As the samples were processed, we collected performance metrics, which helped us track the duration of each processing step and quantified computational resources used at different stages of sample processing. Although the computational demands of reference alignment and expression quantification have decreased considerably, there remains a critical need for researchers to optimize preprocessing steps. We have stored the software, scripts, and processed data in a publicly accessible repository (https://osf.io/gqrz9).
format Online
Article
Text
id pubmed-5159871
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-5159871 2016-12-21 A cloud-based workflow to quantify transcript-expression levels in public cancer compendia Tatlow, PJ Piccolo, Stephen R. Sci Rep Article Nature Publishing Group 2016-12-16 /pmc/articles/PMC5159871/ /pubmed/27982081 http://dx.doi.org/10.1038/srep39259 Text en Copyright © 2016, The Author(s) http://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/
title A cloud-based workflow to quantify transcript-expression levels in public cancer compendia
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5159871/
https://www.ncbi.nlm.nih.gov/pubmed/27982081
http://dx.doi.org/10.1038/srep39259