
Rail-dbGaP: analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce


Bibliographic Details
Main authors: Nellore, Abhinav; Wilks, Christopher; Hansen, Kasper D.; Leek, Jeffrey T.; Langmead, Ben
Format: Online, Article, Text
Language: English
Published: Oxford University Press, 2016
Subjects: Applications Notes
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4978928/
https://www.ncbi.nlm.nih.gov/pubmed/27153614
http://dx.doi.org/10.1093/bioinformatics/btw177
Collection: PubMed
Description:
Motivation: Public archives contain thousands of trillions of bases of valuable sequencing data. More than 40% of the Sequence Read Archive is human data protected by provisions such as dbGaP. To analyse dbGaP-protected data, researchers must typically work with IT administrators and signing officials to ensure all levels of security are implemented at their institution. This is a major obstacle, impeding reproducibility and reducing the utility of archived data.
Results: We present a protocol and software tool for analyzing protected data in a commercial cloud. The protocol, Rail-dbGaP, is applicable to any tool running on Amazon Web Services Elastic MapReduce. The tool, Rail-RNA v0.2, is a spliced aligner for RNA-seq data, which we demonstrate by running on 9662 samples from the dbGaP-protected GTEx consortium dataset. The Rail-dbGaP protocol makes explicit for the first time the steps an investigator must take to develop Elastic MapReduce pipelines that analyse dbGaP-protected data in a manner compliant with NIH guidelines. Rail-RNA automates implementation of the protocol, making it easy for typical biomedical investigators to study protected RNA-seq data, regardless of their local IT resources or expertise.
Availability and Implementation: Rail-RNA is available from http://rail.bio. Technical details on the Rail-dbGaP protocol as well as an implementation walkthrough are available at https://github.com/nellore/rail-dbgap. Detailed instructions on running Rail-RNA on dbGaP-protected data using Amazon Web Services are available at http://docs.rail.bio/dbgap/.
Contacts: anellore@gmail.com or langmea@cs.jhu.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
Record ID: pubmed-4978928
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Bioinformatics (Applications Notes)
Publication dates listed: 2016-08-15, 2016-04-21
License: © The Author 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.