Rail-dbGaP: analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce
Main authors: | Nellore, Abhinav; Wilks, Christopher; Hansen, Kasper D.; Leek, Jeffrey T.; Langmead, Ben |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4978928/ https://www.ncbi.nlm.nih.gov/pubmed/27153614 http://dx.doi.org/10.1093/bioinformatics/btw177 |
author | Nellore, Abhinav; Wilks, Christopher; Hansen, Kasper D.; Leek, Jeffrey T.; Langmead, Ben |
---|---|
collection | PubMed |
description | Motivation: Public archives contain thousands of trillions of bases of valuable sequencing data. More than 40% of the Sequence Read Archive is human data protected by provisions such as dbGaP. To analyse dbGaP-protected data, researchers must typically work with IT administrators and signing officials to ensure all levels of security are implemented at their institution. This is a major obstacle, impeding reproducibility and reducing the utility of archived data. Results: We present a protocol and software tool for analyzing protected data in a commercial cloud. The protocol, Rail-dbGaP, is applicable to any tool running on Amazon Web Services Elastic MapReduce. The tool, Rail-RNA v0.2, is a spliced aligner for RNA-seq data, which we demonstrate by running on 9662 samples from the dbGaP-protected GTEx consortium dataset. The Rail-dbGaP protocol makes explicit for the first time the steps an investigator must take to develop Elastic MapReduce pipelines that analyse dbGaP-protected data in a manner compliant with NIH guidelines. Rail-RNA automates implementation of the protocol, making it easy for typical biomedical investigators to study protected RNA-seq data, regardless of their local IT resources or expertise. Availability and Implementation: Rail-RNA is available from http://rail.bio. Technical details on the Rail-dbGaP protocol as well as an implementation walkthrough are available at https://github.com/nellore/rail-dbgap. Detailed instructions on running Rail-RNA on dbGaP-protected data using Amazon Web Services are available at http://docs.rail.bio/dbgap/. Contacts: anellore@gmail.com or langmea@cs.jhu.edu Supplementary information: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-4978928 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
journal | Bioinformatics, Applications Notes |
dates | 2016-08-15; 2016-04-21 |
rights | © The Author 2016. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Rail-dbGaP: analyzing dbGaP-protected data in the cloud with Amazon Elastic MapReduce |
topic | Applications Notes |