Big Data Smart Socket (BDSS): a system that abstracts data transfer habits from end users

MOTIVATION: The ability to centralize and store data for long periods on an end user’s computational resources is increasingly difficult for many scientific disciplines. For example, genomics data is increasingly large and distributed, and the data needs to be moved into workflow execution sites ran...

Bibliographic Details
Main authors: Watts, Nicholas A; Feltus, Frank A
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5408802/
https://www.ncbi.nlm.nih.gov/pubmed/27797780
http://dx.doi.org/10.1093/bioinformatics/btw679
collection PubMed
description MOTIVATION: The ability to centralize and store data for long periods on an end user’s computational resources is increasingly difficult for many scientific disciplines. For example, genomics data is increasingly large and distributed, and the data needs to be moved into workflow execution sites ranging from lab workstations to the cloud. However, the typical user is not always informed on emerging network technology or the most efficient methods to move and share data. Thus, the user defaults to using inefficient methods for transfer across the commercial internet. RESULTS: To accelerate large data transfer, we created a tool called the Big Data Smart Socket (BDSS) that abstracts data transfer methodology from the user. The user provides BDSS with a manifest of datasets stored in a remote storage repository. BDSS then queries a metadata repository for curated data transfer mechanisms and optimal path to move each of the files in the manifest to the site of workflow execution. BDSS functions as a standalone tool or can be directly integrated into a computational workflow such as provided by the Galaxy Project. To demonstrate applicability, we use BDSS within a biological context, although it is applicable to any scientific domain. AVAILABILITY AND IMPLEMENTATION: BDSS is available under version 2 of the GNU General Public License at https://github.com/feltus/BDSS.
format Online
Article
Text
id pubmed-5408802
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5408802 2017-05-03 Big Data Smart Socket (BDSS): a system that abstracts data transfer habits from end users Watts, Nicholas A Feltus, Frank A Bioinformatics Applications Notes Oxford University Press 2017-02-15 2016-11-28 /pmc/articles/PMC5408802/ /pubmed/27797780 http://dx.doi.org/10.1093/bioinformatics/btw679 Text en © The Author 2016. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Big Data Smart Socket (BDSS): a system that abstracts data transfer habits from end users
topic Applications Notes