Scaling Genomics Data Processing with Memory-Driven Computing to Accelerate Computational Biology

Bibliographic Details
Main authors:	Becker, Matthias; Worlikar, Umesh; Agrawal, Shobhit; Schultze, Hartmut; Ulas, Thomas; Singhal, Sharad; Schultze, Joachim L.
Format:	Online Article Text
Language:	English
Published:	2020
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7295347/ http://dx.doi.org/10.1007/978-3-030-50743-5_17

_version_	1783546633353428992
author	Becker, Matthias Worlikar, Umesh Agrawal, Shobhit Schultze, Hartmut Ulas, Thomas Singhal, Sharad Schultze, Joachim L.
author_sort	Becker, Matthias
collection	PubMed
description	Research is increasingly becoming data-driven, and natural sciences are not an exception. In both biology and medicine, we are observing an exponential growth of structured data collections from experiments and population studies, enabling us to gain novel insights that would otherwise not be possible. However, these growing data sets pose a challenge for existing compute infrastructures since data is outgrowing limits within compute. In this work, we present the application of a novel approach, Memory-Driven Computing (MDC), in the life sciences. MDC proposes a data-centric approach that has been designed for growing data sizes and provides a composable infrastructure for changing workloads. In particular, we show how a typical pipeline for genomics data processing can be accelerated, and application modifications required to exploit this novel architecture. Furthermore, we demonstrate how the isolated evaluation of individual tasks misses significant overheads of typical pipelines in genomics data processing.
format	Online Article Text
id	pubmed-7295347
institution	National Center for Biotechnology Information
language	English
publishDate	2020
record_format	MEDLINE/PubMed
spelling	pubmed-72953472020-06-16 Scaling Genomics Data Processing with Memory-Driven Computing to Accelerate Computational Biology Becker, Matthias Worlikar, Umesh Agrawal, Shobhit Schultze, Hartmut Ulas, Thomas Singhal, Sharad Schultze, Joachim L. High Performance Computing Article Research is increasingly becoming data-driven, and natural sciences are not an exception. In both biology and medicine, we are observing an exponential growth of structured data collections from experiments and population studies, enabling us to gain novel insights that would otherwise not be possible. However, these growing data sets pose a challenge for existing compute infrastructures since data is outgrowing limits within compute. In this work, we present the application of a novel approach, Memory-Driven Computing (MDC), in the life sciences. MDC proposes a data-centric approach that has been designed for growing data sizes and provides a composable infrastructure for changing workloads. In particular, we show how a typical pipeline for genomics data processing can be accelerated, and application modifications required to exploit this novel architecture. Furthermore, we demonstrate how the isolated evaluation of individual tasks misses significant overheads of typical pipelines in genomics data processing. 2020-05-22 /pmc/articles/PMC7295347/ http://dx.doi.org/10.1007/978-3-030-50743-5_17 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title	Scaling Genomics Data Processing with Memory-Driven Computing to Accelerate Computational Biology
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7295347/ http://dx.doi.org/10.1007/978-3-030-50743-5_17