Scalable in-memory processing of omics workflows

Bibliographic Details
Main Authors: Elisseev, Vadim; Gardiner, Laura-Jayne; Krishna, Ritesh
Format: Online, Article, Text
Language: English
Published: Research Network of Computational and Structural Biotechnology, 2022
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9052061/
https://www.ncbi.nlm.nih.gov/pubmed/35521547
http://dx.doi.org/10.1016/j.csbj.2022.04.014
Collection: PubMed
Description: We present a proof-of-concept implementation of the in-memory computing paradigm, which we use to facilitate the analysis of metagenomic sequencing reads. In doing so, we compare the performance of POSIX™ file systems and key-value storage for omics data, and we show the potential for integrating high-performance computing (HPC) and cloud-native technologies. We show that in-memory key-value storage offers possibilities for improved handling of omics data through more flexible and faster data processing. We envision fully containerized workflows deployed as portable micro-pipelines, with multiple instances working concurrently on the same distributed in-memory storage. To highlight the potential of this technology for event-driven and real-time data processing, we use a biological case study focused on the growing threat of antimicrobial resistance (AMR). We develop a workflow combining bioinformatics and explainable machine learning (ML) to predict the life expectancy of a population from the microbiome of its sewage, while describing the contribution of AMR to the prediction. We propose that, in future, performing such analyses in ‘real-time’ would allow us to assess the potential risk to the population based on changes in the AMR profile of the community.
ID: pubmed-9052061
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Record summary: pubmed-9052061 (2022-05-04). Elisseev, Vadim; Gardiner, Laura-Jayne; Krishna, Ritesh. Scalable in-memory processing of omics workflows. Comput Struct Biotechnol J, Research Article. Research Network of Computational and Structural Biotechnology, 2022-04-20. http://dx.doi.org/10.1016/j.csbj.2022.04.014. Text, en. © 2022 The Author(s). This is an open access article under the CC BY license (https://creativecommons.org/licenses/by/4.0/).
Topic: Research Article
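The description contrasts POSIX™ file-system access with in-memory key-value storage and envisions several micro-pipeline instances working concurrently on one shared in-memory store. The Python sketch below illustrates only that access pattern and is not the authors' implementation. It assumes a plain Python `dict` as a stand-in for a distributed in-memory key-value store, threads as stand-ins for concurrent pipeline instances, and GC content as a hypothetical downstream step. The reads and the helper names (`parse_fastq`, `load_into_store`, `gc_content`) are invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterator, Tuple

# Hypothetical reads in 4-line FASTQ format (header, sequence, '+', quality).
FASTQ_TEXT = """@read1
ACGTGGCC
+
IIIIIIII
@read2
ATATATGC
+
IIIIIIII
@read3
GGGGCCCC
+
IIIIIIII
"""


def parse_fastq(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) tuples from 4-line FASTQ records."""
    lines = [line.strip() for line in text.strip().splitlines()]
    for i in range(0, len(lines), 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield header[1:], seq, qual  # drop the leading '@'


def load_into_store(text: str) -> Dict[str, Tuple[str, str]]:
    """Load reads into a dict that stands in for an in-memory key-value store.

    Assumption: a real deployment would use a distributed store shared by
    several pipeline instances; a local dict only mimics key-based access.
    """
    return {read_id: (seq, qual) for read_id, seq, qual in parse_fastq(text)}


def gc_content(store: Dict[str, Tuple[str, str]], read_id: str) -> float:
    """Hypothetical downstream step: fetch one read by key, return its GC fraction."""
    seq, _qual = store[read_id]
    return (seq.count("G") + seq.count("C")) / len(seq)


store = load_into_store(FASTQ_TEXT)

# Two threads stand in for concurrent micro-pipeline instances that only
# read from the shared store, so no locking is needed in this sketch.
read_ids = list(store)
with ThreadPoolExecutor(max_workers=2) as pool:
    gc_values = list(pool.map(lambda rid: gc_content(store, rid), read_ids))

results = dict(zip(read_ids, gc_values))
print(results)  # {'read1': 0.75, 'read2': 0.25, 'read3': 1.0}
```

Once the reads are loaded, each worker retrieves records by read ID instead of reopening and re-parsing a file. A production setup would presumably swap the dictionary for a networked store that supports many concurrent clients, and would need write coordination if stages also update records.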