Enabling Genomics Pipelines in Commodity Personal Computers With Flash Storage

Bibliographic Details
Main authors: Cadenelli, Nicola; Jun, Sang-Woo; Polo, Jordà; Wright, Andrew; Carrera, David; Arvind
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8116887/
https://www.ncbi.nlm.nih.gov/pubmed/33995473
http://dx.doi.org/10.3389/fgene.2021.615958
author Cadenelli, Nicola
Jun, Sang-Woo
Polo, Jordà
Wright, Andrew
Carrera, David
Arvind
collection PubMed
description Analysis of a patient's genomics data is the first step toward precision medicine. Such analyses are performed on expensive enterprise-class server machines because input data sets are large, and the intermediate data structures are even larger (TB-size) and require random accesses. We present a general method to perform a specific genomics problem, mutation detection, on a cheap commodity personal computer (PC) with a small amount of DRAM. We construct and access large histograms of k-mers efficiently on external storage (SSDs) and apply our technique to a state-of-the-art reference-free genomics algorithm, SMUFIN, to create SMUFIN-F. We show that on two PCs, SMUFIN-F can achieve the same throughput at only one third (36%) the hardware cost and half (45%) the energy compared to SMUFIN on an enterprise-class server. To the best of our knowledge, SMUFIN-F is the first reference-free system that can detect somatic mutations on commodity PCs for whole human genomes. We believe our technique should apply to other k-mer or n-gram-based algorithms.
format Online
Article
Text
id pubmed-8116887
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8116887 2021-05-14 Front Genet Genetics Frontiers Media S.A. 2021-04-29 /pmc/articles/PMC8116887/ /pubmed/33995473 http://dx.doi.org/10.3389/fgene.2021.615958 Text en Copyright © 2021 Cadenelli, Jun, Polo, Wright, Carrera and Arvind. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Enabling Genomics Pipelines in Commodity Personal Computers With Flash Storage
topic Genetics