Enabling Genomics Pipelines in Commodity Personal Computers With Flash Storage
Main authors: | Cadenelli, Nicola; Jun, Sang-Woo; Polo, Jordà; Wright, Andrew; Carrera, David; Arvind |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2021 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8116887/ https://www.ncbi.nlm.nih.gov/pubmed/33995473 http://dx.doi.org/10.3389/fgene.2021.615958 |
author | Cadenelli, Nicola; Jun, Sang-Woo; Polo, Jordà; Wright, Andrew; Carrera, David; Arvind |
---|---|
collection | PubMed |
description | Analysis of a patient's genomics data is the first step toward precision medicine. Such analyses are performed on expensive enterprise-class server machines because input data sets are large, and the intermediate data structures are even larger (TB-size) and require random accesses. We present a general method to perform a specific genomics problem, mutation detection, on a cheap commodity personal computer (PC) with a small amount of DRAM. We construct and access large histograms of k-mers efficiently on external storage (SSDs) and apply our technique to a state-of-the-art reference-free genomics algorithm, SMUFIN, to create SMUFIN-F. We show that on two PCs, SMUFIN-F can achieve the same throughput at only one third (36%) the hardware cost and half (45%) the energy compared to SMUFIN on an enterprise-class server. To the best of our knowledge, SMUFIN-F is the first reference-free system that can detect somatic mutations on commodity PCs for whole human genomes. We believe our technique should apply to other k-mer or n-gram-based algorithms. |
format | Online Article Text |
id | pubmed-8116887 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
journal | Front Genet (Frontiers in Genetics) |
published online | 2021-04-29 |
rights | Copyright © 2021 Cadenelli, Jun, Polo, Wright, Carrera and Arvind. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Enabling Genomics Pipelines in Commodity Personal Computers With Flash Storage |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8116887/ https://www.ncbi.nlm.nih.gov/pubmed/33995473 http://dx.doi.org/10.3389/fgene.2021.615958 |
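
The abstract describes building k-mer histograms that are too large for DRAM by keeping them on external storage (SSDs). The sketch below shows one generic way to count k-mers with bounded memory: a two-pass, hash-partitioned external counter. It is not SMUFIN-F's implementation, because this record does not describe the paper's on-SSD data layout. The function names, the `zlib.crc32` partitioning, the partition count and the plain-text spill files are all assumptions made for illustration.

```python
import collections
import os
import tempfile
import zlib


def kmers(seq, k):
    """Yield every overlapping length-k substring (k-mer) of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def external_kmer_histogram(reads, k, num_partitions, workdir):
    """Hypothetical two-pass, hash-partitioned k-mer counter (not SMUFIN-F's).

    Pass 1 streams each k-mer to one of num_partitions spill files on disk,
    chosen by a deterministic hash, so every copy of a k-mer lands in the
    same file. Pass 2 counts one file at a time, so peak memory is bounded
    by the largest partition rather than by the whole histogram.
    """
    paths = [os.path.join(workdir, f"part{p}.txt") for p in range(num_partitions)]
    spills = [open(path, "w") for path in paths]
    try:
        for read in reads:
            for kmer in kmers(read, k):
                bucket = zlib.crc32(kmer.encode()) % num_partitions
                spills[bucket].write(kmer + "\n")
    finally:
        for f in spills:
            f.close()

    for path in paths:
        with open(path) as f:
            counts = collections.Counter(line.rstrip("\n") for line in f)
        os.remove(path)  # this partition is now fully held in `counts`
        yield from counts.items()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        hist = dict(external_kmer_histogram(["ACGTACGT", "CGTA"], k=3,
                                            num_partitions=4, workdir=tmp))
    print(sorted(hist.items()))
    # prints [('ACG', 2), ('CGT', 3), ('GTA', 2), ('TAC', 1)]
```

The memory bound comes from choosing `num_partitions` so that each partition's `Counter` fits in DRAM. The cost is that every k-mer is written to storage and read back once. This scheme turns random histogram updates into sequential writes and reads, which flash storage generally serves far better than small random writes.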