Cargando…

Enabling Genomics Pipelines in Commodity Personal Computers With Flash Storage

Analysis of a patient's genomics data is the first step toward precision medicine. Such analyses are performed on expensive enterprise-class server machines because input data sets are large, and the intermediate data structures are even larger (TB-size) and require random accesses. We present...

Descripción completa

Detalles Bibliográficos
Autores principales: Cadenelli, Nicola, Jun, Sang-Woo, Polo, Jordà, Wright, Andrew, Carrera, David, Arvind
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Frontiers Media S.A. 2021
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8116887/
https://www.ncbi.nlm.nih.gov/pubmed/33995473
http://dx.doi.org/10.3389/fgene.2021.615958
Descripción
Sumario:Analysis of a patient's genomics data is the first step toward precision medicine. Such analyses are performed on expensive enterprise-class server machines because input data sets are large, and the intermediate data structures are even larger (TB-size) and require random accesses. We present a general method to perform a specific genomics problem, mutation detection, on a cheap commodity personal computer (PC) with a small amount of DRAM. We construct and access large histograms of k-mers efficiently on external storage (SSDs) and apply our technique to a state-of-the-art reference-free genomics algorithm, SMUFIN, to create SMUFIN-F. We show that on two PCs, SMUFIN-F can achieve the same throughput at only one third (36%) the hardware cost and half (45%) the energy compared to SMUFIN on an enterprise-class server. To the best of our knowledge, SMUFIN-F is the first reference-free system that can detect somatic mutations on commodity PCs for whole human genomes. We believe our technique should apply to other k-mer or n-gram-based algorithms.