Design and implementation of a hybrid cloud system for large-scale human genomic research

Bibliographic Details
Main authors: Nagasaki, Masao; Sekiya, Yayoi; Asakura, Akihiro; Teraoka, Ryo; Otokozawa, Ryoko; Hashimoto, Hiroki; Kawaguchi, Takahisa; Fukazawa, Keiichiro; Inadomi, Yuichi; Murata, Ken T.; Ohkawa, Yasuyuki; Yamaguchi, Izumi; Mizuhara, Takamichi; Tokunaga, Katsushi; Sekiya, Yuji; Hanawa, Toshihiro; Yamada, Ryo; Matsuda, Fumihiko
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9908893/
https://www.ncbi.nlm.nih.gov/pubmed/36755016
http://dx.doi.org/10.1038/s41439-023-00231-2
Description
Summary: In the field of genomic medical research, the amount of large-scale information continues to increase due to advances in measurement technologies, such as high-performance sequencing and spatial omics, as well as the progress made in genomic cohort studies involving more than one million individuals. Therefore, researchers require more computational resources to analyze this information. Here, we introduce a hybrid cloud system consisting of an on-premise supercomputer, science cloud, and public cloud at the Kyoto University Center for Genomic Medicine in Japan as a solution. This system can flexibly handle various heterogeneous computational resource-demanding bioinformatics tools while scaling the computational capacity. In the hybrid cloud system, we demonstrate the way to properly perform joint genotyping of whole-genome sequencing data for a large population of 11,238, which can be a bottleneck in sequencing data analysis. This system can be one of the reference implementations when dealing with large amounts of genomic medical data in research centers and organizations.