
Integrated genome sizing (IGS) approach for the parallelization of whole genome analysis

BACKGROUND: The use of whole genome sequences has increased recently with the rapid progression of next-generation sequencing (NGS) technologies. However, storing raw sequence reads to perform large-scale genome analysis poses hardware challenges. Despite advances in genome analytic platforms, efficien...


Bibliographic Details
Main authors: Sona, Peter, Hong, Jong Hui, Lee, Sunho, Kim, Byong Joon, Hong, Woon-Young, Jung, Jongcheol, Kim, Han-Na, Kim, Hyung-Lae, Christopher, David, Herviou, Laurent, Im, Young Hwan, Lee, Kwee-Yum, Kim, Tae Soon, Jung, Jongsun
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6276166/
https://www.ncbi.nlm.nih.gov/pubmed/30509173
http://dx.doi.org/10.1186/s12859-018-2499-1
_version_ 1783377958775291904
author Sona, Peter
Hong, Jong Hui
Lee, Sunho
Kim, Byong Joon
Hong, Woon-Young
Jung, Jongcheol
Kim, Han-Na
Kim, Hyung-Lae
Christopher, David
Herviou, Laurent
Im, Young Hwan
Lee, Kwee-Yum
Kim, Tae Soon
Jung, Jongsun
author_sort Sona, Peter
collection PubMed
description BACKGROUND: The use of whole genome sequences has increased recently with the rapid progression of next-generation sequencing (NGS) technologies. However, storing raw sequence reads to perform large-scale genome analysis poses hardware challenges. Despite advances in genome analytic platforms, efficient approaches remain relevant, especially as applied to the human genome. In this study, an Integrated Genome Sizing (IGS) approach is adopted to speed up the analysis of multiple whole genomes in a high-performance computing (HPC) environment. The approach splits a genome (GRCh37) into 630 chunks (fragments) so that multiple chunks can be processed in parallel for sequence analyses across cohorts. RESULTS: IGS was integrated on the Maha-Fs (HPC) system to provide the parallelization required to analyze 2504 whole genomes. Using a single reference pilot genome, NA12878, we compared the NGS process time between Maha-Fs (NFS SATA hard disk drive) and SGI-UV300 (solid state drive memory). SGI-UV300 was faster, with a process time of 32.5 min versus 55.2 min on Maha-Fs. CONCLUSIONS: The implementation of IGS can leverage the ability of HPC systems to analyze multiple genomes simultaneously. We believe this approach will accelerate research advancement in personalized genomic medicine. Our method is comparable to the fastest methods for sequence alignment. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2499-1) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6276166
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6276166 2018-12-06 Integrated genome sizing (IGS) approach for the parallelization of whole genome analysis Sona, Peter; Hong, Jong Hui; Lee, Sunho; Kim, Byong Joon; Hong, Woon-Young; Jung, Jongcheol; Kim, Han-Na; Kim, Hyung-Lae; Christopher, David; Herviou, Laurent; Im, Young Hwan; Lee, Kwee-Yum; Kim, Tae Soon; Jung, Jongsun BMC Bioinformatics Methodology Article
BioMed Central 2018-12-03 /pmc/articles/PMC6276166/ /pubmed/30509173 http://dx.doi.org/10.1186/s12859-018-2499-1 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Integrated genome sizing (IGS) approach for the parallelization of whole genome analysis
topic Methodology Article
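The abstract describes splitting a reference genome into chunks that can be analyzed in parallel. The following is a minimal sketch of that general idea only, not the authors' IGS implementation: the record does not say how the 630 GRCh37 chunks are defined, so the fixed-size windowing, the toy contig lengths, and the names `split_into_chunks`, `analyze_chunk`, and `run_parallel` are all illustrative assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Chunk = Tuple[str, int, int]  # (contig, start, end), 0-based, half-open


def split_into_chunks(contig_lengths: Dict[str, int], chunk_size: int) -> List[Chunk]:
    """Cut each contig into consecutive windows of at most chunk_size bases."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks: List[Chunk] = []
    for contig, length in contig_lengths.items():
        for start in range(0, length, chunk_size):
            chunks.append((contig, start, min(start + chunk_size, length)))
    return chunks


def analyze_chunk(chunk: Chunk) -> int:
    """Hypothetical stand-in for per-chunk work (alignment, variant calling).

    It only returns the chunk length so the sketch stays runnable.
    """
    _contig, start, end = chunk
    return end - start


def run_parallel(chunks: List[Chunk], workers: int = 4) -> List[int]:
    """Process chunks concurrently; results are returned in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(analyze_chunk, chunks))


# Toy contig lengths (hypothetical values, NOT taken from GRCh37).
toy_genome = {"chrA": 2_500, "chrB": 1_000}
chunks = split_into_chunks(toy_genome, chunk_size=1_000)
results = run_parallel(chunks)
```

A thread pool is used here only to keep the sketch portable. On an HPC system, each chunk would more plausibly be dispatched as a separate process or scheduler job, which is again an assumption rather than something stated in the record.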