Integrated genome sizing (IGS) approach for the parallelization of whole genome analysis
BACKGROUND: The use of whole genome sequence has increased recently with rapid progression of next-generation sequencing (NGS) technologies. However, storing raw sequence reads to perform large-scale genome analysis poses hardware challenges. Despite advancement in genome analytic platforms, efficien...
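Main authors: | Sona, Peter; Hong, Jong Hui; Lee, Sunho; Kim, Byong Joon; Hong, Woon-Young; Jung, Jongcheol; Kim, Han-Na; Kim, Hyung-Lae; Christopher, David; Herviou, Laurent; Im, Young Hwan; Lee, Kwee-Yum; Kim, Tae Soon; Jung, Jongsun |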
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6276166/ https://www.ncbi.nlm.nih.gov/pubmed/30509173 http://dx.doi.org/10.1186/s12859-018-2499-1 |
_version_ | 1783377958775291904 |
---|---|
author | Sona, Peter Hong, Jong Hui Lee, Sunho Kim, Byong Joon Hong, Woon-Young Jung, Jongcheol Kim, Han-Na Kim, Hyung-Lae Christopher, David Herviou, Laurent Im, Young Hwan Lee, Kwee-Yum Kim, Tae Soon Jung, Jongsun |
author_sort | Sona, Peter |
collection | PubMed |
description | BACKGROUND: The use of whole genome sequence has increased recently with rapid progression of next-generation sequencing (NGS) technologies. However, storing raw sequence reads to perform large-scale genome analysis poses hardware challenges. Despite advancement in genome analytic platforms, efficient approaches remain relevant, especially as applied to the human genome. In this study, an Integrated Genome Sizing (IGS) approach is adopted to speed up multiple whole genome analyses in a high-performance computing (HPC) environment. The approach splits a genome (GRCh37) into 630 chunks (fragments), wherein multiple chunks can be processed in parallel for sequence analyses across cohorts. RESULTS: IGS was integrated on the Maha-Fs (HPC) system to provide the parallelization required to analyze 2504 whole genomes. Using a single reference pilot genome, NA12878, we compared the NGS process time between Maha-Fs (NFS SATA hard disk drive) and SGI-UV300 (solid state drive memory). SGI-UV300 was faster, with a process time of 32.5 min, compared with 55.2 min for Maha-Fs. CONCLUSIONS: The implementation of IGS can leverage the ability of HPC systems to analyze multiple genomes simultaneously. We believe this approach will accelerate research advancement in personalized genomic medicine. Our method is comparable to the fastest methods for sequence alignment. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2499-1) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6276166 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-62761662018-12-06 Integrated genome sizing (IGS) approach for the parallelization of whole genome analysis Sona, Peter Hong, Jong Hui Lee, Sunho Kim, Byong Joon Hong, Woon-Young Jung, Jongcheol Kim, Han-Na Kim, Hyung-Lae Christopher, David Herviou, Laurent Im, Young Hwan Lee, Kwee-Yum Kim, Tae Soon Jung, Jongsun BMC Bioinformatics Methodology Article BACKGROUND: The use of whole genome sequence has increased recently with rapid progression of next-generation sequencing (NGS) technologies. However, storing raw sequence reads to perform large-scale genome analysis poses hardware challenges. Despite advancement in genome analytic platforms, efficient approaches remain relevant, especially as applied to the human genome. In this study, an Integrated Genome Sizing (IGS) approach is adopted to speed up multiple whole genome analyses in a high-performance computing (HPC) environment. The approach splits a genome (GRCh37) into 630 chunks (fragments), wherein multiple chunks can be processed in parallel for sequence analyses across cohorts. RESULTS: IGS was integrated on the Maha-Fs (HPC) system to provide the parallelization required to analyze 2504 whole genomes. Using a single reference pilot genome, NA12878, we compared the NGS process time between Maha-Fs (NFS SATA hard disk drive) and SGI-UV300 (solid state drive memory). SGI-UV300 was faster, with a process time of 32.5 min, compared with 55.2 min for Maha-Fs. CONCLUSIONS: The implementation of IGS can leverage the ability of HPC systems to analyze multiple genomes simultaneously. We believe this approach will accelerate research advancement in personalized genomic medicine. Our method is comparable to the fastest methods for sequence alignment. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2499-1) contains supplementary material, which is available to authorized users. 
BioMed Central 2018-12-03 /pmc/articles/PMC6276166/ /pubmed/30509173 http://dx.doi.org/10.1186/s12859-018-2499-1 Text en © The Author(s). 2018 Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Sona, Peter Hong, Jong Hui Lee, Sunho Kim, Byong Joon Hong, Woon-Young Jung, Jongcheol Kim, Han-Na Kim, Hyung-Lae Christopher, David Herviou, Laurent Im, Young Hwan Lee, Kwee-Yum Kim, Tae Soon Jung, Jongsun Integrated genome sizing (IGS) approach for the parallelization of whole genome analysis |
title | Integrated genome sizing (IGS) approach for the parallelization of whole genome analysis |
title_sort | integrated genome sizing (igs) approach for the parallelization of whole genome analysis |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6276166/ https://www.ncbi.nlm.nih.gov/pubmed/30509173 http://dx.doi.org/10.1186/s12859-018-2499-1 |
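
The abstract in this record describes splitting a reference genome into chunks so that several chunks can be analyzed at once. The record does not say how the 630 chunks are defined, so the following is only a minimal Python sketch of the general idea, not the authors' IGS implementation. The fixed `chunk_size`, the toy contig lengths, and the functions `make_chunks`, `analyze_region`, and `run_parallel` are all hypothetical names chosen for illustration; a thread pool stands in for the HPC job scheduler.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

# A region is (contig name, 1-based start, inclusive end).
Region = Tuple[str, int, int]


def make_chunks(contig_lengths: Dict[str, int], chunk_size: int) -> List[Region]:
    """Split each contig into consecutive, non-overlapping regions.

    Assumption: fixed-size chunks; the real IGS chunk boundaries are not
    given in the catalog record.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    regions: List[Region] = []
    for contig, length in contig_lengths.items():
        for start in range(1, length + 1, chunk_size):
            end = min(start + chunk_size - 1, length)
            regions.append((contig, start, end))
    return regions


def analyze_region(region: Region) -> Tuple[Region, int]:
    """Placeholder for per-chunk work such as alignment or variant calling.

    Here it only reports how many bases the region covers.
    """
    _contig, start, end = region
    return region, end - start + 1


def run_parallel(regions: List[Region], max_workers: int = 4) -> List[Tuple[Region, int]]:
    """Process regions concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(analyze_region, regions))


if __name__ == "__main__":
    # Toy contig lengths, not GRCh37 values.
    toy_lengths = {"chrA": 2500, "chrB": 1000}
    regions = make_chunks(toy_lengths, chunk_size=1000)
    for region, n_bases in run_parallel(regions):
        print(region, n_bases)
```

Because each region is independent, the same list of regions could be handed to a cluster scheduler instead of a local thread pool, with one job per chunk and per genome.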