
GenoGAM 2.0: scalable and efficient implementation of genome-wide generalized additive models for gigabase-scale genomes

BACKGROUND: GenoGAM (Genome-wide generalized additive models) is a powerful statistical modeling tool for the analysis of ChIP-Seq data with flexible factorial design experiments. However, large runtime and memory requirements of its current implementation prohibit its application to gigabase-scale g...


Bibliographic Details
Main authors:	Stricker, Georg; Galinier, Mathilde; Gagneur, Julien
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6020310/ https://www.ncbi.nlm.nih.gov/pubmed/29945559 http://dx.doi.org/10.1186/s12859-018-2238-7

_version_	1783335268241113088
author	Stricker, Georg; Galinier, Mathilde; Gagneur, Julien
author_sort	Stricker, Georg
collection	PubMed
description	BACKGROUND: GenoGAM (Genome-wide generalized additive models) is a powerful statistical modeling tool for the analysis of ChIP-Seq data with flexible factorial design experiments. However, the large runtime and memory requirements of its current implementation prohibit its application to gigabase-scale genomes such as mammalian genomes. RESULTS: Here we present GenoGAM 2.0, a scalable and efficient implementation that is 2 to 3 orders of magnitude faster than the previous version. This is achieved by exploiting the sparsity of the model, using the SuperLU direct solver for parameter fitting and sparse Cholesky factorization together with the sparse inverse subset algorithm for computing standard errors. Furthermore, the HDF5 library is employed to store data efficiently on the hard drive, reducing the memory footprint while keeping I/O low. Whole-genome fits for human ChIP-Seq datasets (ca. 300 million parameters) could be obtained in less than 9 hours on a standard 60-core server. GenoGAM 2.0 is implemented as an open source R package and is currently available on GitHub. A Bioconductor release of the new version is in preparation. CONCLUSIONS: We have vastly improved the performance of the GenoGAM framework, opening up its application to all types of organisms. Moreover, our algorithmic improvements for fitting large GAMs could be of interest to the statistical community beyond the genomics field.
format	Online Article Text
id	pubmed-6020310
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6020310 2018-07-06 GenoGAM 2.0: scalable and efficient implementation of genome-wide generalized additive models for gigabase-scale genomes Stricker, Georg; Galinier, Mathilde; Gagneur, Julien BMC Bioinformatics Software
BioMed Central 2018-06-27 /pmc/articles/PMC6020310/ /pubmed/29945559 http://dx.doi.org/10.1186/s12859-018-2238-7 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	GenoGAM 2.0: scalable and efficient implementation of genome-wide generalized additive models for gigabase-scale genomes
topic	Software
