Tracking the NGS revolution: managing life science research on shared high-performance computing clusters

BACKGROUND: Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other scienc...

Bibliographic Details
Main authors: Dahlö, Martin; Scofield, Douglas G; Schaal, Wesley; Spjuth, Ola
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5928410/
https://www.ncbi.nlm.nih.gov/pubmed/29659792
http://dx.doi.org/10.1093/gigascience/giy028
_version_ 1783319240522072064
author Dahlö, Martin
Scofield, Douglas G
Schaal, Wesley
Spjuth, Ola
author_sort Dahlö, Martin
collection PubMed
description BACKGROUND: Next-generation sequencing (NGS) has transformed the life sciences, and many research groups are newly dependent upon computer clusters to store and analyze large datasets. This creates challenges for e-infrastructures accustomed to hosting computationally mature research in other sciences. Using data gathered from our own clusters at UPPMAX computing center at Uppsala University, Sweden, where core hour usage of ∼800 NGS and ∼200 non-NGS projects is now similar, we compare and contrast the growth, administrative burden, and cluster usage of NGS projects with projects from other sciences. RESULTS: The number of NGS projects has grown rapidly since 2010, with growth driven by entry of new research groups. Storage used by NGS projects has grown more rapidly since 2013 and is now limited by disk capacity. NGS users submit nearly twice as many support tickets per user, and 11 more tools are installed each month for NGS projects than for non-NGS projects. We developed usage and efficiency metrics and show that computing jobs for NGS projects use more RAM than non-NGS projects, are more variable in core usage, and rarely span multiple nodes. NGS jobs use booked resources less efficiently for a variety of reasons. Active monitoring can improve this somewhat. CONCLUSIONS: Hosting NGS projects imposes a large administrative burden at UPPMAX due to large numbers of inexperienced users and diverse and rapidly evolving research areas. We provide a set of recommendations for e-infrastructures that host NGS research projects. We provide anonymized versions of our storage, job, and efficiency databases.
format Online
Article
Text
id pubmed-5928410
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5928410 2018-05-04 Tracking the NGS revolution: managing life science research on shared high-performance computing clusters Dahlö, Martin Scofield, Douglas G Schaal, Wesley Spjuth, Ola Gigascience Research
Oxford University Press 2018-04-05 /pmc/articles/PMC5928410/ /pubmed/29659792 http://dx.doi.org/10.1093/gigascience/giy028 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Dahlö, Martin
Scofield, Douglas G
Schaal, Wesley
Spjuth, Ola
Tracking the NGS revolution: managing life science research on shared high-performance computing clusters
title Tracking the NGS revolution: managing life science research on shared high-performance computing clusters
title_sort tracking the ngs revolution: managing life science research on shared high-performance computing clusters
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5928410/
https://www.ncbi.nlm.nih.gov/pubmed/29659792
http://dx.doi.org/10.1093/gigascience/giy028
work_keys_str_mv AT dahlomartin trackingthengsrevolutionmanaginglifescienceresearchonsharedhighperformancecomputingclusters
AT scofielddouglasg trackingthengsrevolutionmanaginglifescienceresearchonsharedhighperformancecomputingclusters
AT schaalwesley trackingthengsrevolutionmanaginglifescienceresearchonsharedhighperformancecomputingclusters
AT spjuthola trackingthengsrevolutionmanaginglifescienceresearchonsharedhighperformancecomputingclusters