Wrangling distributed computing for high-throughput environmental science: An introduction to HTCondor

Biologists and environmental scientists now routinely solve computational problems that were unimaginable a generation ago. Examples include processing geospatial data, analyzing -omics data, and running large-scale simulations. Conventional desktop computing cannot handle these tasks when they are...

Bibliographic Details
Main authors: Erickson, Richard A., Fienen, Michael N., McCalla, S. Grace, Weiser, Emily L., Bower, Melvin L., Knudson, Jonathan M., Thain, Greg
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects: Education
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6169842/
https://www.ncbi.nlm.nih.gov/pubmed/30281592
http://dx.doi.org/10.1371/journal.pcbi.1006468
author Erickson, Richard A.
Fienen, Michael N.
McCalla, S. Grace
Weiser, Emily L.
Bower, Melvin L.
Knudson, Jonathan M.
Thain, Greg
collection PubMed
description Biologists and environmental scientists now routinely solve computational problems that were unimaginable a generation ago. Examples include processing geospatial data, analyzing -omics data, and running large-scale simulations. Conventional desktop computing cannot handle these tasks when they are large, and high-performance computing is not always available nor the most appropriate solution for all computationally intense problems. High-throughput computing (HTC) is one method for handling computationally intense research. In contrast to high-performance computing, which uses a single "supercomputer," HTC can distribute tasks over many computers (e.g., idle desktop computers, dedicated servers, or cloud-based resources). HTC facilities exist at many academic and government institutes and are relatively easy to create from commodity hardware. Additionally, consortia such as Open Science Grid facilitate HTC, and commercial entities sell cloud-based solutions for researchers who lack HTC at their institution. We provide an introduction to HTC for biologists and environmental scientists. Our examples from biology and the environmental sciences use HTCondor, an open source HTC system.
format Online
Article
Text
id pubmed-6169842
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6169842 2018-10-19 PLoS Comput Biol Education Public Library of Science 2018-10-03 /pmc/articles/PMC6169842/ /pubmed/30281592 http://dx.doi.org/10.1371/journal.pcbi.1006468 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title Wrangling distributed computing for high-throughput environmental science: An introduction to HTCondor
topic Education
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6169842/
https://www.ncbi.nlm.nih.gov/pubmed/30281592
http://dx.doi.org/10.1371/journal.pcbi.1006468