
A Primer on High-Throughput Computing for Genomic Selection

High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of accomplishing high-throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known...


Bibliographic Details
Main Authors: Wu, Xiao-Lin, Beissinger, Timothy M., Bauck, Stewart, Woodward, Brent, Rosa, Guilherme J. M., Weigel, Kent A., Gatti, Natalia de Leon, Gianola, Daniel
Format: Online Article Text
Language: English
Published: Frontiers Research Foundation 2011
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3268564/
https://www.ncbi.nlm.nih.gov/pubmed/22303303
http://dx.doi.org/10.3389/fgene.2011.00004
_version_ 1782222379846467584
author Wu, Xiao-Lin
Beissinger, Timothy M.
Bauck, Stewart
Woodward, Brent
Rosa, Guilherme J. M.
Weigel, Kent A.
Gatti, Natalia de Leon
Gianola, Daniel
author_sort Wu, Xiao-Lin
collection PubMed
description High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of achieving high throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are computationally demanding and, when several traits are evaluated sequentially, computing time is long and output is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented using various techniques from simple batch processing to pipelining in distributed computer clusters. Scripting languages, such as shell scripting, Perl, and R, are also very useful for devising pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In comparison to the traditional data processing pipeline residing on the central processors, performing general-purpose computation on a graphics processing unit provides a new-generation approach to massively parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin–Madison, and can be leveraged for genomic selection in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet the increasing computing demands posed by the unprecedented genomic data that we have today. We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis as well as decision-making in the post-genomic era of selection programs in animals and plants, or in the study of complex diseases in humans.
format Online
Article
Text
id pubmed-3268564
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Frontiers Research Foundation
record_format MEDLINE/PubMed
spelling pubmed-3268564 2012-02-02
A Primer on High-Throughput Computing for Genomic Selection
Wu, Xiao-Lin Beissinger, Timothy M. Bauck, Stewart Woodward, Brent Rosa, Guilherme J. M. Weigel, Kent A. Gatti, Natalia de Leon Gianola, Daniel
Front Genet
Genetics
Frontiers Research Foundation 2011-02-24
/pmc/articles/PMC3268564/ /pubmed/22303303 http://dx.doi.org/10.3389/fgene.2011.00004
Text en
Copyright © 2011 Wu, Beissinger, Bauck, Woodward, Rosa, Weigel, de Leon Gatti and Gianola. http://www.frontiersin.org/licenseagreement This is an open-access article subject to an exclusive license agreement between the authors and Frontiers Media SA, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are credited.
title A Primer on High-Throughput Computing for Genomic Selection
title_sort primer on high-throughput computing for genomic selection
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3268564/
https://www.ncbi.nlm.nih.gov/pubmed/22303303
http://dx.doi.org/10.3389/fgene.2011.00004