A Primer on High-Throughput Computing for Genomic Selection
High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of achieving high throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known...
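Main authors: | Wu, Xiao-Lin; Beissinger, Timothy M.; Bauck, Stewart; Woodward, Brent; Rosa, Guilherme J. M.; Weigel, Kent A.; Gatti, Natalia de Leon; Gianola, Daniel |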
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Research Foundation 2011 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3268564/ https://www.ncbi.nlm.nih.gov/pubmed/22303303 http://dx.doi.org/10.3389/fgene.2011.00004 |
_version_ | 1782222379846467584 |
---|---|
author | Wu, Xiao-Lin Beissinger, Timothy M. Bauck, Stewart Woodward, Brent Rosa, Guilherme J. M. Weigel, Kent A. Gatti, Natalia de Leon Gianola, Daniel |
author_facet | Wu, Xiao-Lin Beissinger, Timothy M. Bauck, Stewart Woodward, Brent Rosa, Guilherme J. M. Weigel, Kent A. Gatti, Natalia de Leon Gianola, Daniel |
author_sort | Wu, Xiao-Lin |
collection | PubMed |
description | High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of achieving high throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are very computationally demanding and, with several traits to be evaluated sequentially, computing time is long, and output is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented using various techniques from simple batch processing to pipelining in distributed computer clusters. Various scripting languages, such as shell scripting, Perl, and R, are also very useful to devise pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In comparison to the traditional data processing pipeline residing on the central processors, performing general-purpose computation on a graphics processing unit provides a new-generation approach to massively parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin–Madison, which can be leveraged for genomic selection, in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet increasing computing demands posed by unprecedented genomic data that we have today. 
We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis as well as decision-making in the post-genomic era of selection programs in animals and plants, or in the study of complex diseases in humans. |
format | Online Article Text |
id | pubmed-3268564 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | Frontiers Research Foundation |
record_format | MEDLINE/PubMed |
spelling | pubmed-32685642012-02-02 A Primer on High-Throughput Computing for Genomic Selection Wu, Xiao-Lin Beissinger, Timothy M. Bauck, Stewart Woodward, Brent Rosa, Guilherme J. M. Weigel, Kent A. Gatti, Natalia de Leon Gianola, Daniel Front Genet Genetics High-throughput computing (HTC) uses computer clusters to solve advanced computational problems, with the goal of achieving high throughput over relatively long periods of time. In genomic selection, for example, a set of markers covering the entire genome is used to train a model based on known data, and the resulting model is used to predict the genetic merit of selection candidates. Sophisticated models are very computationally demanding and, with several traits to be evaluated sequentially, computing time is long, and output is low. In this paper, we present scenarios and basic principles of how HTC can be used in genomic selection, implemented using various techniques from simple batch processing to pipelining in distributed computer clusters. Various scripting languages, such as shell scripting, Perl, and R, are also very useful to devise pipelines. By pipelining, we can reduce total computing time and consequently increase throughput. In comparison to the traditional data processing pipeline residing on the central processors, performing general-purpose computation on a graphics processing unit provides a new-generation approach to massively parallel computing in genomic selection. While the concept of HTC may still be new to many researchers in animal breeding, plant breeding, and genetics, HTC infrastructures have already been built in many institutions, such as the University of Wisconsin–Madison, which can be leveraged for genomic selection, in terms of central processing unit capacity, network connectivity, storage availability, and middleware connectivity. 
Exploring existing HTC infrastructures as well as general-purpose computing environments will further expand our capability to meet increasing computing demands posed by unprecedented genomic data that we have today. We anticipate that HTC will impact genomic selection via better statistical models, faster solutions, and more competitive products (e.g., from design of marker panels to realized genetic gain). Eventually, HTC may change our view of data analysis as well as decision-making in the post-genomic era of selection programs in animals and plants, or in the study of complex diseases in humans. Frontiers Research Foundation 2011-02-24 /pmc/articles/PMC3268564/ /pubmed/22303303 http://dx.doi.org/10.3389/fgene.2011.00004 Text en Copyright © 2011 Wu, Beissinger, Bauck, Woodward, Rosa, Weigel, de Leon Gatti and Gianola. http://www.frontiersin.org/licenseagreement This is an open-access article subject to an exclusive license agreement between the authors and Frontiers Media SA, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are credited. |
spellingShingle | Genetics Wu, Xiao-Lin Beissinger, Timothy M. Bauck, Stewart Woodward, Brent Rosa, Guilherme J. M. Weigel, Kent A. Gatti, Natalia de Leon Gianola, Daniel A Primer on High-Throughput Computing for Genomic Selection |
title | A Primer on High-Throughput Computing for Genomic Selection |
title_full | A Primer on High-Throughput Computing for Genomic Selection |
title_fullStr | A Primer on High-Throughput Computing for Genomic Selection |
title_full_unstemmed | A Primer on High-Throughput Computing for Genomic Selection |
title_short | A Primer on High-Throughput Computing for Genomic Selection |
title_sort | primer on high-throughput computing for genomic selection |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3268564/ https://www.ncbi.nlm.nih.gov/pubmed/22303303 http://dx.doi.org/10.3389/fgene.2011.00004 |