ParallABEL: an R library for generalized parallelization of genome-wide association studies

BACKGROUND: Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those involving thousands of individuals and consuming hours to months, will benefit from parallel computation. It is arduo...

Bibliographic Details
Main authors: Sangket, Unitsa; Mahasirimongkol, Surakameth; Chantratita, Wasun; Tandayya, Pichaya; Aulchenko, Yurii S
Format: Text
Language: English
Published: BioMed Central 2010
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2879286/
https://www.ncbi.nlm.nih.gov/pubmed/20429914
http://dx.doi.org/10.1186/1471-2105-11-217
author Sangket, Unitsa
Mahasirimongkol, Surakameth
Chantratita, Wasun
Tandayya, Pichaya
Aulchenko, Yurii S
collection PubMed
description BACKGROUND: Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those involving thousands of individuals and consuming hours to months, will benefit from parallel computation. Acquiring the programming skills needed to correctly partition and distribute data, control and monitor tasks on clustered computers, and merge output files is arduous. RESULTS: Most components of GWA analysis can be divided into four groups based on the types of input data and statistical outputs. The first group contains statistics computed for a particular Single Nucleotide Polymorphism (SNP) or trait, such as SNP characterization statistics or association test statistics. The input data of this group includes the SNPs/traits. The second group concerns statistics characterizing an individual in a study, for example the summary statistics of genotype quality for each sample. The input data of this group includes individuals. The third group consists of pair-wise statistics derived from analyses between each pair of individuals in the study, for example genome-wide identity-by-state or genomic kinship analyses. The input data of this group includes pairs of individuals. The final group concerns pair-wise statistics derived for pairs of SNPs, such as the linkage disequilibrium characterisation. The input data of this group includes pairs of SNPs/traits. We developed the ParallABEL library, which utilizes the Rmpi library, to parallelize these four types of computations. The ParallABEL library is not only aimed at GenABEL, but may also be employed to parallelize various GWA packages in R. The data set from the North American Rheumatoid Arthritis Consortium (NARAC), which includes genotypes for 545,080 SNPs in 2,062 individuals, was used to measure ParallABEL performance. Almost perfect speed-up was achieved for many types of analyses. For example, the computing time for the identity-by-state matrix was linearly reduced from approximately eight hours to one hour when ParallABEL employed eight processors. CONCLUSIONS: Executing genome-wide association analysis using the ParallABEL library on a computer cluster is an effective way to boost performance and simplify the parallelization of GWA studies. ParallABEL is a user-friendly parallelization of GenABEL.
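The four groups in the abstract differ mainly in what gets divided among processors: single SNPs/traits, single individuals, pairs of individuals (identity-by-state, kinship), or pairs of SNPs (linkage disequilibrium). The following is a minimal Python sketch of that idea only. It is not ParallABEL's code or API; the function names, the toy identifiers, and the even-chunk strategy are all assumptions made for illustration.

```python
from itertools import combinations


def split_evenly(items, n_workers):
    """Split a list into n_workers contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(len(items), n_workers)
    chunks, start = [], 0
    for worker in range(n_workers):
        size = base + (1 if worker < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


def split_pairs(items, n_workers):
    """Enumerate all unordered pairs of items and split the pairs evenly across workers."""
    return split_evenly(list(combinations(items, 2)), n_workers)


# Hypothetical toy identifiers (not from the NARAC data set).
snps = [f"snp{i}" for i in range(10)]
individuals = [f"id{i}" for i in range(6)]

per_snp = split_evenly(snps, 4)                 # group 1: one statistic per SNP/trait
per_individual = split_evenly(individuals, 4)   # group 2: one statistic per individual
individual_pairs = split_pairs(individuals, 4)  # group 3: e.g. identity-by-state
snp_pairs = split_pairs(snps, 4)                # group 4: e.g. linkage disequilibrium

# Speed-up figures quoted in the abstract: about 8 h serially, about 1 h on 8 processors.
speedup = 8.0 / 1.0
efficiency = speedup / 8
```

Each worker would compute its statistics on its own chunk, and the partial results would then be merged in chunk order. A speed-up of 8 on eight processors corresponds to a parallel efficiency of 1.0, which is what "almost perfect" refers to.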
format Text
id pubmed-2879286
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2879286 2010-06-02 BMC Bioinformatics Software BioMed Central 2010-04-29 Text en Copyright ©2010 Sangket et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title ParallABEL: an R library for generalized parallelization of genome-wide association studies
topic Software