R-Gada: a fast and flexible pipeline for copy number analysis in association studies

BACKGROUND: Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research. CNVs have successfully provided target genome regions for some disease conditions where simple genetic variation (i.e., SNPs) has previously failed to provide a clea...

Bibliographic Details
Main Authors:	Pique-Regi, Roger; Cáceres, Alejandro; González, Juan R
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Software
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2915992/ https://www.ncbi.nlm.nih.gov/pubmed/20637081 http://dx.doi.org/10.1186/1471-2105-11-380

_version_	1782184985681199104
author	Pique-Regi, Roger Cáceres, Alejandro González, Juan R
author_facet	Pique-Regi, Roger Cáceres, Alejandro González, Juan R
author_sort	Pique-Regi, Roger
collection	PubMed
description	BACKGROUND: Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research. CNVs have successfully provided target genome regions for some disease conditions where simple genetic variation (i.e., SNPs) has previously failed to provide a clear association. RESULTS: Here we present a new R package that integrates: (i) data import from the most common formats of Affymetrix, Illumina and aCGH arrays; (ii) a fast and accurate segmentation algorithm to call CNVs based on Genome Alteration Detection Analysis (GADA); and (iii) functions for displaying and exporting the Copy Number calls, identification of recurrent CNVs, multivariate analysis of population structure, and tools for performing association studies. Using a large dataset containing 270 HapMap individuals (Affymetrix Human SNP Array 6.0 Sample Dataset) we demonstrate a flexible pipeline implemented with the package. It requires less than one minute per sample (3 million probe arrays) on a single-core computer, and provides flexible parallelization for very large datasets. Case-control data were generated from the HapMap dataset to demonstrate a GWAS analysis. CONCLUSIONS: The package provides the tools for creating a complete integrated pipeline from data normalization to statistical association. It can efficiently handle a massive volume of data consisting of millions of genetic markers and hundreds or thousands of samples with very accurate results.
format	Text
id	pubmed-2915992
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2915992 2010-08-11 R-Gada: a fast and flexible pipeline for copy number analysis in association studies Pique-Regi, Roger Cáceres, Alejandro González, Juan R BMC Bioinformatics Software BACKGROUND: Genome-wide association studies (GWAS) using Copy Number Variation (CNV) are becoming a central focus of genetic research. CNVs have successfully provided target genome regions for some disease conditions where simple genetic variation (i.e., SNPs) has previously failed to provide a clear association. RESULTS: Here we present a new R package that integrates: (i) data import from the most common formats of Affymetrix, Illumina and aCGH arrays; (ii) a fast and accurate segmentation algorithm to call CNVs based on Genome Alteration Detection Analysis (GADA); and (iii) functions for displaying and exporting the Copy Number calls, identification of recurrent CNVs, multivariate analysis of population structure, and tools for performing association studies. Using a large dataset containing 270 HapMap individuals (Affymetrix Human SNP Array 6.0 Sample Dataset) we demonstrate a flexible pipeline implemented with the package. It requires less than one minute per sample (3 million probe arrays) on a single-core computer, and provides flexible parallelization for very large datasets. Case-control data were generated from the HapMap dataset to demonstrate a GWAS analysis. CONCLUSIONS: The package provides the tools for creating a complete integrated pipeline from data normalization to statistical association. It can efficiently handle a massive volume of data consisting of millions of genetic markers and hundreds or thousands of samples with very accurate results. BioMed Central 2010-07-16 /pmc/articles/PMC2915992/ /pubmed/20637081 http://dx.doi.org/10.1186/1471-2105-11-380 Text en Copyright ©2010 Pique-Regi et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Software Pique-Regi, Roger Cáceres, Alejandro González, Juan R R-Gada: a fast and flexible pipeline for copy number analysis in association studies
title	R-Gada: a fast and flexible pipeline for copy number analysis in association studies
title_full	R-Gada: a fast and flexible pipeline for copy number analysis in association studies
title_fullStr	R-Gada: a fast and flexible pipeline for copy number analysis in association studies
title_full_unstemmed	R-Gada: a fast and flexible pipeline for copy number analysis in association studies
title_short	R-Gada: a fast and flexible pipeline for copy number analysis in association studies
title_sort	r-gada: a fast and flexible pipeline for copy number analysis in association studies
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2915992/ https://www.ncbi.nlm.nih.gov/pubmed/20637081 http://dx.doi.org/10.1186/1471-2105-11-380
work_keys_str_mv	AT piqueregiroger rgadaafastandflexiblepipelineforcopynumberanalysisinassociationstudies AT caceresalejandro rgadaafastandflexiblepipelineforcopynumberanalysisinassociationstudies AT gonzalezjuanr rgadaafastandflexiblepipelineforcopynumberanalysisinassociationstudies
