pyGenClean: efficient tool for genetic data clean up before association testing

Summary: Genetic association studies making use of high-throughput genotyping arrays need to process large amounts of data in the order of millions of markers per experiment. The first step of any analysis with genotyping arrays is typically the conduct of a thorough data clean up and quality contro...

Bibliographic Details
Main authors: Lemieux Perreault, Louis-Philippe, Provost, Sylvie, Legault, Marc-André, Barhdadi, Amina, Dubé, Marie-Pierre
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3694635/
https://www.ncbi.nlm.nih.gov/pubmed/23652425
http://dx.doi.org/10.1093/bioinformatics/btt261
_version_ 1782274878040178688
author Lemieux Perreault, Louis-Philippe
Provost, Sylvie
Legault, Marc-André
Barhdadi, Amina
Dubé, Marie-Pierre
collection PubMed
description Summary: Genetic association studies making use of high-throughput genotyping arrays need to process large amounts of data in the order of millions of markers per experiment. The first step of any analysis with genotyping arrays is typically the conduct of a thorough data clean up and quality control to remove poor quality genotypes and generate metrics to inform and select individuals for downstream statistical analysis. We have developed pyGenClean, a bioinformatics tool to facilitate and standardize the genetic data clean up pipeline with genotyping array data. In conjunction with a source batch-queuing system, the tool minimizes data manipulation errors, accelerates the completion of the data clean up process and provides informative plots and metrics to guide decision making for statistical analysis. Availability and implementation: pyGenClean is an open source Python 2.7 software and is freely available, along with documentation and examples, from http://www.statgen.org. Contact: louis-philippe.lemieux.perreault@umontreal.ca or marie-pierre.dube@statgen.org
format Online
Article
Text
id pubmed-3694635
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3694635 2013-06-27 pyGenClean: efficient tool for genetic data clean up before association testing. Lemieux Perreault, Louis-Philippe; Provost, Sylvie; Legault, Marc-André; Barhdadi, Amina; Dubé, Marie-Pierre. Bioinformatics, Applications Notes. Oxford University Press 2013-07-01 2013-05-06 /pmc/articles/PMC3694635/ /pubmed/23652425 http://dx.doi.org/10.1093/bioinformatics/btt261 Text en © The Author(s) 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title pyGenClean: efficient tool for genetic data clean up before association testing
topic Applications Notes