
snpQT: flexible, reproducible, and comprehensive quality control and imputation of genomic data

Bibliographic details
Main authors:	Vasilopoulou, Christina; Wingfield, Benjamin; Morris, Andrew P.; Duddy, William
Format:	Online Article Text
Language:	English
Published:	F1000 Research Limited 2021
Subjects:	Software Tool Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8637247/ https://www.ncbi.nlm.nih.gov/pubmed/34900230 http://dx.doi.org/10.12688/f1000research.53821.2

Journal:	F1000Res
Date:	2021-11-29
Record ID:	pubmed-8637247 (PubMed, National Center for Biotechnology Information)
Copyright:	© 2021 Vasilopoulou C et al.
License:	Open access under the Creative Commons Attribution Licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract
Quality control of genomic data is an essential but complicated multi-step procedure. It often requires separate installation of, and expert familiarity with, a combination of different bioinformatics tools. Software incompatibilities and inconsistencies across computing environments are recurrent challenges, leading to poor reproducibility. Existing semi-automated or automated solutions lack comprehensive quality checks, flexible workflow architecture, and user control. To address these challenges, we have developed snpQT: a scalable, stand-alone software pipeline using Nextflow and BioContainers for comprehensive, reproducible, and interactive quality control of human genomic data. snpQT offers some 36 discrete quality filters or correction steps in a complete standardised pipeline. It produces graphical reports that show the state of the data before and after each quality control procedure. These steps include:
- human genome build conversion;
- population stratification against data from the 1000 Genomes Project;
- automated population outlier removal;
- built-in imputation with its own pre- and post-imputation quality control.
Common input formats are used. A synthetic dataset and a comprehensive online tutorial are provided for testing, education, and demonstration. The snpQT pipeline is designed to require minimal user input and coding experience. Quality control steps are implemented with numerous user-modifiable thresholds, and workflows can be flexibly combined. snpQT is open source and freely available at https://github.com/nebfield/snpQT. A comprehensive online tutorial and installation guide, covering the analysis through to GWAS, is available at https://snpqt.readthedocs.io/en/latest/. It introduces snpQT using a synthetic demonstration dataset and a real-world Amyotrophic Lateral Sclerosis SNP-array dataset.
