prewas: data pre-processing for more informative bacterial GWAS

While variant identification pipelines are becoming increasingly standardized, less attention has been paid to the pre-processing of variants prior to their use in bacterial genome-wide association studies (bGWAS). Three nuances of variant pre-processing that impact downstream identification of gene...

Bibliographic Details
Main authors: Saund, Katie, Lapp, Zena, Thiede, Stephanie N., Pirani, Ali, Snitkin, Evan S.
Format: Online Article Text
Language: English
Published: Microbiology Society 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7371116/
https://www.ncbi.nlm.nih.gov/pubmed/32310745
http://dx.doi.org/10.1099/mgen.0.000368
author Saund, Katie
Lapp, Zena
Thiede, Stephanie N.
Pirani, Ali
Snitkin, Evan S.
collection PubMed
description While variant identification pipelines are becoming increasingly standardized, less attention has been paid to the pre-processing of variants prior to their use in bacterial genome-wide association studies (bGWAS). Three nuances of variant pre-processing that impact downstream identification of genetic associations include the separation of variants at multiallelic sites, separation of variants in overlapping genes, and referencing of variants relative to ancestral alleles. Here we demonstrate the importance of these variant pre-processing steps on diverse bacterial genomic datasets and present prewas, an R package, that standardizes the pre-processing of multiallelic sites, overlapping genes, and reference alleles before bGWAS. This package facilitates improved reproducibility and interpretability of bGWAS results. prewas enables users to extract maximal information from bGWAS by implementing multi-line representation for multiallelic sites and variants in overlapping genes. prewas outputs a binary SNP matrix that can be used for SNP-based bGWAS and will prevent the masking of minor alleles during bGWAS analysis. The optional binary gene matrix output can be used for gene-based bGWAS, which will enable users to maximize the power and evolutionary interpretability of their bGWAS studies. prewas is available for download from GitHub.
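The multi-line representation described above can be illustrated with a minimal sketch. This is not the prewas API (prewas is an R package); the function names `binarize_site` and `binarize_matrix`, the toy genotypes, and the assumption that the ancestral allele at each site is already known are all hypothetical and chosen only for illustration. Each non-ancestral allele at a site gets its own binary row, so a minor allele at a multiallelic site is not masked by a more common one.

```python
from typing import Dict, List


def binarize_site(alleles: List[str], ancestral: str) -> Dict[str, List[int]]:
    """Return one binary row per non-ancestral allele observed at a site.

    A sample is coded 1 if it carries that specific allele, otherwise 0.
    """
    rows: Dict[str, List[int]] = {}
    for alt in sorted(set(alleles) - {ancestral}):
        rows[alt] = [1 if allele == alt else 0 for allele in alleles]
    return rows


def binarize_matrix(
    sites: Dict[int, List[str]], ancestral: Dict[int, str]
) -> Dict[str, List[int]]:
    """Build a binary SNP matrix keyed by '<position>_<allele>'."""
    matrix: Dict[str, List[int]] = {}
    for position, alleles in sites.items():
        for alt, row in binarize_site(alleles, ancestral[position]).items():
            matrix[f"{position}_{alt}"] = row
    return matrix


# Toy data (hypothetical): four samples, two variant positions.
# Position 101 is multiallelic (A ancestral; T and G derived).
sites = {
    101: ["A", "A", "T", "G"],
    250: ["C", "G", "G", "C"],
}
ancestral = {101: "A", 250: "G"}

snp_matrix = binarize_matrix(sites, ancestral)
for name, row in snp_matrix.items():
    print(name, row)
```

In this sketch position 101 yields two rows, one for T and one for G, instead of a single "any variant" row. Position 250 is coded relative to its assumed ancestral allele G rather than to whichever allele an arbitrary reference genome happens to carry.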
format Online
Article
Text
id pubmed-7371116
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Microbiology Society
record_format MEDLINE/PubMed
spelling pubmed-7371116 2020-07-21 prewas: data pre-processing for more informative bacterial GWAS Saund, Katie Lapp, Zena Thiede, Stephanie N. Pirani, Ali Snitkin, Evan S. Microb Genom Method Microbiology Society 2020-04-20 /pmc/articles/PMC7371116/ /pubmed/32310745 http://dx.doi.org/10.1099/mgen.0.000368 Text en © 2020 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License.
title prewas: data pre-processing for more informative bacterial GWAS
topic Method