Removing reference mapping biases using limited or no genotype data identifies allelic differences in protein binding at disease-associated loci

Bibliographic Details
Main authors: Buchkovich, Martin L., Eklund, Karl, Duan, Qing, Li, Yun, Mohlke, Karen L., Furey, Terrence S.
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4515314/
https://www.ncbi.nlm.nih.gov/pubmed/26210163
http://dx.doi.org/10.1186/s12920-015-0117-x
author Buchkovich, Martin L.
Eklund, Karl
Duan, Qing
Li, Yun
Mohlke, Karen L.
Furey, Terrence S.
collection PubMed
description BACKGROUND: Genetic variation can alter transcriptional regulatory activity, contributing to variation in complex traits and risk of disease, but identifying individual variants that affect regulatory activity has been challenging. Quantitative sequence-based experiments such as ChIP-seq and DNase-seq can detect sites of allelic imbalance, where alleles contribute disproportionately to the overall signal, suggesting allelic differences in regulatory activity. METHODS: We created an allelic imbalance detection pipeline, AA-ALIGNER, to remove reference mapping biases influencing allelic imbalance detection and to evaluate the accuracy of allelic imbalance predictions in the absence of complete genotype data. Using the sequence aligner GSNAP and varying amounts of genotype information to remove mapping biases, we investigated the accuracy of allelic imbalance detection (binomial test) in CREB1 ChIP-seq reads from the GM12878 cell line. Additionally, we thoroughly evaluated the influence of experimental and analytical parameters on imbalance detection. RESULTS: Compared with imbalances identified using complete genotypes, AA-ALIGNER using imputed partial sample genotypes detected >95 % of imbalances with >90 % accuracy. AA-ALIGNER performed nearly as well using common variants when genotypes were unknown. In contrast, predicting additional heterozygous sites and imbalances using the sequence data led to >50 % false positive rates. We evaluated the effects of experimental data characteristics and key analytical parameter settings on imbalance detection. Overall, total base coverage and signal dispersion across the genome most affected our ability to detect imbalances, while parameters such as imbalance significance, imputation quality thresholds, and alignment mismatches had little effect. To assess the biological relevance of imbalance predictions, we used electrophoretic mobility shift assays to functionally test for predicted allelic differences in CREB1 binding in the GM12878 lymphoblast cell line. Six of nine tested variants exhibited allelic differences in binding. Two of these variants, rs2382818 and rs713875, are located within inflammatory bowel disease-associated loci. CONCLUSIONS: AA-ALIGNER accurately detects allelic imbalance in quantitative sequence data using partial genotypes or common variants, filling a critical methodological gap in these analyses, as full genotypes are rarely available. Importantly, we demonstrate how experimental and analytical features affect imbalance detection, providing guidance for similar future studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12920-015-0117-x) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4515314
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4515314 2015-07-27 BMC Med Genomics Technical Advance BioMed Central 2015-07-26 /pmc/articles/PMC4515314/ /pubmed/26210163 http://dx.doi.org/10.1186/s12920-015-0117-x Text en © Buchkovich et al. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Removing reference mapping biases using limited or no genotype data identifies allelic differences in protein binding at disease-associated loci
topic Technical Advance
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4515314/
https://www.ncbi.nlm.nih.gov/pubmed/26210163
http://dx.doi.org/10.1186/s12920-015-0117-x