A hidden Markov approach for ascertaining cSNP genotypes from RNA sequence data in the presence of allelic imbalance by exploiting linkage disequilibrium

BACKGROUND: Allelic specific expression (ASE) increases our understanding of the genetic control of gene expression and its links to phenotypic variation. ASE testing is implemented through binomial or beta-binomial tests of sequence read counts of alternative alleles at a cSNP of interest in hetero...
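The binomial test of read counts mentioned in the Background can be illustrated with a minimal sketch. It is not the HMM genotype caller the article proposes; it only shows the downstream ASE test at a single cSNP in one heterozygous individual. The function name `binomial_ase_pvalue`, its parameters, and the null reference-allele proportion of 0.5 are assumptions made for illustration and do not come from the article.

```python
from math import comb


def binomial_ase_pvalue(ref_reads: int, alt_reads: int, p_ref: float = 0.5) -> float:
    """Two-sided exact binomial p-value for allelic imbalance at one cSNP.

    Hypothetical helper: ref_reads and alt_reads are the read counts of the
    two alleles in a heterozygous individual; p_ref is the assumed
    reference-allele proportion under the null hypothesis of no ASE.
    """
    n = ref_reads + alt_reads
    if n == 0:
        return 1.0  # no reads, so no evidence against the null

    # Binomial probability of every possible reference-read count 0..n
    pmf = [comb(n, k) * p_ref**k * (1 - p_ref) ** (n - k) for k in range(n + 1)]
    observed = pmf[ref_reads]

    # Sum the probabilities of all outcomes no more likely than the observed
    # one; the small relative tolerance guards against floating-point ties.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-9))
    return min(1.0, p_value)


if __name__ == "__main__":
    print(binomial_ase_pvalue(15, 5))   # moderately imbalanced counts
    print(binomial_ase_pvalue(10, 10))  # perfectly balanced counts
```

A beta-binomial variant would additionally model overdispersion of the read counts; that extension is not sketched here.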

Bibliographic Details
Main Authors: Steibel, Juan P, Wang, Heng, Zhong, Ping-Shou
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351697/
https://www.ncbi.nlm.nih.gov/pubmed/25887316
http://dx.doi.org/10.1186/s12859-015-0479-2
author Steibel, Juan P
Wang, Heng
Zhong, Ping-Shou
collection PubMed
description BACKGROUND: Allelic specific expression (ASE) increases our understanding of the genetic control of gene expression and its links to phenotypic variation. ASE testing is implemented through binomial or beta-binomial tests of sequence read counts of alternative alleles at a cSNP of interest in heterozygous individuals. This requires prior ascertainment of the cSNP genotypes for all individuals. To meet this need, we propose hidden Markov methods to call SNPs from next-generation RNA sequence data when ASE may be present. RESULTS: We propose two hidden Markov models (HMMs), HMM-ASE and HMM-NASE, which respectively do and do not account for ASE, in order to improve genotyping accuracy. Both HMMs call the genotypes of several SNPs simultaneously and allow for mapping error, which respectively exploits the dependence among SNPs and corrects the bias due to mapping error. In addition, HMM-ASE exploits ASE information to further improve genotype accuracy when ASE is likely to be present. Simulation results indicate that the proposed HMMs achieve very good prediction accuracy in terms of controlling both the false discovery rate (FDR) and the false negative rate (FNR). When ASE is present, HMM-ASE had a lower FNR than HMM-NASE, while both control the FDR at a similar level. In a real data application, the proposed methods, by exploiting linkage disequilibrium (LD), show better sensitivity than the VarScan method and a similar FDR in calling heterozygous SNPs. Sensitivity and FDR are similar to those of the BCFtools and Beagle methods. The resulting genotypes show good properties for the estimation of the genetic parameters and ASE ratios. CONCLUSIONS: We introduce HMMs, which are able to exploit LD and account for ASE and mapping errors, to simultaneously call SNPs from next-generation RNA sequence data. 
The method introduced can reliably call cSNP genotypes even in the presence of ASE and under low sequencing coverage. As a byproduct, the proposed method is able to provide predictions of ASE ratios for the heterozygous genotypes, which can then be used for ASE testing. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0479-2) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4351697
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4351697 2015-03-07 A hidden Markov approach for ascertaining cSNP genotypes from RNA sequence data in the presence of allelic imbalance by exploiting linkage disequilibrium Steibel, Juan P; Wang, Heng; Zhong, Ping-Shou BMC Bioinformatics Methodology Article BioMed Central 2015-02-22 /pmc/articles/PMC4351697/ /pubmed/25887316 http://dx.doi.org/10.1186/s12859-015-0479-2 Text en © Steibel et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A hidden Markov approach for ascertaining cSNP genotypes from RNA sequence data in the presence of allelic imbalance by exploiting linkage disequilibrium
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4351697/
https://www.ncbi.nlm.nih.gov/pubmed/25887316
http://dx.doi.org/10.1186/s12859-015-0479-2