AlleleHMM: a data-driven method to identify allele specific differences in distributed functional genomic marks

Bibliographic details
Main authors: Chou, Shao-Pei; Danko, Charles G
Format: Online Article Text
Language: English
Published: Oxford University Press, 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6582321/
https://www.ncbi.nlm.nih.gov/pubmed/30918970
http://dx.doi.org/10.1093/nar/gkz176
author Chou, Shao-Pei
Danko, Charles G
collection PubMed
description How DNA sequence variation influences gene expression remains poorly understood. Diploid organisms have two homologous copies of their DNA sequence in the same nucleus, providing a rich source of information about how genetic variation affects a wealth of biochemical processes. However, few computational methods have been developed to discover allele specific differences in functional genomic data. Existing methods either treat each SNP independently, limiting statistical power, or combine SNPs across gene annotations, preventing the discovery of allele specific differences in unexpected genomic regions. Here we introduce AlleleHMM, a new computational method to identify blocks of neighboring SNPs that share similar allele specific differences in mark abundance. AlleleHMM uses a hidden Markov model to divide the genome into three hidden states based on allele frequencies in genomic data: a symmetric state (state S) which shows no difference between alleles, and regions with a higher signal on the maternal (state M) or paternal (state P) allele. AlleleHMM substantially outperformed naive methods using both simulated and real genomic data, particularly when input data had realistic levels of overdispersion. Using global run-on sequencing (GRO-seq) data, AlleleHMM identified thousands of allele specific blocks of transcription in both coding and non-coding genomic regions. AlleleHMM is a powerful tool for discovering allele specific regions in functional genomic datasets.
format Online
Article
Text
id pubmed-6582321
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6582321 2019-06-21 Nucleic Acids Res Methods Online Oxford University Press 2019-06-20 2019-03-28 /pmc/articles/PMC6582321/ /pubmed/30918970 http://dx.doi.org/10.1093/nar/gkz176 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title AlleleHMM: a data-driven method to identify allele specific differences in distributed functional genomic marks
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6582321/
https://www.ncbi.nlm.nih.gov/pubmed/30918970
http://dx.doi.org/10.1093/nar/gkz176
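The description field above summarizes AlleleHMM's core idea: a hidden Markov model walks along neighboring SNPs and labels each one as symmetric (S), maternal-biased (M), or paternal-biased (P), so runs of the same label form allele specific blocks. The following is a minimal sketch of that idea, not the authors' implementation. It assumes binomial emissions on maternal read counts with fixed, hypothetical maternal-read fractions (0.5, 0.8, 0.2), a hypothetical fixed probability of 0.9 of staying in the same state, and Viterbi decoding; the names `viterbi` and `blocks` are illustrative. It also ignores the overdispersion the abstract highlights.

```python
import math

# Hidden states named as in the abstract: symmetric, maternal-biased, paternal-biased.
STATES = ("S", "M", "P")

# ASSUMPTION: illustrative parameters, not values used or fitted by AlleleHMM.
# Probability that a read at a SNP comes from the maternal allele, per state.
P_MATERNAL = {"S": 0.5, "M": 0.8, "P": 0.2}
# Probability of staying in the same state between neighboring SNPs.
STAY = 0.9


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Log binomial probability of k maternal reads out of n total reads."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def log_transition(a: str, b: str) -> float:
    """Log transition probability; leaving a state is split evenly over the other two."""
    return math.log(STAY if a == b else (1 - STAY) / 2)


def viterbi(counts: list[tuple[int, int]]) -> list[str]:
    """Most likely state per SNP, given (maternal_reads, total_reads) in genomic order."""
    if not counts:
        return []
    k0, n0 = counts[0]
    # Uniform initial distribution over the three states.
    score = {s: math.log(1 / 3) + log_binom_pmf(k0, n0, P_MATERNAL[s]) for s in STATES}
    back = []  # back[t][s]: best state at SNP t given state s at SNP t + 1
    for k, n in counts[1:]:
        pointers, new_score = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda r: score[r] + log_transition(r, s))
            pointers[s] = prev
            new_score[s] = (score[prev] + log_transition(prev, s)
                            + log_binom_pmf(k, n, P_MATERNAL[s]))
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


def blocks(path: list[str]) -> list[tuple[str, int, int]]:
    """Collapse a state path into (state, first_snp, last_snp) runs, 0-based inclusive."""
    runs: list[tuple[str, int, int]] = []
    for i, s in enumerate(path):
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1], i)
        else:
            runs.append((s, i, i))
    return runs


if __name__ == "__main__":
    # Three balanced SNPs, three maternal-biased SNPs, three paternal-biased SNPs.
    snps = [(10, 20)] * 3 + [(18, 20)] * 3 + [(2, 20)] * 3
    print(blocks(viterbi(snps)))  # [('S', 0, 2), ('M', 3, 5), ('P', 6, 8)]
```

Decoding all SNPs jointly lets consistent evidence at neighboring SNPs reinforce one another, which is the advantage the abstract claims over treating each SNP independently. Here the 0.9 stay probability makes each state switch cost roughly 2.9 log units, so a block boundary is called only when the read counts on the new side outweigh that penalty.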