Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human

Allelic imbalance (AI) is a phenomenon where the two alleles of a given gene are expressed at different levels in a given cell, either because of epigenetic inactivation of one of the two alleles, or because of genetic variation in regulatory regions. Recently, Bing et al. have described the use of...

Bibliographic Details
Main authors: Wagner, James R., Ge, Bing, Pokholok, Dmitry, Gunderson, Kevin L., Pastinen, Tomi, Blanchette, Mathieu
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2900287/
https://www.ncbi.nlm.nih.gov/pubmed/20628616
http://dx.doi.org/10.1371/journal.pcbi.1000849
author Wagner, James R.
Ge, Bing
Pokholok, Dmitry
Gunderson, Kevin L.
Pastinen, Tomi
Blanchette, Mathieu
collection PubMed
description Allelic imbalance (AI) is a phenomenon where the two alleles of a given gene are expressed at different levels in a given cell, either because of epigenetic inactivation of one of the two alleles, or because of genetic variation in regulatory regions. Recently, Bing et al. have described the use of genotyping arrays to assay AI at a high resolution (∼750,000 SNPs across the autosomes). In this paper, we investigate computational approaches to analyze this data and identify genomic regions with AI in an unbiased and robust statistical manner. We propose two families of approaches: (i) a statistical approach based on z-score computations, and (ii) a family of machine learning approaches based on Hidden Markov Models. Each method is evaluated using previously published experimental data sets as well as with permutation testing. When applied to whole genome data from 53 HapMap samples, our approaches reveal that allelic imbalance is widespread (most expressed genes show evidence of AI in at least one of our 53 samples) and that most AI regions in a given individual are also found in at least a few other individuals. While many AI regions identified in the genome correspond to known protein-coding transcripts, others overlap with recently discovered long non-coding RNAs. We also observe that genomic regions with AI not only include complete transcripts with consistent differential expression levels, but also more complex patterns of allelic expression such as alternative promoters and alternative 3′ end. The approaches developed not only shed light on the incidence and mechanisms of allelic expression, but will also help towards mapping the genetic causes of allelic expression and identify cases where this variation may be linked to diseases.
format Text
id pubmed-2900287
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2900287 2010-07-13 Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human Wagner, James R. Ge, Bing Pokholok, Dmitry Gunderson, Kevin L. Pastinen, Tomi Blanchette, Mathieu PLoS Comput Biol Research Article Public Library of Science 2010-07-08 /pmc/articles/PMC2900287/ /pubmed/20628616 http://dx.doi.org/10.1371/journal.pcbi.1000849 Text en Wagner et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2900287/
https://www.ncbi.nlm.nih.gov/pubmed/20628616
http://dx.doi.org/10.1371/journal.pcbi.1000849