Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human
Allelic imbalance (AI) is a phenomenon where the two alleles of a given gene are expressed at different levels in a given cell, either because of epigenetic inactivation of one of the two alleles, or because of genetic variation in regulatory regions. Recently, Bing et al. have described the use of...
Main authors: | Wagner, James R.; Ge, Bing; Pokholok, Dmitry; Gunderson, Kevin L.; Pastinen, Tomi; Blanchette, Mathieu |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science, 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2900287/ https://www.ncbi.nlm.nih.gov/pubmed/20628616 http://dx.doi.org/10.1371/journal.pcbi.1000849 |
_version_ | 1782183618239528960 |
---|---|
author | Wagner, James R.; Ge, Bing; Pokholok, Dmitry; Gunderson, Kevin L.; Pastinen, Tomi; Blanchette, Mathieu |
author_facet | Wagner, James R.; Ge, Bing; Pokholok, Dmitry; Gunderson, Kevin L.; Pastinen, Tomi; Blanchette, Mathieu |
author_sort | Wagner, James R. |
collection | PubMed |
description | Allelic imbalance (AI) is a phenomenon where the two alleles of a given gene are expressed at different levels in a given cell, either because of epigenetic inactivation of one of the two alleles, or because of genetic variation in regulatory regions. Recently, Bing et al. have described the use of genotyping arrays to assay AI at a high resolution (∼750,000 SNPs across the autosomes). In this paper, we investigate computational approaches to analyze these data and identify genomic regions with AI in an unbiased and robust statistical manner. We propose two families of approaches: (i) a statistical approach based on z-score computations, and (ii) a family of machine learning approaches based on Hidden Markov Models. Each method is evaluated using previously published experimental data sets as well as with permutation testing. When applied to whole genome data from 53 HapMap samples, our approaches reveal that allelic imbalance is widespread (most expressed genes show evidence of AI in at least one of our 53 samples) and that most AI regions in a given individual are also found in at least a few other individuals. While many AI regions identified in the genome correspond to known protein-coding transcripts, others overlap with recently discovered long non-coding RNAs. We also observe that genomic regions with AI not only include complete transcripts with consistent differential expression levels, but also more complex patterns of allelic expression such as alternative promoters and alternative 3′ ends. The approaches developed not only shed light on the incidence and mechanisms of allelic expression, but will also help map the genetic causes of allelic expression and identify cases where this variation may be linked to diseases. |
format | Text |
id | pubmed-2900287 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-2900287 2010-07-13 Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human Wagner, James R.; Ge, Bing; Pokholok, Dmitry; Gunderson, Kevin L.; Pastinen, Tomi; Blanchette, Mathieu PLoS Comput Biol Research Article Public Library of Science 2010-07-08 /pmc/articles/PMC2900287/ /pubmed/20628616 http://dx.doi.org/10.1371/journal.pcbi.1000849 Text en Wagner et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Wagner, James R.; Ge, Bing; Pokholok, Dmitry; Gunderson, Kevin L.; Pastinen, Tomi; Blanchette, Mathieu Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human |
title | Computational Analysis of Whole-Genome Differential Allelic Expression Data in Human |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2900287/ https://www.ncbi.nlm.nih.gov/pubmed/20628616 http://dx.doi.org/10.1371/journal.pcbi.1000849 |