
A statistical method for predicting splice variants between two groups of samples using GeneChip® expression array data

Bibliographic Details
Main Authors: Fan, Wenhong, Khalid, Najma, Hallahan, Andrew R, Olson, James M, Zhao, Lue Ping
Format: Text
Language: English
Published: BioMed Central 2006
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1502129/
https://www.ncbi.nlm.nih.gov/pubmed/16603076
http://dx.doi.org/10.1186/1742-4682-3-19
author Fan, Wenhong
Khalid, Najma
Hallahan, Andrew R
Olson, James M
Zhao, Lue Ping
collection PubMed
description BACKGROUND: Alternative splicing of pre-messenger RNA results in RNA variants with combinations of selected exons. It is one of the essential biological functions and regulatory components in higher eukaryotic cells. Some of these variants are detectable with the Affymetrix GeneChip®, which uses multiple oligonucleotide probes (i.e. a probe set), since the target sequences for the multiple probes are adjacent within each gene. Hybridization intensity from a probe correlates with abundance of the corresponding transcript. Although the multiple-probe feature in the current GeneChip® was designed to assess expression values of individual genes, it also measures transcriptional abundance for a sub-region of a gene sequence. This additional capacity motivated us to develop a method to predict alternative splicing, taking advantage of extensive repositories of GeneChip® gene expression array data.
RESULTS: We developed a two-step approach to predict alternative splicing from GeneChip® data. First, we clustered the probes from a probe set into pseudo-exons based on similarity of probe intensities and physical adjacency. A pseudo-exon is defined as a sequence in the gene within which multiple probes have comparable probe intensity values. Second, for each pseudo-exon, we assessed the statistical significance of the difference in probe intensity between two groups of samples. Differentially expressed pseudo-exons are predicted to be alternatively spliced. We applied our method to empirical data generated from GeneChip® Hu6800 arrays, which include 7129 probe sets and twenty probes per probe set. The dataset consists of sixty-nine medulloblastoma (27 metastatic and 42 non-metastatic) samples and four cerebellum samples as normal controls. We predicted that 577 genes would be alternatively spliced when we compared normal cerebellum samples to medulloblastomas, and predicted that thirteen genes would be alternatively spliced when we compared metastatic medulloblastomas to non-metastatic ones. We checked the consistency of some of our findings with information in the UCSC Human Genome Browser.
CONCLUSION: The two-step approach described in this paper is capable of predicting some alternative splicing from multiple oligonucleotide-based gene expression array data with GeneChip® technology. Our method employs the extensive repositories of gene expression array data available and generates alternative splicing hypotheses, which can be further validated by experimental studies.
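The two-step approach in the description above can be illustrated with a minimal sketch. The abstract does not state the clustering rule or the test statistic, so everything here is an assumption made for illustration: a greedy left-to-right segmentation with a hypothetical `max_gap` threshold stands in for the pseudo-exon clustering, and a Welch t statistic with a hypothetical cutoff `t_cut` stands in for the significance assessment. The function names are invented, and this simple version does not separate whole-gene expression changes from exon-level changes.

```python
import math
from statistics import mean, variance


def pseudo_exons(probe_means, max_gap=0.5):
    """Step 1 (assumed rule): walk the probes in positional order and keep a
    probe in the current segment while its mean log-intensity stays within
    max_gap of the running segment mean; otherwise start a new segment."""
    segments = [[0]]
    for i in range(1, len(probe_means)):
        seg_mean = mean(probe_means[j] for j in segments[-1])
        if abs(probe_means[i] - seg_mean) <= max_gap:
            segments[-1].append(i)
        else:
            segments.append([i])
    return segments


def welch_t(a, b):
    """Welch t statistic for two groups (each needs at least two values)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    diff = mean(a) - mean(b)
    if se == 0:
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / se


def predict_spliced(group_a, group_b, max_gap=0.5, t_cut=3.0):
    """Step 2 (assumed test): flag pseudo-exons whose per-sample mean
    intensity differs between the two groups by |t| > t_cut.

    group_a, group_b: lists of samples; each sample is a list of
    log-intensities for one probe set, ordered by probe position."""
    n_probes = len(group_a[0])
    pooled = group_a + group_b
    probe_means = [mean(s[p] for s in pooled) for p in range(n_probes)]
    hits = []
    for seg in pseudo_exons(probe_means, max_gap):
        a = [mean(s[p] for p in seg) for s in group_a]
        b = [mean(s[p] for p in seg) for s in group_b]
        t = welch_t(a, b)
        if abs(t) > t_cut:
            hits.append((seg, t))
    return hits
```

Calling `predict_spliced(group_a, group_b)` returns a list of `(probe_indices, t)` pairs, one per flagged pseudo-exon. A real analysis would replace the fixed cutoff with a proper p-value and a multiple-testing correction across the 7129 probe sets.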
format Text
id pubmed-1502129
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1502129 2006-07-14 A statistical method for predicting splice variants between two groups of samples using GeneChip® expression array data Fan, Wenhong Khalid, Najma Hallahan, Andrew R Olson, James M Zhao, Lue Ping Theor Biol Med Model Research BioMed Central 2006-04-07 /pmc/articles/PMC1502129/ /pubmed/16603076 http://dx.doi.org/10.1186/1742-4682-3-19 Text en Copyright © 2006 Fan et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A statistical method for predicting splice variants between two groups of samples using GeneChip® expression array data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1502129/
https://www.ncbi.nlm.nih.gov/pubmed/16603076
http://dx.doi.org/10.1186/1742-4682-3-19