
Improved identification of conserved cassette exons using Bayesian networks

Bibliographic Details
Main authors: Sinha, Rileen, Hiller, Michael, Pudimat, Rainer, Gausmann, Ulrike, Platzer, Matthias, Backofen, Rolf
Format: Text
Language: English
Published: BioMed Central 2008
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2621368/
https://www.ncbi.nlm.nih.gov/pubmed/19014490
http://dx.doi.org/10.1186/1471-2105-9-477
collection PubMed
description BACKGROUND: Alternative splicing is a major contributor to the diversity of eukaryotic transcriptomes and proteomes. Currently, large-scale detection of alternative splicing using expressed sequence tags (ESTs) or microarrays does not capture all alternative splicing events. Moreover, for many species genomic data are being produced at a far greater rate than the corresponding transcript data; hence in silico methods of predicting alternative splicing need to be improved. RESULTS: Here, we show that the use of Bayesian networks (BNs) allows accurate prediction of evolutionarily conserved exon skipping events. At a stringent false positive rate of 0.5%, our BN achieves an improved true positive rate of 61%, compared to a previously reported 50% on the same dataset using support vector machines (SVMs). The improvement comes from incorporating several novel discriminative features, such as intronic splicing regulatory elements. Features related to mRNA secondary structure increase the prediction performance, corroborating previous findings that secondary structures are important for exon recognition. Random labelling tests rule out overfitting. Cross-validation on another dataset confirms the increased performance. When using the same dataset and the same set of features, the BN matches the performance of an SVM reported in earlier literature. Remarkably, we show that about half of the exons that are labelled constitutive but receive a high probability of being alternative from the BN are in fact alternative exons according to the latest EST data. Finally, we predict exon skipping without using conservation-based features and achieve a true positive rate of 29% at a false positive rate of 0.5%. CONCLUSION: BNs can be used to achieve accurate identification of alternative exons and provide clues about possible dependencies between relevant features.
The near-identical performance of the BN and SVM when using the same features shows that good classification depends more on the features than on the choice of classifier. Conservation-based features continue to be the most informative; hence, distinguishing alternative exons from constitutive ones without using conservation-based features remains a challenging problem.
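The abstract reports performance as a true positive rate at a fixed false positive rate (for example, 61% TPR at 0.5% FPR). The following is a minimal sketch of one way such a figure can be read off a classifier's scores; it is not the authors' evaluation code. It assumes that label 1 marks alternative (cassette) exons, label 0 marks constitutive exons, and that a higher score means "more likely alternative"; the function name `tpr_at_fpr` is hypothetical.

```python
import math
from typing import Sequence, Tuple


def tpr_at_fpr(scores: Sequence[float], labels: Sequence[int],
               max_fpr: float) -> Tuple[float, float]:
    """Hypothetical helper: return (TPR, FPR) at the lowest score cutoff
    whose false positive rate does not exceed max_fpr.

    Assumed conventions: label 1 = alternative exon (positive),
    label 0 = constitutive exon (negative), higher score = more alternative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    # How many negatives may be called positive without exceeding max_fpr.
    # The small epsilon guards against floating-point error (e.g. 0.005 * 200).
    allowed_fp = math.floor(max_fpr * len(neg) + 1e-9)

    if allowed_fp >= len(neg):
        threshold = -math.inf  # every example may be called positive
    else:
        # Call "positive" only scores strictly above the (allowed_fp + 1)-th
        # highest negative score, so at most allowed_fp negatives pass.
        threshold = neg[allowed_fp]

    tp = sum(1 for s in pos if s > threshold)
    fp = sum(1 for s in neg if s > threshold)
    return tp / len(pos), fp / len(neg)


# Toy data (invented for illustration): three alternative, four constitutive exons.
scores = [0.9, 0.8, 0.4, 0.85, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0, 0]
print(tpr_at_fpr(scores, labels, 0.25))  # (1.0, 0.25)
```

At a 0.5% FPR, a test set needs at least 200 constitutive exons before even a single false positive is permitted, so such a stringent operating point is informative only when the negative set is reasonably large.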
id pubmed-2621368
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-2621368, 2009-01-13. BMC Bioinformatics, Research Article. BioMed Central, published 2008-11-12. /pmc/articles/PMC2621368/ /pubmed/19014490 http://dx.doi.org/10.1186/1471-2105-9-477 Text, en. Copyright © 2008 Sinha et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article