Improved identification of conserved cassette exons using Bayesian networks
BACKGROUND: Alternative splicing is a major contributor to the diversity of eukaryotic transcriptomes and proteomes. Currently, large scale detection of alternative splicing using expressed sequence tags (ESTs) or microarrays does not capture all alternative splicing events. Moreover, for many speci...
Main authors: | Sinha, Rileen; Hiller, Michael; Pudimat, Rainer; Gausmann, Ulrike; Platzer, Matthias; Backofen, Rolf |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2008 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2621368/ https://www.ncbi.nlm.nih.gov/pubmed/19014490 http://dx.doi.org/10.1186/1471-2105-9-477 |
_version_ | 1782163399980875776 |
---|---|
author | Sinha, Rileen Hiller, Michael Pudimat, Rainer Gausmann, Ulrike Platzer, Matthias Backofen, Rolf |
author_sort | Sinha, Rileen |
collection | PubMed |
description | BACKGROUND: Alternative splicing is a major contributor to the diversity of eukaryotic transcriptomes and proteomes. Currently, large scale detection of alternative splicing using expressed sequence tags (ESTs) or microarrays does not capture all alternative splicing events. Moreover, for many species genomic data is being produced at a far greater rate than corresponding transcript data, hence in silico methods of predicting alternative splicing have to be improved. RESULTS: Here, we show that the use of Bayesian networks (BNs) allows accurate prediction of evolutionary conserved exon skipping events. At a stringent false positive rate of 0.5%, our BN achieves an improved true positive rate of 61%, compared to a previously reported 50% on the same dataset using support vector machines (SVMs). Incorporating several novel discriminative features such as intronic splicing regulatory elements leads to the improvement. Features related to mRNA secondary structure increase the prediction performance, corroborating previous findings that secondary structures are important for exon recognition. Random labelling tests rule out overfitting. Cross-validation on another dataset confirms the increased performance. When using the same dataset and the same set of features, the BN matches the performance of an SVM in earlier literature. Remarkably, we could show that about half of the exons which are labelled constitutive but receive a high probability of being alternative by the BN, are in fact alternative exons according to the latest EST data. Finally, we predict exon skipping without using conservation-based features, and achieve a true positive rate of 29% at a false positive rate of 0.5%. CONCLUSION: BNs can be used to achieve accurate identification of alternative exons and provide clues about possible dependencies between relevant features. 
The near-identical performance of the BN and SVM when using the same features shows that good classification depends more on features than on the choice of classifier. Conservation based features continue to be the most informative, and hence distinguishing alternative exons from constitutive ones without using conservation based features remains a challenging problem. |
format | Text |
id | pubmed-2621368 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2008 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2621368 2009-01-13 Improved identification of conserved cassette exons using Bayesian networks Sinha, Rileen Hiller, Michael Pudimat, Rainer Gausmann, Ulrike Platzer, Matthias Backofen, Rolf BMC Bioinformatics Research Article BioMed Central 2008-11-12 /pmc/articles/PMC2621368/ /pubmed/19014490 http://dx.doi.org/10.1186/1471-2105-9-477 Text en Copyright © 2008 Sinha et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Improved identification of conserved cassette exons using Bayesian networks |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2621368/ https://www.ncbi.nlm.nih.gov/pubmed/19014490 http://dx.doi.org/10.1186/1471-2105-9-477 |