
Predicting human splicing branchpoints by combining sequence-derived features and multi-label learning methods

BACKGROUND: Alternative splicing is a critical process in the coding of a single gene, in which introns are removed and exons are joined; splicing branchpoints are indicators of alternative splicing. Wet-lab experiments have identified a large number of human splicing branchpoints, but many branchpoints are sti...


Bibliographic Details
Main Authors: Zhang, Wen; Zhu, Xiaopeng; Fu, Yu; Tsuji, Junko; Weng, Zhiping
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5773893/
https://www.ncbi.nlm.nih.gov/pubmed/29219070
http://dx.doi.org/10.1186/s12859-017-1875-6
author Zhang, Wen
Zhu, Xiaopeng
Fu, Yu
Tsuji, Junko
Weng, Zhiping
author_sort Zhang, Wen
collection PubMed
description BACKGROUND: Alternative splicing is a critical process in the coding of a single gene, in which introns are removed and exons are joined; splicing branchpoints are indicators of alternative splicing. Wet-lab experiments have identified a large number of human splicing branchpoints, but many remain unknown. To guide wet-lab experiments, we develop computational methods to predict human splicing branchpoints. RESULTS: Because an intron may have multiple branchpoints, we formulate branchpoint prediction as a multi-label learning problem and attempt to predict branchpoint sites from intron sequences. First, we investigate a variety of intron sequence-derived features, such as the sparse profile, dinucleotide profile, position weight matrix profile, Markov motif profile and polypyrimidine tract profile. Second, we consider several multi-label learning methods (partial least squares regression, canonical correlation analysis and regularized canonical correlation analysis) and use them as the basic classification engines. Third, we propose two ensemble learning schemes that integrate different features and different classifiers to build ensemble learning systems for branchpoint prediction: one is a genetic algorithm-based weighted average ensemble method; the other is a logistic regression-based ensemble method. CONCLUSIONS: In computational experiments, the two ensemble learning methods outperform benchmark branchpoint prediction methods and produce high-accuracy results on the benchmark dataset.
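As a rough illustration of two ideas named in the description, the sketch below encodes an intron sequence as a sparse (one-hot) profile and a dinucleotide count profile, and combines per-position scores from several base predictors by a weighted average. Everything here is an assumption made for illustration: the function names, the exact encodings, and the fixed weights are hypothetical and are not taken from the article, which reports tuning the ensemble weights with a genetic algorithm.

```python
# Hypothetical sketch; names and encoding details are assumptions,
# not the authors' implementation.
NUCLEOTIDES = "ACGT"


def sparse_profile(seq):
    """One-hot encode a sequence: four binary values per position."""
    features = []
    for base in seq.upper():
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def dinucleotide_profile(seq):
    """Count the 16 dinucleotides over overlapping pairs (AA, AC, ..., TT)."""
    pairs = [a + b for a in NUCLEOTIDES for b in NUCLEOTIDES]
    counts = dict.fromkeys(pairs, 0)
    s = seq.upper()
    for i in range(len(s) - 1):
        pair = s[i:i + 2]
        if pair in counts:  # skip pairs containing ambiguous bases such as N
            counts[pair] += 1
    return [counts[p] for p in pairs]


def weighted_average_ensemble(score_lists, weights):
    """Combine per-position scores from several predictors.

    score_lists: one list of per-position scores per base predictor.
    weights: one non-negative weight per predictor (fixed here; a search
    procedure such as a genetic algorithm could choose them instead).
    """
    total = sum(weights)
    n_positions = len(score_lists[0])
    return [
        sum(w * scores[i] for w, scores in zip(weights, score_lists)) / total
        for i in range(n_positions)
    ]
```

In a multi-label setting, each candidate position in the intron would be one label, so the combined list holds one score per position and positions scoring above a chosen threshold would be reported as predicted branchpoints.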
format Online
Article
Text
id pubmed-5773893
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5773893 2018-01-26 Predicting human splicing branchpoints by combining sequence-derived features and multi-label learning methods Zhang, Wen; Zhu, Xiaopeng; Fu, Yu; Tsuji, Junko; Weng, Zhiping BMC Bioinformatics Research BioMed Central 2017-12-01 /pmc/articles/PMC5773893/ /pubmed/29219070 http://dx.doi.org/10.1186/s12859-017-1875-6 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Predicting human splicing branchpoints by combining sequence-derived features and multi-label learning methods
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5773893/
https://www.ncbi.nlm.nih.gov/pubmed/29219070
http://dx.doi.org/10.1186/s12859-017-1875-6