LncLocation: Efficient Subcellular Location Prediction of Long Non-Coding RNA-Based Multi-Source Heterogeneous Feature Fusion

Bibliographic Details
Main authors: Feng, Shiyao; Liang, Yanchun; Du, Wei; Lv, Wei; Li, Ying
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7582431/
https://www.ncbi.nlm.nih.gov/pubmed/33019721
http://dx.doi.org/10.3390/ijms21197271
author Feng, Shiyao
Liang, Yanchun
Du, Wei
Lv, Wei
Li, Ying
collection PubMed
description Recent studies have revealed that the subcellular location of long non-coding RNAs (lncRNAs) can provide significant information about their function. Owing to the lack of experimental data, the number of lncRNAs with experimentally verified subcellular localization is very limited, and the numbers of lncRNAs located in different organelles are highly imbalanced. Predicting the subcellular location of lncRNAs is therefore a multi-class, small-sample, imbalanced classification problem. This imbalance causes machine learning models to recognize the small classes poorly, which remains a challenging problem in existing research. In this study, we integrate multi-source features to construct a sequence-based computational tool, lncLocation, to predict the subcellular location of lncRNAs. An autoencoder is used to enhance some of the features, and a binomial distribution-based filtering method and recursive feature elimination (RFE) are used to filter some of the features. This improves the representation ability of the data and alleviates the problem of imbalanced multi-class data. Through comprehensive experiments on different feature combinations and machine learning models, we select the optimal feature set and classifier to construct the subcellular location prediction tool lncLocation. LncLocation obtains an accuracy of 87.78% under 5-fold cross-validation on the benchmark data, which is higher than that of state-of-the-art tools, and its classification performance, especially for small classes, is significantly improved.
format Online
Article
Text
id pubmed-7582431
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7582431 2020-10-29 Int J Mol Sci Article MDPI 2020-10-01 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title LncLocation: Efficient Subcellular Location Prediction of Long Non-Coding RNA-Based Multi-Source Heterogeneous Feature Fusion
topic Article
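The description above names a binomial distribution-based filtering method but does not give its formula. The following is a minimal sketch, assuming the formulation commonly used for k-mer feature selection in sequence classification: for feature i and class j, with n_ij occurrences of the feature in class j, N_i occurrences overall, and q_j the share of all feature occurrences that fall in class j, the tail probability P(X >= n_ij) for X ~ Binomial(N_i, q_j) is computed, the confidence level is CL_ij = 1 - P(X >= n_ij), and each feature is scored by its maximum CL over classes. lncLocation's actual procedure, feature definitions, and cutoffs may differ; the function names (`binomial_tail`, `binomial_confidence`, `select_top_k`) and the toy counts are hypothetical.

```python
from math import comb


def binomial_tail(n: int, total: int, q: float) -> float:
    """Return P(X >= n) for X ~ Binomial(total, q)."""
    return sum(comb(total, k) * q**k * (1 - q) ** (total - k)
               for k in range(n, total + 1))


def binomial_confidence(counts: dict[str, dict[str, int]]) -> dict[str, float]:
    """Score each feature by CL_i = max_j (1 - P(X >= n_ij)).

    counts[feature][cls] is the (hypothetical) number of occurrences of
    `feature` (e.g. a k-mer) in the training sequences of class `cls`.
    """
    classes = sorted({c for per_cls in counts.values() for c in per_cls})
    # m_j: total feature occurrences in class j; M: grand total over classes.
    m = {c: sum(per_cls.get(c, 0) for per_cls in counts.values()) for c in classes}
    M = sum(m.values())
    scores: dict[str, float] = {}
    for feat, per_cls in counts.items():
        N = sum(per_cls.values())  # N_i: occurrences of this feature overall
        best = 0.0
        for c in classes:
            q = m[c] / M  # q_j: share of class j among all feature occurrences
            # A small tail probability means the feature is over-represented in class j.
            best = max(best, 1.0 - binomial_tail(per_cls.get(c, 0), N, q))
        scores[feat] = best
    return scores


def select_top_k(counts: dict[str, dict[str, int]], k: int) -> list[str]:
    """Keep the k features with the highest confidence level."""
    scores = binomial_confidence(counts)
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy, made-up counts for two compartments; not data from the paper.
    toy = {
        "AAG": {"nucleus": 30, "cytoplasm": 5},
        "CGU": {"nucleus": 10, "cytoplasm": 10},
        "UUA": {"nucleus": 4, "cytoplasm": 16},
    }
    for feat, cl in sorted(binomial_confidence(toy).items()):
        print(f"{feat}: CL = {cl:.4f}")
    print(select_top_k(toy, 2))
```

In this sketch a feature scores high when it is markedly enriched in any one compartment relative to that compartment's overall share, which is one way a filter can keep features informative for small classes. The top-k cutoff is a free parameter here; the surviving features could then be passed to a later stage such as RFE.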