
A classification model for lncRNA and mRNA based on k-mers and a convolutional neural network


Bibliographic Details
Main authors: Wen, Jianghui, Liu, Yeshu, Shi, Yu, Huang, Haoran, Deng, Bing, Xiao, Xinping
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6743109/
https://www.ncbi.nlm.nih.gov/pubmed/31519146
http://dx.doi.org/10.1186/s12859-019-3039-3
collection PubMed
description BACKGROUND: Long-chain non-coding RNA (lncRNA) is closely related to many biological activities. Since its sequence structure is similar to that of messenger RNA (mRNA), it is difficult to distinguish between the two based only on sequence biometrics. Therefore, it is particularly important to construct a model that can effectively identify lncRNA and mRNA. RESULTS: First, the difference in the k-mer frequency distribution between lncRNA and mRNA sequences is considered in this paper, and they are transformed into the k-mer frequency matrix. Moreover, k-mers with more species are screened by relative entropy. The classification model of the lncRNA and mRNA sequences is then proposed by inputting the k-mer frequency matrix and training the convolutional neural network. Finally, the optimal k-mer combination of the classification model is determined and compared with other machine learning methods in humans, mice and chickens. The results indicate that the proposed model has the highest classification accuracy. Furthermore, the recognition ability of this model is verified to a single sequence. CONCLUSION: We established a classification model for lncRNA and mRNA based on k-mers and the convolutional neural network. The classification accuracy of the model with 1-mers, 2-mers and 3-mers was the highest, with an accuracy of 0.9872 in humans, 0.8797 in mice and 0.9963 in chickens, which is better than those of the random forest, logistic regression, decision tree and support vector machine.
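The k-mer frequency features and the relative entropy mentioned in the description can be illustrated with a small sketch. This is only one plausible reading of the abstract, not the authors' implementation: the function names, the A/C/G/T alphabet with U mapped to T, and the smoothing constant eps are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGT"  # assumption: RNA "U" is mapped to "T" before counting


def kmer_frequencies(seq, k):
    """Relative frequency of every possible k-mer (4**k of them) in seq."""
    seq = seq.upper().replace("U", "T")
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # k-mers containing other symbols (e.g. "N") are ignored in the total
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def feature_vector(seq, ks=(1, 2, 3)):
    """Concatenate 1-, 2- and 3-mer frequencies: 4 + 16 + 64 = 84 values."""
    vec = []
    for k in ks:
        vec.extend(kmer_frequencies(seq, k).values())
    return vec


def relative_entropy(p, q, eps=1e-9):
    """Kullback-Leibler divergence D(p || q) between two k-mer distributions.

    eps is a hypothetical smoothing term that avoids log(0); it is not
    taken from the paper.
    """
    return sum((p[m] + eps) * math.log((p[m] + eps) / (q[m] + eps)) for m in p)
```

In such a setup, `feature_vector` would be applied to each lncRNA or mRNA sequence to build one row of the frequency matrix, and `relative_entropy` could compare the average k-mer distribution of the two classes; how the paper actually screens k-mers and arranges the matrix for the convolutional network is not specified in this record.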
id pubmed-6743109
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-6743109 2019-09-16 BMC Bioinformatics Methodology Article BioMed Central 2019-09-13 /pmc/articles/PMC6743109/ /pubmed/31519146 http://dx.doi.org/10.1186/s12859-019-3039-3 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Methodology Article