A classification model for lncRNA and mRNA based on k-mers and a convolutional neural network
BACKGROUND: Long-chain non-coding RNA (lncRNA) is closely related to many biological activities. Since its sequence structure is similar to that of messenger RNA (mRNA), it is difficult to distinguish between the two based only on sequence biometrics. Therefore, it is particularly important to const...
Main authors: | Wen, Jianghui; Liu, Yeshu; Shi, Yu; Huang, Haoran; Deng, Bing; Xiao, Xinping |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6743109/ https://www.ncbi.nlm.nih.gov/pubmed/31519146 http://dx.doi.org/10.1186/s12859-019-3039-3 |
_version_ | 1783451217771364352 |
---|---|
author | Wen, Jianghui Liu, Yeshu Shi, Yu Huang, Haoran Deng, Bing Xiao, Xinping |
author_sort | Wen, Jianghui |
collection | PubMed |
description | BACKGROUND: Long-chain non-coding RNA (lncRNA) is closely related to many biological activities. Since its sequence structure is similar to that of messenger RNA (mRNA), it is difficult to distinguish between the two based only on sequence biometrics. Therefore, it is particularly important to construct a model that can effectively identify lncRNA and mRNA. RESULTS: First, the difference in the k-mer frequency distribution between lncRNA and mRNA sequences is considered in this paper, and they are transformed into the k-mer frequency matrix. Moreover, k-mers with more species are screened by relative entropy. The classification model of the lncRNA and mRNA sequences is then proposed by inputting the k-mer frequency matrix and training the convolutional neural network. Finally, the optimal k-mer combination of the classification model is determined and compared with other machine learning methods in humans, mice and chickens. The results indicate that the proposed model has the highest classification accuracy. Furthermore, the recognition ability of this model is verified to a single sequence. CONCLUSION: We established a classification model for lncRNA and mRNA based on k-mers and the convolutional neural network. The classification accuracy of the model with 1-mers, 2-mers and 3-mers was the highest, with an accuracy of 0.9872 in humans, 0.8797 in mice and 0.9963 in chickens, which is better than those of the random forest, logistic regression, decision tree and support vector machine. |
format | Online Article Text |
id | pubmed-6743109 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6743109 2019-09-16 BMC Bioinformatics Methodology Article BioMed Central 2019-09-13 /pmc/articles/PMC6743109/ /pubmed/31519146 http://dx.doi.org/10.1186/s12859-019-3039-3 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A classification model for lncRNA and mRNA based on k-mers and a convolutional neural network |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6743109/ https://www.ncbi.nlm.nih.gov/pubmed/31519146 http://dx.doi.org/10.1186/s12859-019-3039-3 |
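
The description in this record mentions two steps that are easier to follow with a concrete example: turning a sequence into k-mer frequencies, and comparing frequency distributions by relative entropy. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. The function names, the sliding window with step 1, the mapping of U to T, the skipping of windows with ambiguous bases, and the small smoothing constant `eps` are all illustrative choices that the record does not specify.

```python
from collections import Counter
from itertools import product
from math import log

ALPHABET = "ACGT"  # assumption: RNA input is mapped to the DNA alphabet (U -> T)


def kmer_frequencies(seq, k):
    """Relative frequency of every possible k-mer in seq (sliding window, step 1)."""
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Windows containing characters outside ALPHABET (e.g. N) are ignored.
    total = sum(counts[m] for m in kmers)
    return {m: (counts[m] / total if total else 0.0) for m in kmers}


def feature_vector(seq, ks=(1, 2, 3)):
    """Concatenate k-mer frequencies for each k; 4 + 16 + 64 = 84 values for k = 1, 2, 3."""
    vec = []
    for k in ks:
        freqs = kmer_frequencies(seq, k)
        vec.extend(freqs[m] for m in sorted(freqs))
    return vec


def relative_entropy(p, q, eps=1e-9):
    """Kullback-Leibler divergence D(p || q) of two frequency dicts with the same keys.

    eps is a hypothetical smoothing term that avoids log(0) for unseen k-mers.
    """
    return sum((p[m] + eps) * log((p[m] + eps) / (q[m] + eps)) for m in p)
```

For example, `feature_vector(sequence)` yields one fixed-length row per sequence, and `relative_entropy` could be applied to the averaged k-mer frequencies of the lncRNA and mRNA sets to see how strongly the two distributions differ. How the paper arranges these values into a matrix for the convolutional network, and how it screens k-mers, is not described in this record.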