Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features
BACKGROUND: Piwi-interacting RNA (piRNA) is the largest class of small non-coding RNA molecules. The transposon-derived piRNA prediction can enrich the research contents of small ncRNAs as well as help to further understand generation mechanism of gamete. METHODS: In this paper, we attempt to differ...
Main Authors: | Luo, Longqiang; Li, Dingfang; Zhang, Wen; Tu, Shikui; Zhu, Xiaopeng; Tian, Gang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4830532/ https://www.ncbi.nlm.nih.gov/pubmed/27074043 http://dx.doi.org/10.1371/journal.pone.0153268 |
_version_ | 1782426908261089280 |
---|---|
author | Luo, Longqiang; Li, Dingfang; Zhang, Wen; Tu, Shikui; Zhu, Xiaopeng; Tian, Gang |
author_sort | Luo, Longqiang |
collection | PubMed |
description | BACKGROUND: Piwi-interacting RNA (piRNA) is the largest class of small non-coding RNA molecules. The transposon-derived piRNA prediction can enrich the research contents of small ncRNAs as well as help to further understand generation mechanism of gamete. METHODS: In this paper, we attempt to differentiate transposon-derived piRNAs from non-piRNAs based on their sequential and physicochemical features by using machine learning methods. We explore six sequence-derived features, i.e. spectrum profile, mismatch profile, subsequence profile, position-specific scoring matrix, pseudo dinucleotide composition and local structure-sequence triplet elements, and systematically evaluate their performances for transposon-derived piRNA prediction. Finally, we consider two approaches: direct combination and ensemble learning to integrate useful features and achieve high-accuracy prediction models. RESULTS: We construct three datasets, covering three species: Human, Mouse and Drosophila, and evaluate the performances of prediction models by 10-fold cross validation. In the computational experiments, direct combination models achieve AUC of 0.917, 0.922 and 0.992 on Human, Mouse and Drosophila, respectively; ensemble learning models achieve AUC of 0.922, 0.926 and 0.994 on the three datasets. CONCLUSIONS: Compared with other state-of-the-art methods, our methods can lead to better performances. In conclusion, the proposed methods are promising for the transposon-derived piRNA prediction. The source codes and datasets are available in S1 File. |
format | Online Article Text |
id | pubmed-4830532 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4830532 2016-04-22 Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features Luo, Longqiang; Li, Dingfang; Zhang, Wen; Tu, Shikui; Zhu, Xiaopeng; Tian, Gang PLoS One Research Article Public Library of Science 2016-04-13 /pmc/articles/PMC4830532/ /pubmed/27074043 http://dx.doi.org/10.1371/journal.pone.0153268 Text en © 2016 Luo et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4830532/ https://www.ncbi.nlm.nih.gov/pubmed/27074043 http://dx.doi.org/10.1371/journal.pone.0153268 |
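
The abstract in this record lists the spectrum profile among the six sequence-derived features. As a rough illustration only, a spectrum profile can be read as the vector of k-mer frequencies of a sequence. The sketch below is not the authors' code from S1 File: the function name `spectrum_profile`, the choice to normalize counts to frequencies, the mapping of U to T, and the skipping of ambiguous bases are all assumptions made for this example.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: DNA alphabet; RNA input has U mapped to T


def spectrum_profile(seq, k=2):
    """Return normalized k-mer frequencies, ordered lexicographically by k-mer.

    Hypothetical helper for illustration; the published method may count
    or normalize k-mers differently.
    """
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing ambiguous bases such as N
            counts[kmer] += 1
            windows += 1
    if windows == 0:
        # sequence shorter than k, or no valid window: return an all-zero vector
        return [0.0] * len(kmers)
    return [counts[m] / windows for m in kmers]
```

Calling `spectrum_profile("ACGU", k=1)` gives each of the four bases a frequency of 0.25, and the vector length grows as 4^k, so larger k yields a sparser, higher-dimensional feature.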