Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features

BACKGROUND: Piwi-interacting RNAs (piRNAs) are the largest class of small non-coding RNA molecules. Predicting transposon-derived piRNAs can enrich research on small ncRNAs and help clarify the mechanism of gamete generation. METHODS: In this paper, we attempt to differ...

Bibliographic Details
Main Authors: Luo, Longqiang; Li, Dingfang; Zhang, Wen; Tu, Shikui; Zhu, Xiaopeng; Tian, Gang
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4830532/
https://www.ncbi.nlm.nih.gov/pubmed/27074043
http://dx.doi.org/10.1371/journal.pone.0153268
_version_ 1782426908261089280
author Luo, Longqiang
Li, Dingfang
Zhang, Wen
Tu, Shikui
Zhu, Xiaopeng
Tian, Gang
collection PubMed
description BACKGROUND: Piwi-interacting RNAs (piRNAs) are the largest class of small non-coding RNA molecules. Predicting transposon-derived piRNAs can enrich research on small ncRNAs and help clarify the mechanism of gamete generation. METHODS: In this paper, we attempt to differentiate transposon-derived piRNAs from non-piRNAs based on their sequential and physicochemical features, using machine learning methods. We explore six sequence-derived features (spectrum profile, mismatch profile, subsequence profile, position-specific scoring matrix, pseudo dinucleotide composition and local structure-sequence triplet elements) and systematically evaluate their performance for transposon-derived piRNA prediction. Finally, we consider two approaches, direct combination and ensemble learning, to integrate useful features and build high-accuracy prediction models. RESULTS: We construct three datasets covering three species (Human, Mouse and Drosophila) and evaluate the prediction models by 10-fold cross-validation. In the computational experiments, direct combination models achieve AUCs of 0.917, 0.922 and 0.992 on Human, Mouse and Drosophila, respectively; ensemble learning models achieve AUCs of 0.922, 0.926 and 0.994 on the same three datasets. CONCLUSIONS: Compared with other state-of-the-art methods, our methods achieve better performance. The proposed methods are promising for transposon-derived piRNA prediction. The source code and datasets are available in S1 File.
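The description names the spectrum profile as one of the six features but does not define it in this record. In the string-kernel literature, a k-spectrum profile is usually the vector of k-mer occurrence counts of a sequence. The following is a minimal sketch under that assumption, not the authors' implementation (which the record says is in S1 File). The function name `spectrum_profile`, the default `k=3`, the normalization to frequencies, and the mapping of T to U are all illustrative choices.

```python
from collections import Counter
from itertools import product

# Assumption: sequences use the RNA alphabet; DNA-coded input has T mapped to U.
ALPHABET = "ACGU"


def spectrum_profile(seq, k=3):
    """Return k-mer frequencies of seq, ordered lexicographically over ALPHABET.

    Windows containing characters outside ALPHABET (e.g. N) are ignored.
    """
    seq = seq.upper().replace("T", "U")
    # All 4**k possible k-mers, in a fixed order so vectors are comparable.
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# Example: a 2-spectrum profile has 4**2 = 16 entries.
profile = spectrum_profile("UGAUGA", k=2)
```

A fixed-length vector like this can be passed to any standard classifier. The mismatch and subsequence profiles mentioned in the description relax the exact-match requirement and are not covered by this sketch.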
format Online
Article
Text
id pubmed-4830532
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4830532 2016-04-22 Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features Luo, Longqiang Li, Dingfang Zhang, Wen Tu, Shikui Zhu, Xiaopeng Tian, Gang PLoS One Research Article
Public Library of Science 2016-04-13 /pmc/articles/PMC4830532/ /pubmed/27074043 http://dx.doi.org/10.1371/journal.pone.0153268 Text en © 2016 Luo et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Accurate Prediction of Transposon-Derived piRNAs by Integrating Various Sequential and Physicochemical Features
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4830532/
https://www.ncbi.nlm.nih.gov/pubmed/27074043
http://dx.doi.org/10.1371/journal.pone.0153268