
Identification of MicroRNA Precursors with Support Vector Machine and String Kernel

MicroRNAs (miRNAs) are one family of short (21–23 nt) regulatory non-coding RNAs processed from long (70–110 nt) miRNA precursors (pre-miRNAs). Identifying true and false precursors plays an important role in computational identification of miRNAs. Some numerical features have been extracted from pr...


Bibliographic Details
Main Authors: Xu, Jian-Hua, Li, Fei, Sun, Qiu-Feng
Format: Online Article Text
Language: English
Published: Elsevier 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5054094/
https://www.ncbi.nlm.nih.gov/pubmed/18973868
http://dx.doi.org/10.1016/S1672-0229(08)60027-3
_version_ 1782458524875358208
author Xu, Jian-Hua
Li, Fei
Sun, Qiu-Feng
author_facet Xu, Jian-Hua
Li, Fei
Sun, Qiu-Feng
author_sort Xu, Jian-Hua
collection PubMed
description MicroRNAs (miRNAs) are one family of short (21–23 nt) regulatory non-coding RNAs processed from long (70–110 nt) miRNA precursors (pre-miRNAs). Identifying true and false precursors plays an important role in computational identification of miRNAs. Some numerical features have been extracted from precursor sequences and their secondary structures to suit some classification methods; however, they may lose some usefully discriminative information hidden in sequences and structures. In this study, pre-miRNA sequences and their secondary structures are directly used to construct an exponential kernel based on weighted Levenshtein distance between two sequences. This string kernel is then combined with support vector machine (SVM) for detecting true and false pre-miRNAs. Based on 331 training samples of true and false human pre-miRNAs, 2 key parameters in SVM are selected by 5-fold cross validation and grid search, and 5 realizations with different 5-fold partitions are executed. Among 16 independent test sets from 3 human, 8 animal, 2 plant, 1 virus, and 2 artificially false human pre-miRNAs, our method statistically outperforms the previous SVM-based technique on 11 sets, including 3 human, 7 animal, and 1 false human pre-miRNAs. In particular, pre-miRNAs with multiple loops that were usually excluded in the previous work are correctly identified in this study with an accuracy of 92.66%.
format Online
Article
Text
id pubmed-5054094
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-5054094 2016-10-14 Identification of MicroRNA Precursors with Support Vector Machine and String Kernel Xu, Jian-Hua Li, Fei Sun, Qiu-Feng Genomics Proteomics Bioinformatics Method MicroRNAs (miRNAs) are one family of short (21–23 nt) regulatory non-coding RNAs processed from long (70–110 nt) miRNA precursors (pre-miRNAs). Identifying true and false precursors plays an important role in computational identification of miRNAs. Some numerical features have been extracted from precursor sequences and their secondary structures to suit some classification methods; however, they may lose some usefully discriminative information hidden in sequences and structures. In this study, pre-miRNA sequences and their secondary structures are directly used to construct an exponential kernel based on weighted Levenshtein distance between two sequences. This string kernel is then combined with support vector machine (SVM) for detecting true and false pre-miRNAs. Based on 331 training samples of true and false human pre-miRNAs, 2 key parameters in SVM are selected by 5-fold cross validation and grid search, and 5 realizations with different 5-fold partitions are executed. Among 16 independent test sets from 3 human, 8 animal, 2 plant, 1 virus, and 2 artificially false human pre-miRNAs, our method statistically outperforms the previous SVM-based technique on 11 sets, including 3 human, 7 animal, and 1 false human pre-miRNAs. In particular, pre-miRNAs with multiple loops that were usually excluded in the previous work are correctly identified in this study with an accuracy of 92.66%. Elsevier 2008 2008-10-28 /pmc/articles/PMC5054094/ /pubmed/18973868 http://dx.doi.org/10.1016/S1672-0229(08)60027-3 Text en © 2008 Beijing Institute of Genomics http://creativecommons.org/licenses/by-nc-sa/3.0/ This is an open access article under the CC BY-NC-SA license (http://creativecommons.org/licenses/by-nc-sa/3.0/).
spellingShingle Method
Xu, Jian-Hua
Li, Fei
Sun, Qiu-Feng
Identification of MicroRNA Precursors with Support Vector Machine and String Kernel
title Identification of MicroRNA Precursors with Support Vector Machine and String Kernel
title_full Identification of MicroRNA Precursors with Support Vector Machine and String Kernel
title_fullStr Identification of MicroRNA Precursors with Support Vector Machine and String Kernel
title_full_unstemmed Identification of MicroRNA Precursors with Support Vector Machine and String Kernel
title_short Identification of MicroRNA Precursors with Support Vector Machine and String Kernel
title_sort identification of microrna precursors with support vector machine and string kernel
topic Method
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5054094/
https://www.ncbi.nlm.nih.gov/pubmed/18973868
http://dx.doi.org/10.1016/S1672-0229(08)60027-3
work_keys_str_mv AT xujianhua identificationofmicrornaprecursorswithsupportvectormachineandstringkernel
AT lifei identificationofmicrornaprecursorswithsupportvectormachineandstringkernel
AT sunqiufeng identificationofmicrornaprecursorswithsupportvectormachineandstringkernel
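
Illustrative sketch (not from the article): the abstract describes an exponential kernel built on a weighted Levenshtein distance between two sequences. The Python below shows one way such a kernel could look, assuming the common form K(a, b) = exp(-gamma * d(a, b)). The function names, the cost parameters `sub_cost` and `indel_cost`, the value of `gamma`, and the example sequences are all hypothetical; the record does not give the authors' actual weights, nor how secondary-structure strings enter the distance.

```python
import math


def weighted_levenshtein(a, b, sub_cost=1.0, indel_cost=1.0):
    """Edit distance with separate weights for substitutions and for
    insertions/deletions (weights are assumptions, not the paper's)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel_cost,                            # delete ca
                curr[j - 1] + indel_cost,                        # insert cb
                prev[j - 1] + (0.0 if ca == cb else sub_cost),   # match/substitute
            ))
        prev = curr
    return prev[-1]


def exponential_kernel(a, b, gamma=0.1, sub_cost=1.0, indel_cost=1.0):
    """Assumed kernel form: exp(-gamma * weighted edit distance)."""
    return math.exp(-gamma * weighted_levenshtein(a, b, sub_cost, indel_cost))


def gram_matrix(seqs, gamma=0.1, sub_cost=1.0, indel_cost=1.0):
    """Pairwise kernel values for a list of sequences."""
    return [[exponential_kernel(s, t, gamma, sub_cost, indel_cost) for t in seqs]
            for s in seqs]


if __name__ == "__main__":
    # Made-up toy RNA strings, for illustration only.
    toy = ["ACGUACGU", "ACGUUCGU", "GGGCCCAA"]
    for row in gram_matrix(toy, gamma=0.2):
        print(["%.3f" % v for v in row])
```

A Gram matrix like this could be handed to any SVM implementation that accepts a precomputed kernel, with `gamma` and the SVM penalty parameter tuned by cross-validated grid search in the spirit of the abstract. Kernels derived from edit distances are not guaranteed to be positive semidefinite in general, so this is a starting point rather than a faithful reproduction of the authors' method.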