A framework for improving microRNA prediction in non-human genomes

The prediction of novel pre-microRNA (miRNA) from genomic sequence has received considerable attention recently. However, the majority of studies have focused on the human genome. Previous studies have demonstrated that sensitivity (correctly detecting true miRNA) is sustained when human-trained met...

Bibliographic Details
Main authors: Peace, Robert J., Biggar, Kyle K., Storey, Kenneth B., Green, James R.
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4787757/
https://www.ncbi.nlm.nih.gov/pubmed/26163062
http://dx.doi.org/10.1093/nar/gkv698
author Peace, Robert J.
Biggar, Kyle K.
Storey, Kenneth B.
Green, James R.
collection PubMed
description The prediction of novel pre-microRNA (miRNA) from genomic sequence has received considerable attention recently. However, the majority of studies have focused on the human genome. Previous studies have demonstrated that sensitivity (correctly detecting true miRNA) is sustained when human-trained methods are applied to other species; however, they have failed to report the dramatic drop in specificity (the ability to correctly reject non-miRNA sequences) in non-human genomes. Considering that the ratio of true miRNA sequences to pseudo-miRNA sequences is on the order of 1:1000, such low specificity prevents the application of most existing tools to non-human genomes, as the number of false positives overwhelms the true predictions. We here introduce a framework (SMIRP) for creating species-specific miRNA prediction systems, leveraging sequence conservation and phylogenetic distance information. Substantial improvements in specificity and precision are obtained for four non-human test species when our framework is applied to three different prediction systems representing two types of classifiers (support vector machine and Random Forest), based on three different feature sets, with both human-specific and taxon-wide training data. The SMIRP framework is potentially applicable to all miRNA prediction systems, and we expect substantial improvement in precision and specificity, while sustaining sensitivity, independent of the machine learning technique chosen.
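The abstract's point about class imbalance can be checked with a little arithmetic. The sketch below is illustrative only: the function name `expected_precision` and the sensitivity and specificity values are assumptions chosen for the example, not figures reported by the authors. Only the 1:1000 ratio of true to pseudo-miRNA sequences comes from the abstract.

```python
def expected_precision(sensitivity, specificity, n_true, n_pseudo):
    """Expected precision of a classifier on an imbalanced candidate set.

    sensitivity: fraction of true miRNA that are detected
    specificity: fraction of pseudo-miRNA that are correctly rejected
    n_true, n_pseudo: counts of true and pseudo-miRNA candidates
    """
    true_positives = sensitivity * n_true
    false_positives = (1.0 - specificity) * n_pseudo
    predicted_positives = true_positives + false_positives
    if predicted_positives == 0:
        return 0.0  # nothing was predicted positive
    return true_positives / predicted_positives


# Hypothetical operating points at the 1:1000 ratio cited in the abstract.
for spec in (0.95, 0.999):
    p = expected_precision(sensitivity=0.9, specificity=spec,
                           n_true=1, n_pseudo=1000)
    print(f"specificity={spec}: precision={p:.3f}")
```

Under these assumed values, a specificity of 0.95 leaves fewer than 2 in 100 positive predictions correct, while raising specificity to 0.999 brings precision to roughly one in two. This is why a drop in specificity matters far more than the raw percentage suggests.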
format Online
Article
Text
id pubmed-4787757
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4787757 2016-03-14 A framework for improving microRNA prediction in non-human genomes Peace, Robert J. Biggar, Kyle K. Storey, Kenneth B. Green, James R. Nucleic Acids Res Methods Online Oxford University Press 2015-11-16 2015-07-10 /pmc/articles/PMC4787757/ /pubmed/26163062 http://dx.doi.org/10.1093/nar/gkv698 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title A framework for improving microRNA prediction in non-human genomes
topic Methods Online