miRWoods: Enhanced precursor detection and stacked random forests for the sensitive detection of microRNAs

MicroRNAs are conserved, endogenous small RNAs with critical post-transcriptional regulatory functions throughout eukaryota, including prominent roles in development and disease. Despite much effort, microRNA annotations still contain errors and are incomplete due especially to challenges related to...

Bibliographic Details
Main authors: Bell, Jimmy; Larson, Maureen; Kutzler, Michelle; Bionaz, Massimo; Löhr, Christiane V.; Hendrix, David
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6785219/
https://www.ncbi.nlm.nih.gov/pubmed/31596843
http://dx.doi.org/10.1371/journal.pcbi.1007309
_version_ 1783457849507053568
author Bell, Jimmy
Larson, Maureen
Kutzler, Michelle
Bionaz, Massimo
Löhr, Christiane V.
Hendrix, David
author_sort Bell, Jimmy
collection PubMed
description MicroRNAs are conserved, endogenous small RNAs with critical post-transcriptional regulatory functions throughout eukaryota, including prominent roles in development and disease. Despite much effort, microRNA annotations still contain errors and are incomplete due especially to challenges related to identifying valid miRs that have small numbers of reads, to properly locating hairpin precursors and to balancing precision and recall. Here, we present miRWoods, which solves these challenges using a duplex-focused precursor detection method and stacked random forests with specialized layers to detect mature and precursor microRNAs, and has been tuned to optimize the harmonic mean of precision and recall. We trained and tuned our discovery pipeline on data sets from the well-annotated human genome, and evaluated its performance on data from mouse. Compared to existing approaches, miRWoods better identifies precursor spans, and can balance sensitivity and specificity for an overall greater prediction accuracy, recalling an average of 10% more annotated microRNAs, and correctly predicts substantially more microRNAs with only one read. We apply this method to the under-annotated genomes of Felis catus (domestic cat) and Bos taurus (cow). We identified hundreds of novel microRNAs in small RNA sequencing data sets from muscle and skin from cat, from 10 tissues from cow and also from human and mouse cells. Our novel predictions include a microRNA in an intron of tyrosine kinase 2 (TYK2) that is present in both cat and cow, as well as a family of mirtrons with two instances in the human genome. Our predictions support a more expanded miR-2284 family in the bovine genome, a larger mir-548 family in the human genome, and a larger let-7 family in the feline genome.
format Online
Article
Text
id pubmed-6785219
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6785219 2019-10-19 miRWoods: Enhanced precursor detection and stacked random forests for the sensitive detection of microRNAs
Bell, Jimmy
Larson, Maureen
Kutzler, Michelle
Bionaz, Massimo
Löhr, Christiane V.
Hendrix, David
PLoS Comput Biol
Research Article
Public Library of Science 2019-10-09
/pmc/articles/PMC6785219/
/pubmed/31596843
http://dx.doi.org/10.1371/journal.pcbi.1007309
Text
en
© 2019 Bell et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title miRWoods: Enhanced precursor detection and stacked random forests for the sensitive detection of microRNAs
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6785219/
https://www.ncbi.nlm.nih.gov/pubmed/31596843
http://dx.doi.org/10.1371/journal.pcbi.1007309