A semi-supervised machine learning framework for microRNA classification

Bibliographic Details
Main authors: Sheikh Hassani, Mohsen; Green, James R.
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6805288/
https://www.ncbi.nlm.nih.gov/pubmed/31639051
http://dx.doi.org/10.1186/s40246-019-0221-7
author Sheikh Hassani, Mohsen
Green, James R.
collection PubMed
description BACKGROUND: MicroRNAs (miRNAs) are a family of short, non-coding RNAs that have been linked to critical cellular activities, most notably regulation of gene expression. The identification of miRNA is a cross-disciplinary approach that requires both computational identification methods and wet-lab validation experiments, making it a resource-intensive procedure. While numerous machine learning methods have been developed to increase classification accuracy and thus reduce validation costs, most methods use supervised learning and thus require large labeled training data sets, often not feasible for less-sequenced species. On the other hand, there is now an abundance of unlabeled RNA sequence data due to the emergence of high-throughput wet-lab experimental procedures, such as next-generation sequencing. RESULTS: This paper explores the application of semi-supervised machine learning for miRNA classification in order to maximize the utility of both labeled and unlabeled data. We here present the novel combination of two semi-supervised approaches: active learning and multi-view co-training. Results across six diverse species show that this multi-stage semi-supervised approach is able to improve classification performance using very small numbers of labeled instances, effectively leveraging the available unlabeled data. CONCLUSIONS: The proposed semi-supervised miRNA classification pipeline holds the potential to identify novel miRNA with high recall and precision while requiring very small numbers of previously known miRNA. Such a method could be highly beneficial when studying miRNA in newly sequenced genomes of niche species with few known examples of miRNA.
format Online
Article
Text
id pubmed-6805288
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6805288 2019-10-24 Hum Genomics Research BioMed Central 2019-10-22 /pmc/articles/PMC6805288/ /pubmed/31639051 http://dx.doi.org/10.1186/s40246-019-0221-7 Text en
© The Author(s) 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A semi-supervised machine learning framework for microRNA classification
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6805288/
https://www.ncbi.nlm.nih.gov/pubmed/31639051
http://dx.doi.org/10.1186/s40246-019-0221-7