Multi-view Co-training for microRNA Prediction

MicroRNA (miRNA) are short, non-coding RNAs involved in cell regulation at post-transcriptional and translational levels. Numerous computational predictors of miRNA have been developed that generally classify miRNA based on either sequence- or expression-based features. While these methods are highly eff...

Bibliographic Details
Main authors:	Sheikh Hassani, Mohsen; Green, James R.
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2019
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6662744/ https://www.ncbi.nlm.nih.gov/pubmed/31358877 http://dx.doi.org/10.1038/s41598-019-47399-8

_version_	1783439700575387648
author	Sheikh Hassani, Mohsen Green, James R.
author_facet	Sheikh Hassani, Mohsen Green, James R.
author_sort	Sheikh Hassani, Mohsen
collection	PubMed
description	MicroRNA (miRNA) are short, non-coding RNAs involved in cell regulation at post-transcriptional and translational levels. Numerous computational predictors of miRNA have been developed that generally classify miRNA based on either sequence- or expression-based features. While these methods are highly effective, they require large labelled training data sets, which are often not available for many species. Simultaneously, emerging high-throughput wet-lab experimental procedures are producing large unlabelled data sets of genomic sequence and RNA expression profiles. Existing methods use supervised machine learning and are therefore unable to leverage these unlabelled data. In this paper, we design and develop a multi-view co-training approach for the classification of miRNA to maximize the utility of unlabelled training data by taking advantage of multiple views of the problem. Starting with only 10 labelled training data, co-training is shown to significantly (p < 0.01) increase classification accuracy of both sequence- and expression-based classifiers, without requiring any new labelled training data. After 11 iterations of co-training, the expression-based view of miRNA classification experiences an average increase in AUPRC of 15.81% over six species, compared to 11.90% for self-training and 4.84% for passive learning. Similar results are observed for sequence-based classifiers with increases of 46.47%, 39.53% and 29.43%, for co-training, self-training, and passive learning, respectively. The final co-trained sequence and expression-based classifiers are integrated into a final confidence-based classifier which shows improved performance compared to both the expression (1.5%, p = 0.021) and sequence (3.7%, p = 0.006) views. This study represents the first application of multi-view co-training to miRNA prediction and shows great promise, particularly for understudied species with few available training data.
format	Online Article Text
id	pubmed-6662744
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-66627442019-08-02 Multi-view Co-training for microRNA Prediction Sheikh Hassani, Mohsen Green, James R. Sci Rep Article MicroRNA (miRNA) are short, non-coding RNAs involved in cell regulation at post-transcriptional and translational levels. Numerous computational predictors of miRNA have been developed that generally classify miRNA based on either sequence- or expression-based features. While these methods are highly effective, they require large labelled training data sets, which are often not available for many species. Simultaneously, emerging high-throughput wet-lab experimental procedures are producing large unlabelled data sets of genomic sequence and RNA expression profiles. Existing methods use supervised machine learning and are therefore unable to leverage these unlabelled data. In this paper, we design and develop a multi-view co-training approach for the classification of miRNA to maximize the utility of unlabelled training data by taking advantage of multiple views of the problem. Starting with only 10 labelled training data, co-training is shown to significantly (p < 0.01) increase classification accuracy of both sequence- and expression-based classifiers, without requiring any new labelled training data. After 11 iterations of co-training, the expression-based view of miRNA classification experiences an average increase in AUPRC of 15.81% over six species, compared to 11.90% for self-training and 4.84% for passive learning. Similar results are observed for sequence-based classifiers with increases of 46.47%, 39.53% and 29.43%, for co-training, self-training, and passive learning, respectively. The final co-trained sequence and expression-based classifiers are integrated into a final confidence-based classifier which shows improved performance compared to both the expression (1.5%, p = 0.021) and sequence (3.7%, p = 0.006) views. This study represents the first application of multi-view co-training to miRNA prediction and shows great promise, particularly for understudied species with few available training data. Nature Publishing Group UK 2019-07-29 /pmc/articles/PMC6662744/ /pubmed/31358877 http://dx.doi.org/10.1038/s41598-019-47399-8 Text en © The Author(s) 2019 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle	Article Sheikh Hassani, Mohsen Green, James R. Multi-view Co-training for microRNA Prediction
title	Multi-view Co-training for microRNA Prediction
title_full	Multi-view Co-training for microRNA Prediction
title_fullStr	Multi-view Co-training for microRNA Prediction
title_full_unstemmed	Multi-view Co-training for microRNA Prediction
title_short	Multi-view Co-training for microRNA Prediction
title_sort	multi-view co-training for microrna prediction
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6662744/ https://www.ncbi.nlm.nih.gov/pubmed/31358877 http://dx.doi.org/10.1038/s41598-019-47399-8
work_keys_str_mv	AT sheikhhassanimohsen multiviewcotrainingformicrornaprediction AT greenjamesr multiviewcotrainingformicrornaprediction

Similar items