
Improving the Quality of Positive Datasets for the Establishment of Machine Learning Models for pre-microRNA Detection

MicroRNAs (miRNAs) are involved in the post-transcriptional regulation of protein abundance and thus have a great impact on the resulting phenotype. It is, therefore, no wonder that they have been implicated in many diseases ranging from virus infections to cancer. This impact on the phenotype leads...


Bibliographic Details
Main authors: Demirci, Müşerref Duygu Saçar, Allmer, Jens
Format: Online Article Text
Language: English
Published: De Gruyter 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6042829/
https://www.ncbi.nlm.nih.gov/pubmed/28753538
http://dx.doi.org/10.1515/jib-2017-0032
author Demirci, Müşerref Duygu Saçar
Allmer, Jens
collection PubMed
description MicroRNAs (miRNAs) are involved in the post-transcriptional regulation of protein abundance and thus have a great impact on the resulting phenotype. It is, therefore, no wonder that they have been implicated in many diseases ranging from virus infections to cancer. This impact on the phenotype leads to a great interest in establishing the miRNAs of an organism. Experimental methods are complicated which led to the development of computational methods for pre-miRNA detection. Such methods generally employ machine learning to establish models for the discrimination between miRNAs and other sequences. Positive training data for model establishment, for the most part, stems from miRBase, the miRNA registry. The quality of the entries in miRBase has been questioned, though. This unknown quality led to the development of filtering strategies in attempts to produce high quality positive datasets which can lead to a scarcity of positive data. To analyze the quality of filtered data we developed a machine learning model and found it is well able to establish data quality based on intrinsic measures. Additionally, we analyzed which features describing pre-miRNAs could discriminate between low and high quality data. Both models are applicable to data from miRBase and can be used for establishing high quality positive data. This will facilitate the development of better miRNA detection tools which will make the prediction of miRNAs in disease states more accurate. Finally, we applied both models to all miRBase data and provide the list of high quality hairpins.
format Online
Article
Text
id pubmed-6042829
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher De Gruyter
record_format MEDLINE/PubMed
spelling pubmed-6042829 2019-01-28
Improving the Quality of Positive Datasets for the Establishment of Machine Learning Models for pre-microRNA Detection
Demirci, Müşerref Duygu Saçar
Allmer, Jens
J Integr Bioinform
Research Articles
De Gruyter 2017-07-28 /pmc/articles/PMC6042829/ /pubmed/28753538 http://dx.doi.org/10.1515/jib-2017-0032 Text en
©2017, Müşerref Duygu Saçar Demirci, published by De Gruyter, Berlin/Boston http://creativecommons.org/licenses/by-nc-nd/3.0 This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
title Improving the Quality of Positive Datasets for the Establishment of Machine Learning Models for pre-microRNA Detection
topic Research Articles