Genome-wide hairpins datasets of animals and plants for novel miRNA prediction

This article makes available several genome-wide datasets, which can be used for training microRNA (miRNA) classifiers. The hairpin sequences available are from the genomes of: Homo sapiens, Arabidopsis thaliana, Anopheles gambiae, Caenorhabditis elegans and Drosophila melanogaster. Each dataset pro...

Bibliographic Details
Main authors: Bugnon, L.A., Yones, C., Raad, J., Milone, D.H., Stegmayer, G.
Format: Online Article Text
Language: English
Published: Elsevier 2019
Subjects: Computer Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6700487/
https://www.ncbi.nlm.nih.gov/pubmed/31453279
http://dx.doi.org/10.1016/j.dib.2019.104209
_version_ 1783444884787560448
author Bugnon, L.A.
Yones, C.
Raad, J.
Milone, D.H.
Stegmayer, G.
author_facet Bugnon, L.A.
Yones, C.
Raad, J.
Milone, D.H.
Stegmayer, G.
author_sort Bugnon, L.A.
collection PubMed
description This article makes available several genome-wide datasets, which can be used for training microRNA (miRNA) classifiers. The hairpin sequences available are from the genomes of Homo sapiens, Arabidopsis thaliana, Anopheles gambiae, Caenorhabditis elegans and Drosophila melanogaster. Each dataset provides the genome data divided into sequences and a set of computed features for predictions. Each sequence has one label: i) “positive”, meaning that it is a well-known pre-miRNA according to miRBase v21; or ii) “unlabeled”, indicating that the sequence does not (yet) have a known function and could be a candidate novel pre-miRNA. Because selecting an informative feature set is very important for a good pre-miRNA classifier, a representative feature set with large discriminative power has been calculated and is also provided for each genome. This feature set contains typical information about sequence, topology and structure. The dataset is publicly shared at https://sourceforge.net/projects/sourcesinc/files/mirdata/.
format Online
Article
Text
id pubmed-6700487
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-67004872019-08-26 Genome-wide hairpins datasets of animals and plants for novel miRNA prediction Bugnon, L.A. Yones, C. Raad, J. Milone, D.H. Stegmayer, G. Data Brief Computer Science This article makes available several genome-wide datasets, which can be used for training microRNA (miRNA) classifiers. The hairpin sequences available are from the genomes of Homo sapiens, Arabidopsis thaliana, Anopheles gambiae, Caenorhabditis elegans and Drosophila melanogaster. Each dataset provides the genome data divided into sequences and a set of computed features for predictions. Each sequence has one label: i) “positive”, meaning that it is a well-known pre-miRNA according to miRBase v21; or ii) “unlabeled”, indicating that the sequence does not (yet) have a known function and could be a candidate novel pre-miRNA. Because selecting an informative feature set is very important for a good pre-miRNA classifier, a representative feature set with large discriminative power has been calculated and is also provided for each genome. This feature set contains typical information about sequence, topology and structure. The dataset is publicly shared at https://sourceforge.net/projects/sourcesinc/files/mirdata/. Elsevier 2019-07-03 /pmc/articles/PMC6700487/ /pubmed/31453279 http://dx.doi.org/10.1016/j.dib.2019.104209 Text en © 2019 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Computer Science
Bugnon, L.A.
Yones, C.
Raad, J.
Milone, D.H.
Stegmayer, G.
Genome-wide hairpins datasets of animals and plants for novel miRNA prediction
title Genome-wide hairpins datasets of animals and plants for novel miRNA prediction
title_full Genome-wide hairpins datasets of animals and plants for novel miRNA prediction
title_fullStr Genome-wide hairpins datasets of animals and plants for novel miRNA prediction
title_full_unstemmed Genome-wide hairpins datasets of animals and plants for novel miRNA prediction
title_short Genome-wide hairpins datasets of animals and plants for novel miRNA prediction
title_sort genome-wide hairpins datasets of animals and plants for novel mirna prediction
topic Computer Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6700487/
https://www.ncbi.nlm.nih.gov/pubmed/31453279
http://dx.doi.org/10.1016/j.dib.2019.104209
work_keys_str_mv AT bugnonla genomewidehairpinsdatasetsofanimalsandplantsfornovelmirnaprediction
AT yonesc genomewidehairpinsdatasetsofanimalsandplantsfornovelmirnaprediction
AT raadj genomewidehairpinsdatasetsofanimalsandplantsfornovelmirnaprediction
AT milonedh genomewidehairpinsdatasetsofanimalsandplantsfornovelmirnaprediction
AT stegmayerg genomewidehairpinsdatasetsofanimalsandplantsfornovelmirnaprediction