Statistical principle-based approach for recognizing and normalizing microRNAs described in scientific literature

Bibliographic Details
Main authors: Dai, Hong-Jie, Wang, Chen-Kai, Chang, Nai-Wen, Huang, Ming-Siang, Jonnagaddala, Jitendra, Wang, Feng-Duo, Hsu, Wen-Lian
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6391575/
https://www.ncbi.nlm.nih.gov/pubmed/30809637
http://dx.doi.org/10.1093/database/baz030
collection PubMed
description Detecting microRNA (miRNA) mentions in scientific literature enables researchers to find relevant literature using queries formulated with miRNA information. Because most published biological studies present signal transduction pathways or genetic regulatory information in figure captions, extracting miRNAs from both the main text and the figure captions of a manuscript is useful for aggregate and comparative analyses of published studies. In this study, we present a statistical principle-based miRNA recognition and normalization method that identifies miRNAs and links them to identifiers in the Rfam database. As one of the core components of the text mining pipeline of the miRTarBase database, the proposed method combines the advantages of previous pattern-based, dictionary-based and supervised learning approaches and provides an integrated solution to the problem of miRNA identification. Furthermore, the knowledge learned from the training data is organized in a human-interpretable manner, so that users can understand why the system considers a span of text to be a miRNA mention, and domain experts can further complement this knowledge. We studied the ambiguity of miRNA nomenclature in order to link miRNA mentions to the Rfam database, and evaluated our approach on two datasets: the BioCreative VI Bio-ID corpus and the miRNA interaction corpus, which we extended with additional Rfam normalization information. Our study offers a better understanding of the challenges associated with miRNA identification and normalization in scientific literature and highlights the research gap that needs to be explored further in future studies.
id pubmed-6391575
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling Database (Oxford), Original Article. Oxford University Press, 2019-02-27. © The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.