Statistical principle-based approach for recognizing and normalizing microRNAs described in scientific literature
Main authors: | Dai, Hong-Jie; Wang, Chen-Kai; Chang, Nai-Wen; Huang, Ming-Siang; Jonnagaddala, Jitendra; Wang, Feng-Duo; Hsu, Wen-Lian |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2019 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6391575/ https://www.ncbi.nlm.nih.gov/pubmed/30809637 http://dx.doi.org/10.1093/database/baz030 |
author | Dai, Hong-Jie; Wang, Chen-Kai; Chang, Nai-Wen; Huang, Ming-Siang; Jonnagaddala, Jitendra; Wang, Feng-Duo; Hsu, Wen-Lian |
---|---|
collection | PubMed |
description | Detecting microRNA (miRNA) mentions in scientific literature enables researchers to find relevant literature with queries formulated using miRNA information. Because most published biological studies elaborate on signal transduction pathways or genetic regulatory information in figure captions, extracting miRNAs from both the main text and the figure captions of a manuscript is useful for aggregate and comparative analyses of published studies. In this study, we present a statistical principle-based miRNA recognition and normalization method that identifies miRNAs and links them to identifiers in the Rfam database. As one of the core components of the text-mining pipeline of the miRTarBase database, the proposed method combines the advantages of previous work relying on patterns, dictionaries and supervised learning, and provides an integrated solution to the problem of miRNA identification. Furthermore, the knowledge learned from the training data is organized in a human-interpretable manner, so that users can understand why the system considers a span of text to be a miRNA mention, and domain experts can further complement the represented knowledge. We studied the ambiguity of miRNA nomenclature in order to link miRNA mentions to the Rfam database, and evaluated our approach on two datasets: the BioCreative VI Bio-ID corpus and the miRNA interaction corpus, which we extended with additional Rfam normalization information. Our study provides a better understanding of the challenges associated with miRNA identification and normalization in scientific literature and highlights the research gap that needs to be explored in future studies. |
format | Online Article Text |
id | pubmed-6391575 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6391575 2019-03-04; Database (Oxford), Original Article; Oxford University Press 2019-02-27; /pmc/articles/PMC6391575/ /pubmed/30809637 http://dx.doi.org/10.1093/database/baz030; Text en; © The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Statistical principle-based approach for recognizing and normalizing microRNAs described in scientific literature |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6391575/ https://www.ncbi.nlm.nih.gov/pubmed/30809637 http://dx.doi.org/10.1093/database/baz030 |
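The description says the method combines pattern, dictionary and supervised-learning evidence and links mentions to Rfam. The following is not the authors' statistical principle-based method; it is only a minimal sketch, under stated assumptions, of what the pattern and dictionary steps of such a pipeline might look like. The regular expression, the `find_mirnas` and `normalize_family` helpers, and the `FAMILY_IDS` table are all hypothetical, and the identifiers are placeholders rather than real Rfam accessions.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical surface pattern for common miRNA mention forms such as
# "miR-21", "hsa-miR-21-5p", "microRNA-200c" and "let-7a".
MIRNA_RE = re.compile(
    r"\b(?:(?P<species>[a-z]{3})-)?"           # optional species prefix, e.g. "hsa-"
    r"(?P<stem>miR|mir|microRNA|miRNA|let)"    # miRNA keyword
    r"[- ]?(?P<number>\d+)"                    # family number, e.g. "21"
    r"(?P<variant>[a-z])?"                     # optional paralog letter, e.g. "a"
    r"(?:-(?P<arm>[35]p))?"                    # optional arm suffix, e.g. "-5p"
    r"(?!\w)"                                  # mention must end at a non-word character
)

# Hypothetical family dictionary; values are placeholders, NOT real Rfam accessions.
FAMILY_IDS = {
    "mir-21": "RF_PLACEHOLDER_1",
    "mir-155": "RF_PLACEHOLDER_2",
    "let-7": "RF_PLACEHOLDER_3",
}


class Mention(NamedTuple):
    text: str
    start: int
    end: int
    family: str
    family_id: Optional[str]  # None when the family is not in the dictionary


def normalize_family(match: re.Match) -> str:
    """Collapse spelling variants to a lowercase family key such as 'mir-21' or 'let-7'."""
    prefix = "let" if match.group("stem") == "let" else "mir"
    return f"{prefix}-{match.group('number')}"


def find_mirnas(text: str) -> list[Mention]:
    """Return every pattern match with its family key and (placeholder) identifier."""
    mentions = []
    for m in MIRNA_RE.finditer(text):
        family = normalize_family(m)
        mentions.append(Mention(m.group(0), m.start(), m.end(), family, FAMILY_IDS.get(family)))
    return mentions


if __name__ == "__main__":
    sentence = "Overexpression of hsa-miR-21-5p and let-7a, but not microRNA-200c, was observed."
    for mention in find_mirnas(sentence):
        print(mention)
```

A real system would also have to resolve ambiguous names (for example, a mention without a species prefix can correspond to several organisms) and learn from annotated data, neither of which this toy pattern-and-lookup sketch attempts.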