Recognition of bacteria named entity using conditional random fields in Spark

BACKGROUND: Microbes play a crucial role in the functional mechanisms of an ecosystem. Identifying the interactions among microbes is an important step towards understanding the structure and function of microbial communities, as well as the impact of microbes on human health and disease. Despi...

Bibliographic Details
Main authors: Wang, Xiaoyan, Li, Yichuan, He, Tingting, Jiang, Xingpeng, Hu, Xiaohua
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6249713/
https://www.ncbi.nlm.nih.gov/pubmed/30463540
http://dx.doi.org/10.1186/s12918-018-0625-3
_version_ 1783372798358454272
author Wang, Xiaoyan
Li, Yichuan
He, Tingting
Jiang, Xingpeng
Hu, Xiaohua
collection PubMed
description BACKGROUND: Microbes play a crucial role in the functional mechanisms of an ecosystem. Identifying the interactions among microbes is an important step towards understanding the structure and function of microbial communities, as well as the impact of microbes on human health and disease. Despite this importance, there is currently no gold-standard dataset of microbial interactions. Traditional approaches such as growth and co-culture analysis must be performed in the laboratory and are time-consuming and costly. Computational methods can alleviate this problem by providing predicted candidate interactions for experimental verification. Mining microbial interactions from large volumes of medical text is one such method. Identifying named entities of bacteria and related entities in the text is the basis for microbial relation extraction. In previous work, a bacteria named entity recognition system based on a dictionary and conditional random fields was proposed. However, it is inefficient when dealing with large-scale text. RESULTS: We implemented bacteria named entity recognition on the Spark platform and designed comparison experiments to verify the correctness and validity of the proposed system. The experimental results show that it achieves a higher F-measure in the correctness comparison. Moreover, prediction is much faster than with the previous version on large-scale biomedical datasets, with computational efficiency improved by about 3.1 to 6.7 times. CONCLUSIONS: The system for bacteria named entity recognition resolves the inefficiency of the previously proposed system on large-scale datasets. The proposed system performs well in both accuracy and scalability.
format Online
Article
Text
id pubmed-6249713
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6249713 2018-11-26 Recognition of bacteria named entity using conditional random fields in Spark Wang, Xiaoyan Li, Yichuan He, Tingting Jiang, Xingpeng Hu, Xiaohua BMC Syst Biol Research BioMed Central 2018-11-22 /pmc/articles/PMC6249713/ /pubmed/30463540 http://dx.doi.org/10.1186/s12918-018-0625-3 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Recognition of bacteria named entity using conditional random fields in Spark
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6249713/
https://www.ncbi.nlm.nih.gov/pubmed/30463540
http://dx.doi.org/10.1186/s12918-018-0625-3