Named Entity Recognition for Bacterial Type IV Secretion Systems

Research on specialized biological systems is often hampered by a lack of consistent terminology, especially across species. In bacterial Type IV secretion systems, genes within one set of orthologs may have over a dozen different names. Classifying research publications based on biological processes...

Bibliographic Details
Main authors: Ananiadou, Sophia, Sullivan, Dan, Black, William, Levow, Gina-Anne, Gillespie, Joseph J., Mao, Chunhong, Pyysalo, Sampo, Kolluru, BalaKrishna, Tsujii, Junichi, Sobral, Bruno
Format: Text
Language: English
Published: Public Library of Science 2011
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3066171/
https://www.ncbi.nlm.nih.gov/pubmed/21468321
http://dx.doi.org/10.1371/journal.pone.0014780
author Ananiadou, Sophia
Sullivan, Dan
Black, William
Levow, Gina-Anne
Gillespie, Joseph J.
Mao, Chunhong
Pyysalo, Sampo
Kolluru, BalaKrishna
Tsujii, Junichi
Sobral, Bruno
collection PubMed
description Research on specialized biological systems is often hampered by a lack of consistent terminology, especially across species. In bacterial Type IV secretion systems, genes within one set of orthologs may have over a dozen different names. Classifying research publications based on biological processes, cellular components, molecular functions, and microorganism species should improve the precision and recall of literature searches, allowing researchers to keep up with the exponentially growing literature through resources such as the Pathosystems Resource Integration Center (PATRIC, patricbrc.org). We developed named entity recognition (NER) tools for four entities related to Type IV secretion systems: 1) bacteria names, 2) biological processes, 3) molecular functions, and 4) cellular components. These four entities are important to pathogenesis and virulence research but have received less attention than other entities, e.g., genes and proteins. The recognizers are based on an annotated corpus, large domain terminological resources, and machine learning techniques. High accuracy rates (>80%) are achieved for bacteria, biological processes, and molecular functions. Contrastive experiments highlighted the effectiveness of alternate recognition strategies; results of term extraction on contrasting document sets demonstrated the utility of these classes for identifying T4SS-related documents.
format Text
id pubmed-3066171
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3066171 2011-04-05 Named Entity Recognition for Bacterial Type IV Secretion Systems PLoS One Research Article Public Library of Science 2011-03-29 /pmc/articles/PMC3066171/ /pubmed/21468321 http://dx.doi.org/10.1371/journal.pone.0014780 Text en Ananiadou et al.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Named Entity Recognition for Bacterial Type IV Secretion Systems
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3066171/
https://www.ncbi.nlm.nih.gov/pubmed/21468321
http://dx.doi.org/10.1371/journal.pone.0014780