Harvesting Patterns from Textual Web Sources with Tolerance Rough Sets

Construction of knowledge repositories from web corpora by harvesting linguistic patterns is of benefit for many natural language-processing applications that rely on question-answering schemes. These methods require minimal or no human intervention and can recursively learn new relational facts-ins...

Bibliographic Details
Main Authors: Moghaddam, Hoora Rezaei; Ramanna, Sheela
Format: Online Article Text
Language: English
Published: Elsevier 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7318947/
https://www.ncbi.nlm.nih.gov/pubmed/32835308
http://dx.doi.org/10.1016/j.patter.2020.100053
_version_ 1783550962060754944
author Moghaddam, Hoora Rezaei
Ramanna, Sheela
author_facet Moghaddam, Hoora Rezaei
Ramanna, Sheela
author_sort Moghaddam, Hoora Rezaei
collection PubMed
description Construction of knowledge repositories from web corpora by harvesting linguistic patterns is of benefit for many natural language-processing applications that rely on question-answering schemes. These methods require minimal or no human intervention and can recursively learn new relational facts-instances in a fully automated and scalable manner. This paper explores the performance of tolerance rough set-based learner with respect to two important issues: scalability and its effect on concept drift, by (1) designing a new version of the semi-supervised tolerance rough set-based pattern learner (TPL 2.0), (2) adapting a tolerance form of rough set methodology to categorize linguistic patterns, and (3) extracting categorical information from a large noisy dataset of crawled web pages. This work demonstrates that the TPL 2.0 learner is promising in terms of precision@30 metric when compared with three benchmark algorithms: Tolerant Pattern Learner 1.0, Fuzzy-Rough Set Pattern Learner, and Coupled Bayesian Sets-based learner.
format Online
Article
Text
id pubmed-7318947
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-73189472020-06-29 Harvesting Patterns from Textual Web Sources with Tolerance Rough Sets Moghaddam, Hoora Rezaei Ramanna, Sheela Patterns (N Y) Article Construction of knowledge repositories from web corpora by harvesting linguistic patterns is of benefit for many natural language-processing applications that rely on question-answering schemes. These methods require minimal or no human intervention and can recursively learn new relational facts-instances in a fully automated and scalable manner. This paper explores the performance of tolerance rough set-based learner with respect to two important issues: scalability and its effect on concept drift, by (1) designing a new version of the semi-supervised tolerance rough set-based pattern learner (TPL 2.0), (2) adapting a tolerance form of rough set methodology to categorize linguistic patterns, and (3) extracting categorical information from a large noisy dataset of crawled web pages. This work demonstrates that the TPL 2.0 learner is promising in terms of precision@30 metric when compared with three benchmark algorithms: Tolerant Pattern Learner 1.0, Fuzzy-Rough Set Pattern Learner, and Coupled Bayesian Sets-based learner. Elsevier 2020-06-26 /pmc/articles/PMC7318947/ /pubmed/32835308 http://dx.doi.org/10.1016/j.patter.2020.100053 Text en © 2020 The Authors http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Article
Moghaddam, Hoora Rezaei
Ramanna, Sheela
Harvesting Patterns from Textual Web Sources with Tolerance Rough Sets
title Harvesting Patterns from Textual Web Sources with Tolerance Rough Sets
title_full Harvesting Patterns from Textual Web Sources with Tolerance Rough Sets
title_fullStr Harvesting Patterns from Textual Web Sources with Tolerance Rough Sets
title_full_unstemmed Harvesting Patterns from Textual Web Sources with Tolerance Rough Sets
title_short Harvesting Patterns from Textual Web Sources with Tolerance Rough Sets
title_sort harvesting patterns from textual web sources with tolerance rough sets
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7318947/
https://www.ncbi.nlm.nih.gov/pubmed/32835308
http://dx.doi.org/10.1016/j.patter.2020.100053
work_keys_str_mv AT moghaddamhoorarezaei harvestingpatternsfromtextualwebsourceswithtoleranceroughsets
AT ramannasheela harvestingpatternsfromtextualwebsourceswithtoleranceroughsets