Cargando…

Inferring Restricted Regular Expressions with Interleaving from Positive and Negative Samples

The presence of a schema for XML documents has numerous advantages. Unfortunately, many XML documents in practice are not accompanied by a schema or a valid schema. Therefore, it is essential to devise algorithms to infer schemas. The fundamental task in XML schema inference is to learn regular expr...

Descripción completa

Detalles Bibliográficos
Autores principales: Li, Yeting, Chen, Haiming, Zhang, Lingqi, Huang, Bo, Zhang, Jianzhao
Formato: Online Artículo Texto
Lenguaje:English
Publicado: 2020
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206278/
http://dx.doi.org/10.1007/978-3-030-47436-2_58
_version_ 1783530384337666048
author Li, Yeting
Chen, Haiming
Zhang, Lingqi
Huang, Bo
Zhang, Jianzhao
author_facet Li, Yeting
Chen, Haiming
Zhang, Lingqi
Huang, Bo
Zhang, Jianzhao
author_sort Li, Yeting
collection PubMed
description The presence of a schema for XML documents has numerous advantages. Unfortunately, many XML documents in practice are not accompanied by a schema or a valid schema. Therefore, it is essential to devise algorithms to infer schemas. The fundamental task in XML schema inference is to learn regular expressions. In this paper, we focus on learning the subclass of RE(&) called SIREs (the subclass of regular expressions with interleaving). Previous work in this direction lacks inference algorithms that support inference from positive and negative examples. We provide an algorithm to learn SIREs from positive and negative examples based on genetic algorithms and parallel techniques. Our algorithm also has better expansibility, which means that our algorithm not only supports learning with positive and negative examples, but also supports learning with positive or negative examples only. Experimental results demonstrate the effectiveness of our algorithm.
format Online
Article
Text
id pubmed-7206278
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-72062782020-05-08 Inferring Restricted Regular Expressions with Interleaving from Positive and Negative Samples Li, Yeting Chen, Haiming Zhang, Lingqi Huang, Bo Zhang, Jianzhao Advances in Knowledge Discovery and Data Mining Article The presence of a schema for XML documents has numerous advantages. Unfortunately, many XML documents in practice are not accompanied by a schema or a valid schema. Therefore, it is essential to devise algorithms to infer schemas. The fundamental task in XML schema inference is to learn regular expressions. In this paper, we focus on learning the subclass of RE(&) called SIREs (the subclass of regular expressions with interleaving). Previous work in this direction lacks inference algorithms that support inference from positive and negative examples. We provide an algorithm to learn SIREs from positive and negative examples based on genetic algorithms and parallel techniques. Our algorithm also has better expansibility, which means that our algorithm not only supports learning with positive and negative examples, but also supports learning with positive or negative examples only. Experimental results demonstrate the effectiveness of our algorithm. 2020-04-17 /pmc/articles/PMC7206278/ http://dx.doi.org/10.1007/978-3-030-47436-2_58 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle Article
Li, Yeting
Chen, Haiming
Zhang, Lingqi
Huang, Bo
Zhang, Jianzhao
Inferring Restricted Regular Expressions with Interleaving from Positive and Negative Samples
title Inferring Restricted Regular Expressions with Interleaving from Positive and Negative Samples
title_full Inferring Restricted Regular Expressions with Interleaving from Positive and Negative Samples
title_fullStr Inferring Restricted Regular Expressions with Interleaving from Positive and Negative Samples
title_full_unstemmed Inferring Restricted Regular Expressions with Interleaving from Positive and Negative Samples
title_short Inferring Restricted Regular Expressions with Interleaving from Positive and Negative Samples
title_sort inferring restricted regular expressions with interleaving from positive and negative samples
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206278/
http://dx.doi.org/10.1007/978-3-030-47436-2_58
work_keys_str_mv AT liyeting inferringrestrictedregularexpressionswithinterleavingfrompositiveandnegativesamples
AT chenhaiming inferringrestrictedregularexpressionswithinterleavingfrompositiveandnegativesamples
AT zhanglingqi inferringrestrictedregularexpressionswithinterleavingfrompositiveandnegativesamples
AT huangbo inferringrestrictedregularexpressionswithinterleavingfrompositiveandnegativesamples
AT zhangjianzhao inferringrestrictedregularexpressionswithinterleavingfrompositiveandnegativesamples