CustFRE: An annotated dataset for extraction of family relations from English text

Meaningful information extraction is an important and challenging task because of the ever-growing volume of data. Training and evaluating automated systems for this task requires annotated datasets, which are rarely available because of the great amount of human effort and time required for annot...


Bibliographic Details
Main authors:	Mumtaz, Raabia; Qadir, Muhammad Abdul; Saeed, Asif
Format:	Online Article Text
Language:	English
Published:	Elsevier 2022
Subjects:	Data Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8885562/ https://www.ncbi.nlm.nih.gov/pubmed/35242953 http://dx.doi.org/10.1016/j.dib.2022.107980

author	Mumtaz, Raabia; Qadir, Muhammad Abdul; Saeed, Asif
collection	PubMed
description	Meaningful information extraction is an important and challenging task because of the ever-growing volume of data. Training and evaluating automated systems for this task requires annotated datasets, which are rarely available because annotation demands considerable human effort and time. The dataset described in this manuscript, CustFRE, is intended for systems that learn to extract family relations from text. Sentences mentioning at least two persons were collected from the internet. The texts were first processed with Stanford's NLP pipeline for basic NLP tagging. Next, a team of natural language processing experts annotated the dataset: every family relation among the persons in a text was annotated, and a no_relation label was assigned when no family relation between two persons could be inferred from the text. After annotation, an NLP expert verified the dataset for completeness and correctness. CustFRE contains 2,716 annotations in total. Information extraction researchers can use the dataset as a benchmark for their systems, and it can also be used to train and evaluate family relation extraction systems.
format	Online Article Text
id	pubmed-8885562
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Elsevier
record_format	MEDLINE/PubMed
spelling	pubmed-8885562 2022-03-02 CustFRE: An annotated dataset for extraction of family relations from English text. Mumtaz, Raabia; Qadir, Muhammad Abdul; Saeed, Asif. Data Brief. Data Article. Elsevier 2022-02-19 /pmc/articles/PMC8885562/ /pubmed/35242953 http://dx.doi.org/10.1016/j.dib.2022.107980 Text en © 2022 The Authors. Published by Elsevier Inc. https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title	CustFRE: An annotated dataset for extraction of family relations from English text
topic	Data Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8885562/ https://www.ncbi.nlm.nih.gov/pubmed/35242953 http://dx.doi.org/10.1016/j.dib.2022.107980
