GNI Corpus Version 1.0: Annotated Full-Text Corpus of Genomics & Informatics to Support Biomedical Information Extraction

Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. A text corpus for this journal, annotated with various levels of linguistic information, would be a valuable resource, as the process of information extraction requires syntactic...

Bibliographic Details
Main Authors: Oh, So-Yeon; Kim, Ji-Hyeon; Kim, Seo-Jin; Nam, Hee-Jo; Park, Hyun-Seok
Format: Online Article Text
Language: English
Published: Society of Gastrointestinal Intervention 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6187819/
https://www.ncbi.nlm.nih.gov/pubmed/30309207
http://dx.doi.org/10.5808/GI.2018.16.3.75
author Oh, So-Yeon
Kim, Ji-Hyeon
Kim, Seo-Jin
Nam, Hee-Jo
Park, Hyun-Seok
collection PubMed
description Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. A text corpus for this journal, annotated with various levels of linguistic information, would be a valuable resource, as the process of information extraction requires syntactic, semantic, and higher levels of natural language processing. In this study, we publish our new corpus, GNI Corpus version 1.0, extracted and annotated from the full texts of Genomics & Informatics with an NLTK (Natural Language Toolkit)-based text-mining script. This preliminary version of the corpus could be used as a training and testing set for systems that serve a variety of functions in future biomedical text mining.
format Online
Article
Text
id pubmed-6187819
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Society of Gastrointestinal Intervention
record_format MEDLINE/PubMed
spelling pubmed-61878192018-10-17 GNI Corpus Version 1.0: Annotated Full-Text Corpus of Genomics & Informatics to Support Biomedical Information Extraction Oh, So-Yeon Kim, Ji-Hyeon Kim, Seo-Jin Nam, Hee-Jo Park, Hyun-Seok Genomics Inform Application Note Genomics & Informatics (NLM title abbreviation: Genomics Inform) is the official journal of the Korea Genome Organization. Text corpus for this journal annotated with various levels of linguistic information would be a valuable resource as the process of information extraction requires syntactic, semantic, and higher levels of natural language processing. In this study, we publish our new corpus called GNI Corpus version 1.0, extracted and annotated from full texts of Genomics & Informatics, with NLTK (Natural Language ToolKit)-based text mining script. The preliminary version of the corpus could be used as a training and testing set of a system that serves a variety of functions for future biomedical text mining. Society of Gastrointestinal Intervention 2018-09 2018-09-30 /pmc/articles/PMC6187819/ /pubmed/30309207 http://dx.doi.org/10.5808/GI.2018.16.3.75 Text en Copyright © 2018 by the Korea Genome Organization It is identical to the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/).
title GNI Corpus Version 1.0: Annotated Full-Text Corpus of Genomics & Informatics to Support Biomedical Information Extraction
topic Application Note