Organizing an in-class hackathon to correct PDF-to-text conversion errors of Genomics & Informatics 1.0

This paper describes a community effort to improve earlier versions of the full-text corpus of Genomics & Informatics by semi-automatically detecting and correcting PDF-to-text conversion errors and optical character recognition errors during the first hackathon of Genomics & Informatics Ann...

Bibliographic Details
Main Authors: Kim, Sunho, Kim, Royoung, Nam, Hee-Jo, Kim, Ryeo-Gyeong, Ko, Enjin, Kim, Han-Su, Shin, Jihye, Cho, Daeun, Jin, Yurhee, Bae, Soyeon, Jo, Ye Won, Jeong, San Ah, Kim, Yena, Ahn, Seoyeon, Jang, Bomi, Seong, Jiheyon, Lee, Yujin, Seo, Si Eun, Kim, Yujin, Kim, Ha-Jeong, Kim, Hyeji, Sung, Hye-Lynn, Lho, Hyoyoung, Koo, Jaywon, Chu, Jion, Lim, Juwon, Kim, Youngju, Lee, Kyungyeon, Lim, Yuri, Kim, Meongeun, Hwang, Seonjeong, Han, Shinhye, Bae, Sohyeun, Kim, Sua, Yoo, Suhyeon, Seo, Yeonjeong, Shin, Yerim, Kim, Yonsoo, Ko, You-Jung, Baek, Jihee, Hyun, Hyejin, Choi, Hyemin, Oh, Ji-Hye, Kim, Da-Young, Park, Hyun-Seok
Format: Online Article Text
Language: English
Published: Korea Genome Organization 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7560450/
https://www.ncbi.nlm.nih.gov/pubmed/33017877
http://dx.doi.org/10.5808/GI.2020.18.3.e33
_version_ 1783595090434850816
author Kim, Sunho
Kim, Royoung
Nam, Hee-Jo
Kim, Ryeo-Gyeong
Ko, Enjin
Kim, Han-Su
Shin, Jihye
Cho, Daeun
Jin, Yurhee
Bae, Soyeon
Jo, Ye Won
Jeong, San Ah
Kim, Yena
Ahn, Seoyeon
Jang, Bomi
Seong, Jiheyon
Lee, Yujin
Seo, Si Eun
Kim, Yujin
Kim, Ha-Jeong
Kim, Hyeji
Sung, Hye-Lynn
Lho, Hyoyoung
Koo, Jaywon
Chu, Jion
Lim, Juwon
Kim, Youngju
Lee, Kyungyeon
Lim, Yuri
Kim, Meongeun
Hwang, Seonjeong
Han, Shinhye
Bae, Sohyeun
Kim, Sua
Yoo, Suhyeon
Seo, Yeonjeong
Shin, Yerim
Kim, Yonsoo
Ko, You-Jung
Baek, Jihee
Hyun, Hyejin
Choi, Hyemin
Oh, Ji-Hye
Kim, Da-Young
Park, Hyun-Seok
author_facet Kim, Sunho
Kim, Royoung
Nam, Hee-Jo
Kim, Ryeo-Gyeong
Ko, Enjin
Kim, Han-Su
Shin, Jihye
Cho, Daeun
Jin, Yurhee
Bae, Soyeon
Jo, Ye Won
Jeong, San Ah
Kim, Yena
Ahn, Seoyeon
Jang, Bomi
Seong, Jiheyon
Lee, Yujin
Seo, Si Eun
Kim, Yujin
Kim, Ha-Jeong
Kim, Hyeji
Sung, Hye-Lynn
Lho, Hyoyoung
Koo, Jaywon
Chu, Jion
Lim, Juwon
Kim, Youngju
Lee, Kyungyeon
Lim, Yuri
Kim, Meongeun
Hwang, Seonjeong
Han, Shinhye
Bae, Sohyeun
Kim, Sua
Yoo, Suhyeon
Seo, Yeonjeong
Shin, Yerim
Kim, Yonsoo
Ko, You-Jung
Baek, Jihee
Hyun, Hyejin
Choi, Hyemin
Oh, Ji-Hye
Kim, Da-Young
Park, Hyun-Seok
author_sort Kim, Sunho
collection PubMed
description This paper describes a community effort to improve earlier versions of the full-text corpus of Genomics & Informatics by semi-automatically detecting and correcting PDF-to-text conversion errors and optical character recognition errors during the first hackathon of Genomics & Informatics Annotation Hackathon (GIAH) event. Extracting text from multi-column biomedical documents such as Genomics & Informatics is known to be notoriously difficult. The hackathon was piloted as part of a coding competition of the ELTEC College of Engineering at Ewha Womans University in order to enable researchers and students to create or annotate their own versions of the Genomics & Informatics corpus, to gain and create knowledge about corpus linguistics, and simultaneously to acquire tangible and transferable skills. The proposed projects during the hackathon harness an internal database containing different versions of the corpus and annotations.
format Online
Article
Text
id pubmed-7560450
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Korea Genome Organization
record_format MEDLINE/PubMed
spelling pubmed-75604502020-10-21 Organizing an in-class hackathon to correct PDF-to-text conversion errors of Genomics & Informatics 1.0 Kim, Sunho Kim, Royoung Nam, Hee-Jo Kim, Ryeo-Gyeong Ko, Enjin Kim, Han-Su Shin, Jihye Cho, Daeun Jin, Yurhee Bae, Soyeon Jo, Ye Won Jeong, San Ah Kim, Yena Ahn, Seoyeon Jang, Bomi Seong, Jiheyon Lee, Yujin Seo, Si Eun Kim, Yujin Kim, Ha-Jeong Kim, Hyeji Sung, Hye-Lynn Lho, Hyoyoung Koo, Jaywon Chu, Jion Lim, Juwon Kim, Youngju Lee, Kyungyeon Lim, Yuri Kim, Meongeun Hwang, Seonjeong Han, Shinhye Bae, Sohyeun Kim, Sua Yoo, Suhyeon Seo, Yeonjeong Shin, Yerim Kim, Yonsoo Ko, You-Jung Baek, Jihee Hyun, Hyejin Choi, Hyemin Oh, Ji-Hye Kim, Da-Young Park, Hyun-Seok Genomics Inform Application Note This paper describes a community effort to improve earlier versions of the full-text corpus of Genomics & Informatics by semi-automatically detecting and correcting PDF-to-text conversion errors and optical character recognition errors during the first hackathon of Genomics & Informatics Annotation Hackathon (GIAH) event. Extracting text from multi-column biomedical documents such as Genomics & Informatics is known to be notoriously difficult. The hackathon was piloted as part of a coding competition of the ELTEC College of Engineering at Ewha Womans University in order to enable researchers and students to create or annotate their own versions of the Genomics & Informatics corpus, to gain and create knowledge about corpus linguistics, and simultaneously to acquire tangible and transferable skills. The proposed projects during the hackathon harness an internal database containing different versions of the corpus and annotations. 
Korea Genome Organization 2020-09-17 /pmc/articles/PMC7560450/ /pubmed/33017877 http://dx.doi.org/10.5808/GI.2020.18.3.e33 Text en (c) 2020, Korea Genome Organization (CC) This is an open-access article distributed under the terms of the Creative Commons Attribution license (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Application Note
Kim, Sunho
Kim, Royoung
Nam, Hee-Jo
Kim, Ryeo-Gyeong
Ko, Enjin
Kim, Han-Su
Shin, Jihye
Cho, Daeun
Jin, Yurhee
Bae, Soyeon
Jo, Ye Won
Jeong, San Ah
Kim, Yena
Ahn, Seoyeon
Jang, Bomi
Seong, Jiheyon
Lee, Yujin
Seo, Si Eun
Kim, Yujin
Kim, Ha-Jeong
Kim, Hyeji
Sung, Hye-Lynn
Lho, Hyoyoung
Koo, Jaywon
Chu, Jion
Lim, Juwon
Kim, Youngju
Lee, Kyungyeon
Lim, Yuri
Kim, Meongeun
Hwang, Seonjeong
Han, Shinhye
Bae, Sohyeun
Kim, Sua
Yoo, Suhyeon
Seo, Yeonjeong
Shin, Yerim
Kim, Yonsoo
Ko, You-Jung
Baek, Jihee
Hyun, Hyejin
Choi, Hyemin
Oh, Ji-Hye
Kim, Da-Young
Park, Hyun-Seok
Organizing an in-class hackathon to correct PDF-to-text conversion errors of Genomics & Informatics 1.0
title Organizing an in-class hackathon to correct PDF-to-text conversion errors of Genomics & Informatics 1.0
title_full Organizing an in-class hackathon to correct PDF-to-text conversion errors of Genomics & Informatics 1.0
title_fullStr Organizing an in-class hackathon to correct PDF-to-text conversion errors of Genomics & Informatics 1.0
title_full_unstemmed Organizing an in-class hackathon to correct PDF-to-text conversion errors of Genomics & Informatics 1.0
title_short Organizing an in-class hackathon to correct PDF-to-text conversion errors of Genomics & Informatics 1.0
title_sort organizing an in-class hackathon to correct pdf-to-text conversion errors of genomics & informatics 1.0
topic Application Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7560450/
https://www.ncbi.nlm.nih.gov/pubmed/33017877
http://dx.doi.org/10.5808/GI.2020.18.3.e33
work_keys_str_mv AT kimsunho organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT kimroyoung organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT namheejo organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT kimryeogyeong organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT koenjin organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT kimhansu organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT shinjihye organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT chodaeun organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT jinyurhee organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT baesoyeon organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT joyewon organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT jeongsanah organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT kimyena organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT ahnseoyeon organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT jangbomi organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT seongjiheyon organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT leeyujin organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT seosieun organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT kimyujin organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT kimhajeong organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT kimhyeji organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT sunghyelynn organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT lhohyoyoung organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT koojaywon organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT chujion organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT limjuwon organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT kimyoungju organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT leekyungyeon organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT limyuri organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT kimmeongeun organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT hwangseonjeong organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT hanshinhye organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT baesohyeun organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT kimsua organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT yoosuhyeon organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT seoyeonjeong organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT shinyerim organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT kimyonsoo organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT koyoujung organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT baekjihee organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT hyunhyejin organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT choihyemin organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT ohjihye organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT kimdayoung organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10
AT parkhyunseok organizinganinclasshackathontocorrectpdftotextconversionerrorsofgenomicsinformatics10