Organizing an in-class hackathon to correct PDF-to-text conversion errors of Genomics & Informatics 1.0
This paper describes a community effort to improve earlier versions of the full-text corpus of Genomics & Informatics by semi-automatically detecting and correcting PDF-to-text conversion errors and optical character recognition errors during the first hackathon of Genomics & Informatics Ann...
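Main authors: | Kim, Sunho; Kim, Royoung; Nam, Hee-Jo; Kim, Ryeo-Gyeong; Ko, Enjin; Kim, Han-Su; Shin, Jihye; Cho, Daeun; Jin, Yurhee; Bae, Soyeon; Jo, Ye Won; Jeong, San Ah; Kim, Yena; Ahn, Seoyeon; Jang, Bomi; Seong, Jiheyon; Lee, Yujin; Seo, Si Eun; Kim, Yujin; Kim, Ha-Jeong; Kim, Hyeji; Sung, Hye-Lynn; Lho, Hyoyoung; Koo, Jaywon; Chu, Jion; Lim, Juwon; Kim, Youngju; Lee, Kyungyeon; Lim, Yuri; Kim, Meongeun; Hwang, Seonjeong; Han, Shinhye; Bae, Sohyeun; Kim, Sua; Yoo, Suhyeon; Seo, Yeonjeong; Shin, Yerim; Kim, Yonsoo; Ko, You-Jung; Baek, Jihee; Hyun, Hyejin; Choi, Hyemin; Oh, Ji-Hye; Kim, Da-Young; Park, Hyun-Seok |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Korea Genome Organization, 2020 |
Subjects: | Application Note |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7560450/ https://www.ncbi.nlm.nih.gov/pubmed/33017877 http://dx.doi.org/10.5808/GI.2020.18.3.e33 |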
_version_ | 1783595090434850816 |
---|---|
author | Kim, Sunho; Kim, Royoung; Nam, Hee-Jo; Kim, Ryeo-Gyeong; Ko, Enjin; Kim, Han-Su; Shin, Jihye; Cho, Daeun; Jin, Yurhee; Bae, Soyeon; Jo, Ye Won; Jeong, San Ah; Kim, Yena; Ahn, Seoyeon; Jang, Bomi; Seong, Jiheyon; Lee, Yujin; Seo, Si Eun; Kim, Yujin; Kim, Ha-Jeong; Kim, Hyeji; Sung, Hye-Lynn; Lho, Hyoyoung; Koo, Jaywon; Chu, Jion; Lim, Juwon; Kim, Youngju; Lee, Kyungyeon; Lim, Yuri; Kim, Meongeun; Hwang, Seonjeong; Han, Shinhye; Bae, Sohyeun; Kim, Sua; Yoo, Suhyeon; Seo, Yeonjeong; Shin, Yerim; Kim, Yonsoo; Ko, You-Jung; Baek, Jihee; Hyun, Hyejin; Choi, Hyemin; Oh, Ji-Hye; Kim, Da-Young; Park, Hyun-Seok |
author_sort | Kim, Sunho |
collection | PubMed |
description | This paper describes a community effort to improve earlier versions of the full-text corpus of Genomics & Informatics by semi-automatically detecting and correcting PDF-to-text conversion errors and optical character recognition errors during the first Genomics & Informatics Annotation Hackathon (GIAH) event. Extracting text from multi-column biomedical documents such as Genomics & Informatics is notoriously difficult. The hackathon was piloted as part of a coding competition of the ELTEC College of Engineering at Ewha Womans University to enable researchers and students to create or annotate their own versions of the Genomics & Informatics corpus, to gain and create knowledge about corpus linguistics, and at the same time to acquire tangible, transferable skills. The projects proposed during the hackathon draw on an internal database containing different versions of the corpus and its annotations. |
format | Online Article Text |
id | pubmed-7560450 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Korea Genome Organization |
record_format | MEDLINE/PubMed |
spelling | pubmed-75604502020-10-21 Organizing an in-class hackathon to correct PDF-to-text conversion errors of Genomics & Informatics 1.0 Kim, Sunho Kim, Royoung Nam, Hee-Jo Kim, Ryeo-Gyeong Ko, Enjin Kim, Han-Su Shin, Jihye Cho, Daeun Jin, Yurhee Bae, Soyeon Jo, Ye Won Jeong, San Ah Kim, Yena Ahn, Seoyeon Jang, Bomi Seong, Jiheyon Lee, Yujin Seo, Si Eun Kim, Yujin Kim, Ha-Jeong Kim, Hyeji Sung, Hye-Lynn Lho, Hyoyoung Koo, Jaywon Chu, Jion Lim, Juwon Kim, Youngju Lee, Kyungyeon Lim, Yuri Kim, Meongeun Hwang, Seonjeong Han, Shinhye Bae, Sohyeun Kim, Sua Yoo, Suhyeon Seo, Yeonjeong Shin, Yerim Kim, Yonsoo Ko, You-Jung Baek, Jihee Hyun, Hyejin Choi, Hyemin Oh, Ji-Hye Kim, Da-Young Park, Hyun-Seok Genomics Inform Application Note This paper describes a community effort to improve earlier versions of the full-text corpus of Genomics & Informatics by semi-automatically detecting and correcting PDF-to-text conversion errors and optical character recognition errors during the first hackathon of Genomics & Informatics Annotation Hackathon (GIAH) event. Extracting text from multi-column biomedical documents such as Genomics & Informatics is known to be notoriously difficult. The hackathon was piloted as part of a coding competition of the ELTEC College of Engineering at Ewha Womans University in order to enable researchers and students to create or annotate their own versions of the Genomics & Informatics corpus, to gain and create knowledge about corpus linguistics, and simultaneously to acquire tangible and transferable skills. The proposed projects during the hackathon harness an internal database containing different versions of the corpus and annotations. 
Korea Genome Organization 2020-09-17 /pmc/articles/PMC7560450/ /pubmed/33017877 http://dx.doi.org/10.5808/GI.2020.18.3.e33 Text en (c) 2020, Korea Genome Organization (CC) This is an open-access article distributed under the terms of the Creative Commons Attribution license(https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Organizing an in-class hackathon to correct PDF-to-text conversion errors of Genomics & Informatics 1.0 |
topic | Application Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7560450/ https://www.ncbi.nlm.nih.gov/pubmed/33017877 http://dx.doi.org/10.5808/GI.2020.18.3.e33 |