An automatic system for extracting figure-caption pair from medical documents: a six-fold approach
BACKGROUND: Figures and captions in medical documentation contain important information. As a result, researchers are becoming more interested in obtaining published medical figures from medical papers and utilizing the captions as a knowledge source. METHODS: This work introduces a unique and succe...
Main Author: | Chaki, Jyotismita |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2023 |
Subjects: | Artificial Intelligence |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10403167/ https://www.ncbi.nlm.nih.gov/pubmed/37547417 http://dx.doi.org/10.7717/peerj-cs.1452 |
_version_ | 1785085008456712192 |
---|---|
author | Chaki, Jyotismita |
collection | PubMed |
description | BACKGROUND: Figures and captions in medical documentation contain important information. As a result, researchers are becoming more interested in obtaining published medical figures from medical papers and utilizing the captions as a knowledge source. METHODS: This work introduces a unique and successful six-fold methodology for extracting figure-caption pairs. The A-torus wavelet transform is used to retrieve the first edge from the scanned page. Then, using the maximally stable extremal regions connected component feature, text and graphical contents are isolated from the edge document, and a multi-layer perceptron is used to successfully detect and retrieve figures and captions from medical records. The figure-caption pair is then extracted using the bounding box approach. The files that contain the figures and captions are saved separately and supplied to the end user as the output of any investigation. The proposed approach is evaluated using a self-created database based on the pages collected from five open access books: Sergey Makarov, Gregory Noetscher and Aapo Nummenmaa’s book “Brain and Human Body Modelling 2021”, “Healthcare and Disease Burden in Africa” by Ilha Niohuru, “All-Optical Methods to Study Neuronal Function” by Eirini Papagiakoumou, “RNA, the Epicenter of Genetic Information” by John Mattick and Paulo Amaral and “Illustrated Manual of Pediatric Dermatology” by Susan Bayliss Mallory, Alanna Bree and Peggy Chern. RESULTS: Experiments and findings comparing the new method to earlier systems reveal a significant increase in efficiency, demonstrating the suggested technique’s robustness and efficiency. |
format | Online Article Text |
id | pubmed-10403167 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10403167 2023-08-05 An automatic system for extracting figure-caption pair from medical documents: a six-fold approach Chaki, Jyotismita PeerJ Comput Sci Artificial Intelligence PeerJ Inc. 2023-07-26 /pmc/articles/PMC10403167/ /pubmed/37547417 http://dx.doi.org/10.7717/peerj-cs.1452 Text en ©2023 Chaki https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
title | An automatic system for extracting figure-caption pair from medical documents: a six-fold approach |
topic | Artificial Intelligence |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10403167/ https://www.ncbi.nlm.nih.gov/pubmed/37547417 http://dx.doi.org/10.7717/peerj-cs.1452 |
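
The abstract says the figure-caption pair is extracted "using the bounding box approach" but does not state how a figure box is matched to a caption box. The following is a minimal sketch of one plausible matching rule, not the author's implementation. It assumes that figure and caption regions have already been detected as axis-aligned boxes in page pixel coordinates, and that a caption sits below its figure and overlaps it horizontally. The names `Box`, `horizontal_overlap`, `pair_figures_with_captions` and `union_box` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in page pixels; y grows downward."""
    left: int
    top: int
    right: int
    bottom: int


def horizontal_overlap(a: Box, b: Box) -> int:
    """Width in pixels of the horizontal range shared by both boxes (0 if none)."""
    return max(0, min(a.right, b.right) - max(a.left, b.left))


def pair_figures_with_captions(
    figures: List[Box], captions: List[Box]
) -> List[Tuple[Box, Optional[Box]]]:
    """Pair each figure with the closest caption that lies below it.

    Assumed heuristic: a caption is a candidate only if it starts at or
    below the figure's bottom edge and overlaps the figure horizontally.
    A figure with no candidate is paired with None.
    """
    pairs = []
    for fig in figures:
        candidates = [
            cap for cap in captions
            if cap.top >= fig.bottom and horizontal_overlap(fig, cap) > 0
        ]
        # The smallest vertical gap wins; default=None covers the empty case.
        best = min(candidates, key=lambda cap: cap.top - fig.bottom, default=None)
        pairs.append((fig, best))
    return pairs


def union_box(fig: Box, cap: Box) -> Box:
    """Smallest box that encloses both the figure and its caption."""
    return Box(
        min(fig.left, cap.left),
        min(fig.top, cap.top),
        max(fig.right, cap.right),
        max(fig.bottom, cap.bottom),
    )


# Two side-by-side figures, each with a caption directly underneath,
# plus an unrelated text block far down the page.
figures = [Box(50, 100, 400, 300), Box(450, 100, 800, 300)]
captions = [Box(50, 310, 400, 340), Box(450, 320, 800, 350), Box(50, 900, 400, 930)]
pairs = pair_figures_with_captions(figures, captions)
```

Each resulting pair can be passed to `union_box` to obtain a single crop region, or the two boxes can be cropped and saved separately, which is closer to the abstract's description of storing figures and captions in separate files.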