An automatic system for extracting figure-caption pair from medical documents: a six-fold approach

Bibliographic Details
Main author: Chaki, Jyotismita
Format: Online, Article, Text
Language: English
Published: PeerJ Inc., 2023
Subjects: Artificial Intelligence
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10403167/
https://www.ncbi.nlm.nih.gov/pubmed/37547417
http://dx.doi.org/10.7717/peerj-cs.1452
author Chaki, Jyotismita
collection PubMed
description BACKGROUND: Figures and captions in medical documents contain important information. As a result, researchers are increasingly interested in extracting published figures from medical papers and using their captions as a knowledge source. METHODS: This work introduces a new six-fold methodology for extracting figure-caption pairs. First, the à trous wavelet transform is used to extract edges from the scanned page. Next, maximally stable extremal regions (MSER) connected-component features are used to separate text from graphical content in the edge image, and a multi-layer perceptron (MLP) detects and retrieves the figures and captions in the medical documents. Each figure-caption pair is then extracted using a bounding-box approach. Finally, the figures and captions are saved to separate files and supplied to the end user as the output. The proposed approach is evaluated on a self-created database of pages collected from five open-access books: “Brain and Human Body Modelling 2021” by Sergey Makarov, Gregory Noetscher and Aapo Nummenmaa; “Healthcare and Disease Burden in Africa” by Ilha Niohuru; “All-Optical Methods to Study Neuronal Function” by Eirini Papagiakoumou; “RNA, the Epicenter of Genetic Information” by John Mattick and Paulo Amaral; and “Illustrated Manual of Pediatric Dermatology” by Susan Bayliss Mallory, Alanna Bree and Peggy Chern. RESULTS: Experiments comparing the proposed method with earlier systems show a significant improvement in efficiency, indicating that the technique is both robust and efficient.
format Online
Article
Text
id pubmed-10403167
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-10403167 2023-08-05 An automatic system for extracting figure-caption pair from medical documents: a six-fold approach Chaki, Jyotismita PeerJ Comput Sci Artificial Intelligence PeerJ Inc.
2023-07-26 /pmc/articles/PMC10403167/ /pubmed/37547417 http://dx.doi.org/10.7717/peerj-cs.1452 Text en ©2023 Chaki https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title An automatic system for extracting figure-caption pair from medical documents: a six-fold approach
topic Artificial Intelligence
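The abstract's first processing step extracts edges from the scanned page with the à trous ("with holes") wavelet transform, but the record gives no parameters. The sketch below is a minimal illustration of that transform in Python with NumPy, under stated assumptions: it uses the B3-spline kernel that is common in the à trous literature, replicates border pixels, and marks as edges the pixels whose summed detail magnitude over the finest scales reaches a fixed fraction of the maximum. The kernel, border handling, function names and the `frac` threshold are hypothetical choices for illustration, not the paper's settings.

```python
import numpy as np

# B3-spline scaling kernel: an assumed, commonly used choice for the a trous transform
B3_KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0


def smooth(image: np.ndarray, level: int) -> np.ndarray:
    """Separable B3 smoothing with the kernel dilated by 2**level ("holes")."""
    step = 2 ** level
    kernel = np.zeros(4 * step + 1)
    kernel[::step] = B3_KERNEL  # taps spaced `step` apart, zeros in between
    pad = 2 * step
    out = image
    for axis in (0, 1):
        widths = [(pad, pad) if a == axis else (0, 0) for a in range(2)]
        padded = np.pad(out, widths, mode="edge")  # replicate border pixels (assumption)
        out = np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="valid"), axis, padded
        )
    return out


def atrous_decompose(image, levels: int = 3):
    """Return the detail planes w_1..w_levels and the final coarse image."""
    c = np.asarray(image, dtype=float)
    details = []
    for j in range(levels):
        c_next = smooth(c, j)
        details.append(c - c_next)  # wavelet (detail) plane at scale j + 1
        c = c_next
    return details, c


def edge_map(image, levels: int = 2, frac: float = 0.25) -> np.ndarray:
    """Boolean edge mask from the summed magnitude of the finest detail planes.

    `frac` is a hypothetical threshold relative to the strongest response.
    """
    details, _ = atrous_decompose(image, levels)
    energy = np.sum(np.abs(np.stack(details)), axis=0)
    peak = energy.max()
    if peak == 0:  # flat page: no edges at all
        return np.zeros(energy.shape, dtype=bool)
    return energy >= frac * peak


if __name__ == "__main__":
    page = np.zeros((16, 16))
    page[:, 8:] = 1.0  # a vertical boundary between columns 7 and 8
    print(edge_map(page).any(axis=0).astype(int))
    # prints [0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 0]
```

Because each detail plane is the difference of two successive smoothings, the planes plus the final coarse image sum back to the input exactly, which is a convenient sanity check when tuning the number of levels.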
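The abstract states only that a bounding-box approach is used to extract each figure-caption pair once figures and captions have been detected. One plausible reading, offered purely as a hypothetical heuristic rather than the paper's procedure, is to pair every caption box with the closest unused figure box directly above it that overlaps it horizontally, and then report the union of the two boxes. Coordinates are assumed to be page pixels with y increasing downward; `Box`, `pair_figures_with_captions` and the `max_gap` limit are illustrative names and values.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Axis-aligned box in page pixels; y grows downward (assumed convention)."""
    x0: float
    y0: float
    x1: float
    y1: float

    def union(self, other: "Box") -> "Box":
        """Smallest box enclosing both boxes."""
        return Box(min(self.x0, other.x0), min(self.y0, other.y0),
                   max(self.x1, other.x1), max(self.y1, other.y1))


def horizontal_overlap(a: Box, b: Box) -> float:
    """Width of the shared x-interval of two boxes (0 if disjoint)."""
    return max(0.0, min(a.x1, b.x1) - max(a.x0, b.x0))


def pair_figures_with_captions(
    figures: List[Box], captions: List[Box], max_gap: float = 50.0
) -> List[Tuple[Box, Box, Box]]:
    """Greedily pair each caption with the nearest unused figure directly above it.

    Returns (figure, caption, union_box) triples; a caption with no figure
    ending within `max_gap` pixels above it is left unpaired.
    """
    pairs: List[Tuple[Box, Box, Box]] = []
    used = set()
    for cap in sorted(captions, key=lambda b: b.y0):  # process captions top to bottom
        best: Optional[int] = None
        best_gap = float("inf")
        for i, fig in enumerate(figures):
            gap = cap.y0 - fig.y1  # distance from figure bottom to caption top
            if i in used or gap < 0 or gap > max_gap:
                continue
            if horizontal_overlap(fig, cap) <= 0:
                continue
            if gap < best_gap:
                best, best_gap = i, gap
        if best is not None:
            used.add(best)
            pairs.append((figures[best], cap, figures[best].union(cap)))
    return pairs


if __name__ == "__main__":
    figs = [Box(50, 100, 550, 400), Box(50, 500, 550, 800)]
    caps = [Box(50, 810, 550, 840), Box(50, 410, 550, 440)]
    for fig, cap, both in pair_figures_with_captions(figs, caps):
        print(both)
```

A greedy top-to-bottom pass is simple and deterministic; pages where captions sit above their figures, or multi-panel figures sharing one caption, would need different rules.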