ITEXT-BIO: Intelligent Term EXTraction for BIOmedical analysis
Main authors: | Kafando, Rodrique; Decoupes, Rémy; Valentin, Sarah; Sautot, Lucile; Teisseire, Maguelonne; Roche, Mathieu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing, 2021 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8272612/ https://www.ncbi.nlm.nih.gov/pubmed/34276970 http://dx.doi.org/10.1007/s13755-021-00156-6 |
author | Kafando, Rodrique; Decoupes, Rémy; Valentin, Sarah; Sautot, Lucile; Teisseire, Maguelonne; Roche, Mathieu |
---|---|
collection | PubMed |
description | Here, we introduce ITEXT-BIO, an intelligent process for extracting biomedical domain terminology from textual documents and analysing it. The proposed methodology consists of two complementary approaches: free and driven term extraction. The first extracts terms using statistical measures, while the second applies morphosyntactic variation rules to extract term variants from the corpus. The keystone of ITEXT-BIO is the combination of two term extraction and analysis strategies: intra-corpus, which extracts and analyses terms within a single corpus, and inter-corpus, which does so across several corpora. We assessed the two approaches, the corpus or corpora to be analysed, and the type of statistical measures used. Our experimental findings showed that the proposed methodology could be used (1) to efficiently extract representative, discriminant and new terms from a given corpus or corpora, and (2) to provide quantitative and qualitative analyses of these terms with respect to the study domain. |
format | Online Article Text |
id | pubmed-8272612 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
journal | Health Inf Sci Syst |
published_online | 2021-07-10 |
rights | © The Author(s) 2021. Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. |
title | ITEXT-BIO: Intelligent Term EXTraction for BIOmedical analysis |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8272612/ https://www.ncbi.nlm.nih.gov/pubmed/34276970 http://dx.doi.org/10.1007/s13755-021-00156-6 |
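The description above names two extraction modes (free, based on statistical measures; driven, based on morphosyntactic variation rules) and two analysis scopes (intra- and inter-corpus). The Python sketch below illustrates these ideas under explicit assumptions. The record does not say which statistical measures or variation rules ITEXT-BIO uses, so TF-IDF (with the smoothed IDF form used by scikit-learn's `TfidfVectorizer`), a smoothed log-ratio of relative frequencies, and a single word-insertion rule are stand-ins. All function names (`candidates`, `free_extraction`, `discriminant_terms`, `driven_extraction`), the stopword list, and the toy corpora are hypothetical and are not taken from the ITEXT-BIO implementation.

```python
"""Illustrative sketch only, not ITEXT-BIO's implementation.

Assumed stand-ins: TF-IDF for intra-corpus scoring, a smoothed log-ratio for
inter-corpus (discriminant) scoring, and one word-insertion rule for the
morphosyntactic variation rules of driven extraction.
"""
import math
import re
from collections import Counter

# Hypothetical minimal stopword list; real term extractors usually rely on
# part-of-speech patterns rather than a stopword filter.
STOPWORDS = {"a", "an", "the", "of", "in", "and", "or", "to", "is", "was",
             "were", "with", "for", "on", "by"}


def candidates(text, max_n=3):
    """Count 1..max_n word n-grams that neither start nor end with a stopword."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return counts


def free_extraction(corpus, top_k=5):
    """Intra-corpus, free mode: rank candidates by their best TF-IDF in any document."""
    doc_counts = [candidates(doc) for doc in corpus]
    df = Counter(term for counts in doc_counts for term in counts)
    scores = {}
    for counts in doc_counts:
        total = sum(counts.values())
        for term, tf in counts.items():
            idf = math.log((1 + len(corpus)) / (1 + df[term])) + 1  # smoothed IDF
            scores[term] = max(scores.get(term, 0.0), tf / total * idf)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]


def discriminant_terms(target, reference, top_k=5, alpha=1.0):
    """Inter-corpus mode: rank target terms by a smoothed log-ratio of relative
    frequencies, so terms frequent in `target` but rare in `reference` come first."""
    t_counts, r_counts = Counter(), Counter()
    for doc in target:
        t_counts.update(candidates(doc))
    for doc in reference:
        r_counts.update(candidates(doc))
    vocab_size = len(set(t_counts) | set(r_counts))
    t_total = sum(t_counts.values()) + alpha * vocab_size
    r_total = sum(r_counts.values()) + alpha * vocab_size

    def score(term):
        return (math.log((t_counts[term] + alpha) / t_total)
                - math.log((r_counts[term] + alpha) / r_total))

    return sorted(t_counts, key=lambda t: (-score(t), t))[:top_k]


def driven_extraction(corpus, seed, max_insert=1):
    """Driven mode: find variants of a known seed term, allowing up to
    `max_insert` extra words between its words (e.g. "avian H5N1 influenza")."""
    words = [re.escape(w) for w in seed.lower().split()]
    gap = r"(?:\s+[a-z0-9-]+){0,%d}\s+" % max_insert
    pattern = re.compile(r"\b" + gap.join(words) + r"\b")
    variants = Counter()
    for doc in corpus:
        variants.update(m.group(0) for m in pattern.finditer(doc.lower()))
    return variants


# Hypothetical toy corpora for illustration.
target = ["Avian influenza H5N1 outbreaks were reported in poultry farms.",
          "The avian H5N1 influenza virus spread among wild birds."]
reference = ["Seasonal influenza vaccination reduces hospital admissions.",
             "Influenza surveillance relies on hospital reports."]

print(free_extraction(target, top_k=3))
print(discriminant_terms(target, reference, top_k=2))  # -> ['avian', 'h5n1']
print(driven_extraction(target, "avian influenza"))
```

In this toy example "influenza" occurs in both corpora, so the inter-corpus log-ratio pushes it below terms specific to the target corpus, while the insertion rule recovers "avian h5n1 influenza" as a variant of the seed "avian influenza". A fuller system would replace the stopword filter with part-of-speech patterns and add further variation rules (for example permutation or coordination).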