Automated extraction of Biomarker information from pathology reports

Bibliographic Details
Main authors: Lee, Jeongeun; Song, Hyun-Je; Yoon, Eunsil; Park, Seong-Bae; Park, Sung-Hye; Seo, Jeong-Wook; Park, Peom; Choi, Jinwook
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5963015/
https://www.ncbi.nlm.nih.gov/pubmed/29783980
http://dx.doi.org/10.1186/s12911-018-0609-7
author Lee, Jeongeun
Song, Hyun-Je
Yoon, Eunsil
Park, Seong-Bae
Park, Sung-Hye
Seo, Jeong-Wook
Park, Peom
Choi, Jinwook
collection PubMed
description BACKGROUND: Pathology reports are written in free-text form, which precludes efficient data gathering. We aimed to overcome this limitation and design an automated system for extracting biomarker profiles from accumulated pathology reports. METHODS: We designed a new data model for representing biomarker knowledge. The automated system parses immunohistochemistry reports based on a “slide paragraph” unit defined as a set of immunohistochemistry findings obtained for the same tissue slide. Pathology reports are parsed using context-free grammar for immunohistochemistry, and using a tree-like structure for surgical pathology. The performance of the approach was validated on manually annotated pathology reports of 100 randomly selected patients managed at Seoul National University Hospital. RESULTS: High F-scores were obtained for parsing biomarker name and corresponding test results (0.999 and 0.998, respectively) from the immunohistochemistry reports, compared to relatively poor performance for parsing surgical pathology findings. However, applying the proposed approach to our single-center dataset revealed information on 221 unique biomarkers, which represents a richer result than biomarker profiles obtained based on the published literature. Owing to the data representation model, the proposed approach can associate biomarker profiles extracted from an immunohistochemistry report with corresponding pathology findings listed in one or more surgical pathology reports. Term variations are resolved by normalization to corresponding preferred terms determined by expanded dictionary look-up and text similarity-based search. CONCLUSIONS: Our proposed approach for biomarker data extraction addresses key limitations regarding data representation and can handle reports prepared in the clinical setting, which often contain incomplete sentences, typographical errors, and inconsistent formatting. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12911-018-0609-7) contains supplementary material, which is available to authorized users.
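
The abstract above says term variations are resolved by normalization to preferred terms through expanded dictionary look-up and text similarity-based search. The following is a minimal sketch of that general idea, not the authors' implementation: the preferred terms, the variant lists, the function name `normalize_term`, and the 0.8 similarity cutoff are all hypothetical, and Python's standard `difflib` stands in for whatever similarity measure the paper actually uses.

```python
import difflib

# Hypothetical preferred terms with known variants (illustrative only,
# not taken from the paper's dictionary).
PREFERRED_TERMS = {
    "ER": ["er", "estrogen receptor", "oestrogen receptor"],
    "PR": ["pr", "pgr", "progesterone receptor"],
    "HER2": ["her2", "her-2", "her2/neu", "c-erbb2", "c-erbb-2"],
    "Ki-67": ["ki-67", "ki67", "mib-1"],
}

# Expanded dictionary: every lower-cased variant maps to its preferred term.
VARIANT_TO_PREFERRED = {
    variant: preferred
    for preferred, variants in PREFERRED_TERMS.items()
    for variant in variants
}


def normalize_term(raw, cutoff=0.8):
    """Map a raw biomarker string to a preferred term, or None if no match.

    Step 1: exact look-up in the expanded dictionary.
    Step 2: fall back to a string-similarity search over known variants.
    The cutoff is an assumed threshold, not a value from the paper.
    """
    # Lower-case and collapse runs of whitespace before comparing.
    key = " ".join(raw.lower().split())
    if key in VARIANT_TO_PREFERRED:
        return VARIANT_TO_PREFERRED[key]
    close = difflib.get_close_matches(
        key, list(VARIANT_TO_PREFERRED), n=1, cutoff=cutoff
    )
    return VARIANT_TO_PREFERRED[close[0]] if close else None


if __name__ == "__main__":
    for raw in ["HER-2", "Estrogen  Receptor", "progestrone receptor", "cytokeratin"]:
        print(repr(raw), "->", normalize_term(raw))
```

Exact look-up runs first so that short abbreviations such as "ER" never go through fuzzy matching, where short strings match unreliably. The similarity fallback is meant only for typographical variants of longer names.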
format Online
Article
Text
id pubmed-5963015
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5963015 2018-06-25 Automated extraction of Biomarker information from pathology reports Lee, Jeongeun Song, Hyun-Je Yoon, Eunsil Park, Seong-Bae Park, Sung-Hye Seo, Jeong-Wook Park, Peom Choi, Jinwook BMC Med Inform Decis Mak Research Article BioMed Central 2018-05-21 /pmc/articles/PMC5963015/ /pubmed/29783980 http://dx.doi.org/10.1186/s12911-018-0609-7 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Automated extraction of Biomarker information from pathology reports
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5963015/
https://www.ncbi.nlm.nih.gov/pubmed/29783980
http://dx.doi.org/10.1186/s12911-018-0609-7