
Automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies


Bibliographic Details
Main authors: Chang, Jia-Fu, Popescu, Mihail, Arthur, Gerald L.
Format: Online, Article, Text
Language: English
Published: Medknow Publications & Media Pvt Ltd 2013
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3746413/
https://www.ncbi.nlm.nih.gov/pubmed/23967385
http://dx.doi.org/10.4103/2153-3539.115880
collection PubMed
description BACKGROUND: In general, surgical pathology reviews report protein expression by tumors in a semi-quantitative manner, that is, -, -/+, +/-, +. At the same time, the experimental pathology literature provides multiple examples of precise expression levels determined by immunohistochemical (IHC) tissue examination of populations of tumors. Natural language processing (NLP) techniques enable the automated extraction of such information through text mining. We propose establishing a database linking quantitative protein expression levels with specific tumor classifications through NLP.
MATERIALS AND METHODS: Our method takes advantage of typical forms of representing experimental findings in terms of percentages of protein expression manifested by the tumor population under study. Characteristically, percentages are represented straightforwardly with the % symbol or as the number of positive findings out of the total population. Such text is readily recognized using regular expressions and templates, permitting extraction of sentences containing these forms for further analysis using grammatical structures and rule-based algorithms.
RESULTS: Our pilot study is limited to the extraction of such information related to lymphomas. We achieved a satisfactory level of retrieval, as reflected in a precision of 69.91%, a recall of 57.25%, and an F-score of 62.95%. In addition, we demonstrate the utility of a web-based curation tool for confirming and correcting our findings.
CONCLUSIONS: The experimental pathology literature represents a rich source of pathobiological information, which has been relatively underutilized. There has been a combinatorial explosion of knowledge within the pathology domain, as represented by increasing numbers of immunophenotypes and disease subclassifications. NLP supports practical text mining techniques for extracting this knowledge and organizing it in forms appropriate for pathology decision support systems.
id pubmed-3746413
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-3746413 2013-08-21
J Pathol Inform, Research Article
Medknow Publications & Media Pvt Ltd 2013-07-31
Copyright: © 2013 Chang JF. http://creativecommons.org/licenses/by-nc-sa/3.0 This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article
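The MATERIALS AND METHODS summary in the abstract says expression levels are found with regular expressions and templates that match either a value written with the % symbol or a count of positive findings out of the total population. The Python sketch below illustrates only that first filtering step. The patterns, the naive sentence splitter, the function name `extract_expression_values`, and the sample sentences are all hypothetical illustrations, not the authors' actual rules or data, and the later grammatical and rule-based analysis is not attempted.

```python
import re

# Hypothetical patterns, not the authors' actual rules:
# a value written with the % symbol ("45%", "12.5 %") ...
PERCENT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*%")
# ... or a count of positive cases out of a total ("12 of 20", "12/20").
FRACTION_RE = re.compile(r"\b(\d+)\s*(?:of|/)\s*(\d+)\b")
# Naive sentence splitter: break on whitespace that follows ., ! or ?.
SENTENCE_SPLIT_RE = re.compile(r"(?<=[.!?])\s+")


def extract_expression_values(text):
    """Return (sentence, percentages) pairs for sentences that report an
    expression level; n-of-N counts are converted to percentages."""
    results = []
    for sentence in SENTENCE_SPLIT_RE.split(text.strip()):
        values = [float(m.group(1)) for m in PERCENT_RE.finditer(sentence)]
        for m in FRACTION_RE.finditer(sentence):
            positive, total = int(m.group(1)), int(m.group(2))
            # Skip implausible counts such as "0/0" or "30 of 20".
            if total > 0 and positive <= total:
                values.append(round(100.0 * positive / total, 2))
        if values:
            results.append((sentence, values))
    return results


# Invented example sentences, for illustration only.
sample = (
    "CD10 was expressed in 45% of follicular lymphomas. "
    "BCL6 staining was positive in 12 of 20 cases. "
    "All cases were negative for cyclin D1."
)
for sentence, values in extract_expression_values(sample):
    print(values, sentence)
```

On the invented sample, the first two sentences are kept with `[45.0]` and `[60.0]`, and the cyclin D1 sentence is dropped because it reports no level. A fraction pattern this loose will also catch ratios that are not expression counts, which is one reason a second, grammar- and rule-based stage such as the one the abstract describes is needed.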
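The RESULTS figures in the abstract can be checked against the balanced F-measure, F1 = 2PR / (P + R). With P = 69.91 and R = 57.25 this gives 62.95, which matches the reported F-score and suggests (an inference, not stated in the record) that the figure is F1. The helper below is a generic sketch; the name `f_score` and its optional `beta` parameter are illustrative.

```python
def f_score(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives F1, the harmonic mean of precision and recall.
    Works on fractions (0.6991) or percentages (69.91) alike."""
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Precision and recall reported in the abstract, as percentages.
print(round(f_score(69.91, 57.25), 2))  # prints 62.95
```

Because F1 is a harmonic mean, it sits closer to the lower of the two inputs (recall here) than the arithmetic mean of 63.58 would.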