Automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies
BACKGROUND: In general, surgical pathology reviews report protein expression by tumors in a semi-quantitative manner, that is, -, -/+, +/-, +. At the same time, the experimental pathology literature provides multiple examples of precise expression levels determined by immunohistochemical (IHC) tissu...
Main authors: | Chang, Jia-Fu; Popescu, Mihail; Arthur, Gerald L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Medknow Publications & Media Pvt Ltd, 2013 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3746413/ https://www.ncbi.nlm.nih.gov/pubmed/23967385 http://dx.doi.org/10.4103/2153-3539.115880 |
_version_ | 1782280819579027456 |
---|---|
author | Chang, Jia-Fu Popescu, Mihail Arthur, Gerald L. |
author_facet | Chang, Jia-Fu Popescu, Mihail Arthur, Gerald L. |
author_sort | Chang, Jia-Fu |
collection | PubMed |
description | BACKGROUND: In general, surgical pathology reviews report protein expression by tumors in a semi-quantitative manner, that is, -, -/+, +/-, +. At the same time, the experimental pathology literature provides multiple examples of precise expression levels determined by immunohistochemical (IHC) tissue examination of populations of tumors. Natural language processing (NLP) techniques enable the automated extraction of such information through text mining. We propose establishing a database linking quantitative protein expression levels with specific tumor classifications through NLP. MATERIALS AND METHODS: Our method takes advantage of typical forms of representing experimental findings in terms of percentages of protein expression manifest by the tumor population under study. Characteristically, percentages are represented straightforwardly with the % symbol or as the number of positive findings of the total population. Such text is readily recognized using regular expressions and templates permitting extraction of sentences containing these forms for further analysis using grammatical structures and rule-based algorithms. RESULTS: Our pilot study is limited to the extraction of such information related to lymphomas. We achieved a satisfactory level of retrieval as reflected in scores of 69.91% precision and 57.25% recall with an F-score of 62.95%. In addition, we demonstrate the utility of a web-based curation tool for confirming and correcting our findings. CONCLUSIONS: The experimental pathology literature represents a rich source of pathobiological information, which has been relatively underutilized. There has been a combinatorial explosion of knowledge within the pathology domain as represented by increasing numbers of immunophenotypes and disease subclassifications. NLP techniques support practical text mining techniques for extracting this knowledge and organizing it in forms appropriate for pathology decision support systems. |
format | Online Article Text |
id | pubmed-3746413 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Medknow Publications & Media Pvt Ltd |
record_format | MEDLINE/PubMed |
spelling | pubmed-37464132013-08-21 Automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies Chang, Jia-Fu Popescu, Mihail Arthur, Gerald L. J Pathol Inform Research Article BACKGROUND: In general, surgical pathology reviews report protein expression by tumors in a semi-quantitative manner, that is, -, -/+, +/-, +. At the same time, the experimental pathology literature provides multiple examples of precise expression levels determined by immunohistochemical (IHC) tissue examination of populations of tumors. Natural language processing (NLP) techniques enable the automated extraction of such information through text mining. We propose establishing a database linking quantitative protein expression levels with specific tumor classifications through NLP. MATERIALS AND METHODS: Our method takes advantage of typical forms of representing experimental findings in terms of percentages of protein expression manifest by the tumor population under study. Characteristically, percentages are represented straightforwardly with the % symbol or as the number of positive findings of the total population. Such text is readily recognized using regular expressions and templates permitting extraction of sentences containing these forms for further analysis using grammatical structures and rule-based algorithms. RESULTS: Our pilot study is limited to the extraction of such information related to lymphomas. We achieved a satisfactory level of retrieval as reflected in scores of 69.91% precision and 57.25% recall with an F-score of 62.95%. In addition, we demonstrate the utility of a web-based curation tool for confirming and correcting our findings. CONCLUSIONS: The experimental pathology literature represents a rich source of pathobiological information, which has been relatively underutilized. There has been a combinatorial explosion of knowledge within the pathology domain as represented by increasing numbers of immunophenotypes and disease subclassifications. NLP techniques support practical text mining techniques for extracting this knowledge and organizing it in forms appropriate for pathology decision support systems. Medknow Publications & Media Pvt Ltd 2013-07-31 /pmc/articles/PMC3746413/ /pubmed/23967385 http://dx.doi.org/10.4103/2153-3539.115880 Text en Copyright: © 2013 Chang JF. http://creativecommons.org/licenses/by-nc-sa/3.0 This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Chang, Jia-Fu Popescu, Mihail Arthur, Gerald L. Automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies |
title | Automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies |
title_full | Automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies |
title_fullStr | Automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies |
title_full_unstemmed | Automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies |
title_short | Automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies |
title_sort | automated extraction of precise protein expression patterns in lymphoma by text mining abstracts of immunohistochemical studies |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3746413/ https://www.ncbi.nlm.nih.gov/pubmed/23967385 http://dx.doi.org/10.4103/2153-3539.115880 |
work_keys_str_mv | AT changjiafu automatedextractionofpreciseproteinexpressionpatternsinlymphomabytextminingabstractsofimmunohistochemicalstudies AT popescumihail automatedextractionofpreciseproteinexpressionpatternsinlymphomabytextminingabstractsofimmunohistochemicalstudies AT arthurgeraldl automatedextractionofpreciseproteinexpressionpatternsinlymphomabytextminingabstractsofimmunohistochemicalstudies |
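
The abstract in this record says that expression levels appear either with the % symbol or as a number of positive findings out of a total, and that regular expressions pick out the sentences containing these forms. The record does not give the authors' actual expressions or templates, so the following is only a minimal sketch of that idea. The pattern names, the naive sentence splitter, and the function `extract_expression_levels` are all hypothetical and chosen for illustration.

```python
import re

# Hypothetical patterns; the paper's own expressions are not given in this record.
# Matches "85%" or "12.5 %".
PERCENT = re.compile(r"(\d+(?:\.\d+)?)\s*%")
# Matches "12 of 20", "12 out of 20", or "12/20".
COUNT = re.compile(r"\b(\d+)\s*(?:of|out of|/)\s*(\d+)\b")


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_expression_levels(text):
    """Return (sentence, percentages) pairs for sentences that contain
    a percentage or a positive/total count."""
    results = []
    for sentence in split_sentences(text):
        # Percentages written directly with the % symbol.
        values = [float(m.group(1)) for m in PERCENT.finditer(sentence)]
        # Counts such as "12 of 20", converted to a percentage.
        for m in COUNT.finditer(sentence):
            positive, total = int(m.group(1)), int(m.group(2))
            if total > 0 and positive <= total:
                values.append(round(100.0 * positive / total, 2))
        if values:
            results.append((sentence, values))
    return results


if __name__ == "__main__":
    sample = (
        "CD20 was positive in 85% of cases. "
        "BCL2 was expressed in 12 of 20 tumors. "
        "No staining was seen."
    )
    for sentence, values in extract_expression_levels(sample):
        print(values, "<-", sentence)
```

A sketch like this only selects candidate sentences and numbers. Linking each value to a specific protein and tumor class would still need the grammatical and rule-based analysis that the abstract mentions, and patterns this loose will also match unrelated fractions such as dates or ratios.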