A BERT model generates diagnostically relevant semantic embeddings from pathology synopses with active learning
BACKGROUND: Pathology synopses consist of semi-structured or unstructured text summarizing visual information by observing human tissue. Experts write and interpret these synopses with high domain-specific knowledge to extract tissue semantics and formulate a diagnosis in the context of ancillary te...
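Main Authors: | Mu, Youqing; Tizhoosh, Hamid R.; Tayebi, Rohollah Moosavi; Ross, Catherine; Sur, Monalisa; Leber, Brian; Campbell, Clinton J. V. |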
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9053264/ https://www.ncbi.nlm.nih.gov/pubmed/35602188 http://dx.doi.org/10.1038/s43856-021-00008-0 |
_version_ | 1784696960789250048 |
---|---|
author | Mu, Youqing; Tizhoosh, Hamid R.; Tayebi, Rohollah Moosavi; Ross, Catherine; Sur, Monalisa; Leber, Brian; Campbell, Clinton J. V. |
author_facet | Mu, Youqing; Tizhoosh, Hamid R.; Tayebi, Rohollah Moosavi; Ross, Catherine; Sur, Monalisa; Leber, Brian; Campbell, Clinton J. V. |
author_sort | Mu, Youqing |
collection | PubMed |
description | BACKGROUND: Pathology synopses consist of semi-structured or unstructured text summarizing visual information by observing human tissue. Experts write and interpret these synopses with high domain-specific knowledge to extract tissue semantics and formulate a diagnosis in the context of ancillary testing and clinical information. The limited number of specialists available to interpret pathology synopses restricts the utility of the inherent information. Deep learning offers a tool for information extraction and automatic feature generation from complex datasets. METHODS: Using an active learning approach, we developed a set of semantic labels for bone marrow aspirate pathology synopses. We then trained a transformer-based deep-learning model to map these synopses to one or more semantic labels, and extracted learned embeddings (i.e., meaningful attributes) from the model’s hidden layer. RESULTS: Here we demonstrate that with a small amount of training data, a transformer-based natural language model can extract embeddings from pathology synopses that capture diagnostically relevant information. On average, these embeddings can be used to generate semantic labels mapping patients to probable diagnostic groups with a micro-average F1 score of 0.779 ± 0.025. CONCLUSIONS: We provide a generalizable deep learning model and approach to unlock the semantic information inherent in pathology synopses toward improved diagnostics, biodiscovery and AI-assisted computational pathology. |
format | Online Article Text |
id | pubmed-9053264 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9053264 2022-05-20 A BERT model generates diagnostically relevant semantic embeddings from pathology synopses with active learning Mu, Youqing; Tizhoosh, Hamid R.; Tayebi, Rohollah Moosavi; Ross, Catherine; Sur, Monalisa; Leber, Brian; Campbell, Clinton J. V. Commun Med (Lond) Article BACKGROUND: Pathology synopses consist of semi-structured or unstructured text summarizing visual information by observing human tissue. Experts write and interpret these synopses with high domain-specific knowledge to extract tissue semantics and formulate a diagnosis in the context of ancillary testing and clinical information. The limited number of specialists available to interpret pathology synopses restricts the utility of the inherent information. Deep learning offers a tool for information extraction and automatic feature generation from complex datasets. METHODS: Using an active learning approach, we developed a set of semantic labels for bone marrow aspirate pathology synopses. We then trained a transformer-based deep-learning model to map these synopses to one or more semantic labels, and extracted learned embeddings (i.e., meaningful attributes) from the model’s hidden layer. RESULTS: Here we demonstrate that with a small amount of training data, a transformer-based natural language model can extract embeddings from pathology synopses that capture diagnostically relevant information. On average, these embeddings can be used to generate semantic labels mapping patients to probable diagnostic groups with a micro-average F1 score of 0.779 ± 0.025. CONCLUSIONS: We provide a generalizable deep learning model and approach to unlock the semantic information inherent in pathology synopses toward improved diagnostics, biodiscovery and AI-assisted computational pathology.
Nature Publishing Group UK 2021-07-05 /pmc/articles/PMC9053264/ /pubmed/35602188 http://dx.doi.org/10.1038/s43856-021-00008-0 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Mu, Youqing Tizhoosh, Hamid R. Tayebi, Rohollah Moosavi Ross, Catherine Sur, Monalisa Leber, Brian Campbell, Clinton J. V. A BERT model generates diagnostically relevant semantic embeddings from pathology synopses with active learning |
title | A BERT model generates diagnostically relevant semantic embeddings from pathology synopses with active learning |
title_full | A BERT model generates diagnostically relevant semantic embeddings from pathology synopses with active learning |
title_fullStr | A BERT model generates diagnostically relevant semantic embeddings from pathology synopses with active learning |
title_full_unstemmed | A BERT model generates diagnostically relevant semantic embeddings from pathology synopses with active learning |
title_short | A BERT model generates diagnostically relevant semantic embeddings from pathology synopses with active learning |
title_sort | bert model generates diagnostically relevant semantic embeddings from pathology synopses with active learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9053264/ https://www.ncbi.nlm.nih.gov/pubmed/35602188 http://dx.doi.org/10.1038/s43856-021-00008-0 |
work_keys_str_mv | AT muyouqing abertmodelgeneratesdiagnosticallyrelevantsemanticembeddingsfrompathologysynopseswithactivelearning AT tizhooshhamidr abertmodelgeneratesdiagnosticallyrelevantsemanticembeddingsfrompathologysynopseswithactivelearning AT tayebirohollahmoosavi abertmodelgeneratesdiagnosticallyrelevantsemanticembeddingsfrompathologysynopseswithactivelearning AT rosscatherine abertmodelgeneratesdiagnosticallyrelevantsemanticembeddingsfrompathologysynopseswithactivelearning AT surmonalisa abertmodelgeneratesdiagnosticallyrelevantsemanticembeddingsfrompathologysynopseswithactivelearning AT leberbrian abertmodelgeneratesdiagnosticallyrelevantsemanticembeddingsfrompathologysynopseswithactivelearning AT campbellclintonjv abertmodelgeneratesdiagnosticallyrelevantsemanticembeddingsfrompathologysynopseswithactivelearning AT muyouqing bertmodelgeneratesdiagnosticallyrelevantsemanticembeddingsfrompathologysynopseswithactivelearning AT tizhooshhamidr bertmodelgeneratesdiagnosticallyrelevantsemanticembeddingsfrompathologysynopseswithactivelearning AT tayebirohollahmoosavi bertmodelgeneratesdiagnosticallyrelevantsemanticembeddingsfrompathologysynopseswithactivelearning AT rosscatherine bertmodelgeneratesdiagnosticallyrelevantsemanticembeddingsfrompathologysynopseswithactivelearning AT surmonalisa bertmodelgeneratesdiagnosticallyrelevantsemanticembeddingsfrompathologysynopseswithactivelearning AT leberbrian bertmodelgeneratesdiagnosticallyrelevantsemanticembeddingsfrompathologysynopseswithactivelearning AT campbellclintonjv bertmodelgeneratesdiagnosticallyrelevantsemanticembeddingsfrompathologysynopseswithactivelearning |
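
Note on the reported metric: the description above gives a micro-average F1 score for a model that maps each synopsis to one or more semantic labels. As a minimal sketch of how a micro-averaged F1 is typically computed in that multi-label setting, the Python below pools true positives, false positives and false negatives over all samples and labels before taking the harmonic mean. This is not the authors' code; the function name `micro_f1` and the example label strings are invented for illustration and do not come from the record.

```python
from typing import Sequence, Set


def micro_f1(true_labels: Sequence[Set[str]], pred_labels: Sequence[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label predictions.

    Counts are pooled across every sample and label, so frequent
    labels weigh more than rare ones.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("true_labels and pred_labels must have the same length")

    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but not in the truth
        fn += len(truth - pred)   # true labels the model missed

    denom = 2 * tp + fp + fn
    # Convention assumed here: return 0.0 when there is nothing to score.
    return 2 * tp / denom if denom else 0.0


# Hypothetical labels, made up for this example only.
truth = [{"normal"}, {"hypercellular", "erythroid hyperplasia"}, {"acute leukemia"}]
pred = [{"normal"}, {"hypercellular"}, {"acute leukemia", "hypercellular"}]

print(micro_f1(truth, pred))
```

In this toy example there are 3 pooled true positives, 1 false positive and 1 false negative, so the score is 2·3 / (2·3 + 1 + 1) = 0.75.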