A BERT model generates diagnostically relevant semantic embeddings from pathology synopses with active learning


Bibliographic Details
Main authors: Mu, Youqing, Tizhoosh, Hamid R., Tayebi, Rohollah Moosavi, Ross, Catherine, Sur, Monalisa, Leber, Brian, Campbell, Clinton J. V.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9053264/
https://www.ncbi.nlm.nih.gov/pubmed/35602188
http://dx.doi.org/10.1038/s43856-021-00008-0
Abstract:

BACKGROUND: Pathology synopses consist of semi-structured or unstructured text summarizing visual information by observing human tissue. Experts write and interpret these synopses with high domain-specific knowledge to extract tissue semantics and formulate a diagnosis in the context of ancillary testing and clinical information. The limited number of specialists available to interpret pathology synopses restricts the utility of the inherent information. Deep learning offers a tool for information extraction and automatic feature generation from complex datasets.

METHODS: Using an active learning approach, we developed a set of semantic labels for bone marrow aspirate pathology synopses. We then trained a transformer-based deep-learning model to map these synopses to one or more semantic labels, and extracted learned embeddings (i.e., meaningful attributes) from the model’s hidden layer.

RESULTS: Here we demonstrate that with a small amount of training data, a transformer-based natural language model can extract embeddings from pathology synopses that capture diagnostically relevant information. On average, these embeddings can be used to generate semantic labels mapping patients to probable diagnostic groups with a micro-average F1 score of 0.779 ± 0.025.

CONCLUSIONS: We provide a generalizable deep learning model and approach to unlock the semantic information inherent in pathology synopses toward improved diagnostics, biodiscovery and AI-assisted computational pathology.
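The abstract reports performance as a micro-average F1 score for a multi-label task (each synopsis can map to one or more semantic labels). As a minimal sketch of that metric, and not the authors' evaluation code, micro-averaging pools true positives (TP), false positives (FP) and false negatives (FN) over every sample and label before computing precision and recall, which is equivalent to F1 = 2TP / (2TP + FP + FN). The function name `micro_f1` and the labels "A", "B", "C" are hypothetical placeholders, not the paper's label set.

```python
from typing import List, Set


def micro_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label predictions (illustrative sketch).

    Each element of y_true / y_pred is the set of labels for one sample.
    Counts are pooled across all samples and labels before computing
    precision and recall. Returns 0.0 when there are no true positives.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = fn = 0
    for true_labels, pred_labels in zip(y_true, y_pred):
        tp += len(true_labels & pred_labels)  # labels predicted and present
        fp += len(pred_labels - true_labels)  # labels predicted but absent
        fn += len(true_labels - pred_labels)  # labels present but missed

    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: three samples with placeholder labels.
y_true = [{"A"}, {"B", "C"}, {"A", "C"}]
y_pred = [{"A"}, {"B"}, {"C", "B"}]
print(micro_f1(y_true, y_pred))  # TP=3, FP=1, FN=2 -> 2/3
```

Because counts are pooled, frequent labels weigh more heavily than rare ones; a macro average (per-label F1, then the mean) would instead treat every label equally.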
Journal: Commun Med (Lond), 2021-07-05.

© The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/.