Improving natural language information extraction from cancer pathology reports using transfer learning and zero-shot string similarity

OBJECTIVE: We develop natural language processing (NLP) methods capable of accurately classifying tumor attributes from pathology reports given minimal labeled examples. Our hierarchical cancer to cancer transfer (HCTC) and zero-shot string similarity (ZSS) methods are designed to exploit shared inf...

Bibliographic Details
Main authors:	Park, Briton; Altieri, Nicholas; DeNero, John; Odisho, Anobel Y; Yu, Bin
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2021
Subjects:	Research and Applications
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8484934/ https://www.ncbi.nlm.nih.gov/pubmed/34604711 http://dx.doi.org/10.1093/jamiaopen/ooab085

author	Park, Briton; Altieri, Nicholas; DeNero, John; Odisho, Anobel Y; Yu, Bin
collection	PubMed
description	OBJECTIVE: We develop natural language processing (NLP) methods capable of accurately classifying tumor attributes from pathology reports given minimal labeled examples. Our hierarchical cancer to cancer transfer (HCTC) and zero-shot string similarity (ZSS) methods are designed to exploit shared information between cancers and auxiliary class features, respectively, to boost performance using enriched annotations which give both location-based information and document level labels for each pathology report. MATERIALS AND METHODS: Our data consists of 250 pathology reports each for kidney, colon, and lung cancer from 2002 to 2019 from a single institution (UCSF). For each report, we classified 5 attributes: procedure, tumor location, histology, grade, and presence of lymphovascular invasion. We develop novel NLP techniques involving transfer learning and string similarity trained on enriched annotations. We compare HCTC and ZSS methods to the state-of-the-art including conventional machine learning methods as well as deep learning methods. RESULTS: For our HCTC method, we see an improvement of up to 0.1 micro-F1 score and 0.04 macro-F1 averaged across cancer and applicable attributes. For our ZSS method, we see an improvement of up to 0.26 micro-F1 and 0.23 macro-F1 averaged across cancer and applicable attributes. These comparisons are made after adjusting training data sizes to correct for the 20% increase in annotation time for enriched annotations compared to ordinary annotations. CONCLUSIONS: Methods based on transfer learning across cancers and augmenting information methods with string similarity priors can significantly reduce the amount of labeled data needed for accurate information extraction from pathology reports.
format	Online Article Text
id	pubmed-8484934
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-8484934 2021-10-01 Improving natural language information extraction from cancer pathology reports using transfer learning and zero-shot string similarity Park, Briton; Altieri, Nicholas; DeNero, John; Odisho, Anobel Y; Yu, Bin JAMIA Open Research and Applications Oxford University Press 2021-09-30 /pmc/articles/PMC8484934/ /pubmed/34604711 http://dx.doi.org/10.1093/jamiaopen/ooab085 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	Improving natural language information extraction from cancer pathology reports using transfer learning and zero-shot string similarity
topic	Research and Applications
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8484934/ https://www.ncbi.nlm.nih.gov/pubmed/34604711 http://dx.doi.org/10.1093/jamiaopen/ooab085
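
The description field reports gains in both micro-F1 and macro-F1. As a reading aid, the sketch below shows how the two averages differ for single-label classification, using the standard definitions. It is not the authors' evaluation code; the function name `f1_scores` and the grade labels are hypothetical, chosen only for illustration.

```python
from collections import Counter


def f1_scores(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions.

    Illustrative helper (hypothetical name), using the standard definitions.
    """
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0, 0.0

    # Per-class true positives, false positives, and false negatives.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Micro-F1 pools the counts over all classes before computing F1.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    # Macro-F1 computes F1 per class, then takes the unweighted mean.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    return micro, macro


# Hypothetical tumor-grade labels: the rare class "G2" is always missed.
y_true = ["G1", "G1", "G1", "G1", "G2", "G3"]
y_pred = ["G1", "G1", "G1", "G1", "G1", "G3"]
micro, macro = f1_scores(y_true, y_pred)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
```

Because macro-F1 weights every class equally, a rare class that is always missed lowers it much more than it lowers micro-F1. This is why the two figures can move by different amounts for the same method.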
