Development of a Structured Query Language and Natural Language Processing Algorithm to Identify Lung Nodules in a Cancer Centre

Importance: The stratification of indeterminate lung nodules is a growing problem, but the burden of lung nodules on healthcare services is not well-described. Manual service evaluation and research cohort curation can be time-consuming and potentially improved by automation. Objective: To automate...

Bibliographic Details
Main authors: Hunter, Benjamin, Reis, Sara, Campbell, Des, Matharu, Sheila, Ratnakumar, Prashanthi, Mercuri, Luca, Hindocha, Sumeet, Kalsi, Hardeep, Mayer, Erik, Glampson, Ben, Robinson, Emily J., Al-Lazikani, Bisan, Scerri, Lisa, Bloch, Susannah, Lee, Richard
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Medicine
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8599820/
https://www.ncbi.nlm.nih.gov/pubmed/34805217
http://dx.doi.org/10.3389/fmed.2021.748168
author Hunter, Benjamin
Reis, Sara
Campbell, Des
Matharu, Sheila
Ratnakumar, Prashanthi
Mercuri, Luca
Hindocha, Sumeet
Kalsi, Hardeep
Mayer, Erik
Glampson, Ben
Robinson, Emily J.
Al-Lazikani, Bisan
Scerri, Lisa
Bloch, Susannah
Lee, Richard
author_sort Hunter, Benjamin
collection PubMed
description Importance: The stratification of indeterminate lung nodules is a growing problem, but the burden of lung nodules on healthcare services is not well-described. Manual service evaluation and research cohort curation can be time-consuming and potentially improved by automation. Objective: To automate lung nodule identification in a tertiary cancer centre. Methods: This retrospective cohort study used Electronic Healthcare Records to identify CT reports generated between 31st October 2011 and 24th July 2020. A structured query language/natural language processing tool was developed to classify reports according to lung nodule status. Performance was externally validated. Sentences were used to train machine-learning classifiers to predict concerning nodule features in 2,000 patients. Results: 14,586 patients with lung nodules were identified. The cancer types most commonly associated with lung nodules were lung (39%), neuro-endocrine (38%), skin (35%), colorectal (33%) and sarcoma (33%). Lung nodule patients had a greater proportion of metastatic diagnoses (45 vs. 23%, p < 0.001), a higher mean post-baseline scan number (6.56 vs. 1.93, p < 0.001), and a shorter mean scan interval (4.1 vs. 5.9 months, p < 0.001) than those without nodules. Inter-observer agreement for sentence classification was 0.94 internally and 0.98 externally. Sensitivity and specificity for nodule identification were 93 and 99% internally, and 100 and 100% at external validation, respectively. A linear-support vector machine model predicted concerning sentence features with 94% accuracy. Conclusion: We have developed and validated an accurate tool for automated lung nodule identification that is valuable for service evaluation and research data acquisition.
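The description above outlines a two-stage approach (a structured query to retrieve CT reports, then natural language processing to classify them by nodule status) evaluated by sensitivity and specificity. The following is a minimal sketch of that general idea only, not the authors' tool: the table name `ct_reports`, its columns, the regular expressions, and all function names are hypothetical, and the negation handling is deliberately simplistic.

```python
import re
import sqlite3

# Hypothetical patterns; the study's actual rules are not given in this record.
NODULE = re.compile(r"\bnodul(?:e|es|ar)\b", re.IGNORECASE)
NEGATED = re.compile(r"\b(?:no|without)\b[^.;]*?\bnodul", re.IGNORECASE)


def sentence_affirms_nodule(sentence):
    """True if the sentence mentions a nodule and the mention is not negated."""
    return bool(NODULE.search(sentence)) and not NEGATED.search(sentence)


def report_has_nodule(text):
    """Split a report into sentences and flag it if any sentence affirms a nodule."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return any(sentence_affirms_nodule(s) for s in sentences)


def find_nodule_reports(rows):
    """rows: list of (report_id, report_text). Returns ids flagged nodule-positive."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE ct_reports (report_id INTEGER PRIMARY KEY, report_text TEXT)"
    )
    con.executemany("INSERT INTO ct_reports VALUES (?, ?)", rows)
    # Stage 1 (SQL): cheap keyword pre-filter to narrow the candidate set.
    candidates = con.execute(
        "SELECT report_id, report_text FROM ct_reports "
        "WHERE lower(report_text) LIKE '%nodul%' ORDER BY report_id"
    ).fetchall()
    con.close()
    # Stage 2 (NLP): sentence-level check that discards negated mentions.
    return [rid for rid, text in candidates if report_has_nodule(text)]


def sensitivity_specificity(predicted, actual):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fn = sum(1 for p, a in pairs if not p and a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fp = sum(1 for p, a in pairs if p and not a)
    sens = tp / (tp + fn) if (tp + fn) else float("nan")
    spec = tn / (tn + fp) if (tn + fp) else float("nan")
    return sens, spec
```

Comparing the flagged report ids against a manually labelled reference set and passing the two boolean lists to `sensitivity_specificity` yields the kind of figures quoted in the description. A production system would need far more robust negation and uncertainty handling than the two patterns used here.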
format Online
Article
Text
id pubmed-8599820
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8599820 2021-11-19 Front Med (Lausanne) Medicine Frontiers Media S.A. 2021-11-04 /pmc/articles/PMC8599820/ /pubmed/34805217 http://dx.doi.org/10.3389/fmed.2021.748168 Text en Copyright © 2021 Hunter, Reis, Campbell, Matharu, Ratnakumar, Mercuri, Hindocha, Kalsi, Mayer, Glampson, Robinson, Al-Lazikani, Scerri, Bloch and Lee. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Development of a Structured Query Language and Natural Language Processing Algorithm to Identify Lung Nodules in a Cancer Centre
topic Medicine
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8599820/
https://www.ncbi.nlm.nih.gov/pubmed/34805217
http://dx.doi.org/10.3389/fmed.2021.748168