Clinical characteristics and prognostic factors for Crohn’s disease relapses using natural language processing and machine learning: a pilot study

The impact of relapses on disease burden in Crohn’s disease (CD) warrants searching for predictive factors to anticipate relapses. This requires analysis of large datasets, including elusive free-text annotations from electronic health records. This study aims to describe clinical characteristics an...

Bibliographic Details
Main Authors: Gomollón, Fernando, Gisbert, Javier P., Guerra, Iván, Plaza, Rocío, Pajares Villarroya, Ramón, Moreno Almazán, Luis, López Martín, Mª Carmen, Domínguez Antonaya, Mercedes, Vera Mendoza, María Isabel, Aparicio, Jesús, Martínez, Vicente, Tagarro, Ignacio, Fernández-Nistal, Alonso, Lumbreras, Sara, Maté, Claudia, Montoto, Carmen
Format: Online Article Text
Language: English
Published: Lippincott Williams And Wilkins 2021
Subjects: Original Articles: Gastroenterology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8876385/
https://www.ncbi.nlm.nih.gov/pubmed/34882644
http://dx.doi.org/10.1097/MEG.0000000000002317
author Gomollón, Fernando
Gisbert, Javier P.
Guerra, Iván
Plaza, Rocío
Pajares Villarroya, Ramón
Moreno Almazán, Luis
López Martín, Mª Carmen
Domínguez Antonaya, Mercedes
Vera Mendoza, María Isabel
Aparicio, Jesús
Martínez, Vicente
Tagarro, Ignacio
Fernández-Nistal, Alonso
Lumbreras, Sara
Maté, Claudia
Montoto, Carmen
author_sort Gomollón, Fernando
collection PubMed
description The impact of relapses on disease burden in Crohn’s disease (CD) warrants searching for predictive factors to anticipate relapses. This requires analysis of large datasets, including elusive free-text annotations from electronic health records. This study aims to describe clinical characteristics and treatment with biologics of CD patients and generate a data-driven predictive model for relapse using natural language processing (NLP) and machine learning (ML). METHODS: We performed a multicenter, retrospective study using a previously validated corpus of CD patient data from eight hospitals of the Spanish National Healthcare Network from 1 January 2014 to 31 December 2018 using NLP. Predictive models were created with ML algorithms, namely, logistic regression, decision trees, and random forests. RESULTS: CD phenotype, analyzed in 5938 CD patients, was predominantly inflammatory, and tobacco smoking appeared as a risk factor, confirming previous clinical studies. We also documented treatments, treatment switches, and time to discontinuation in biologics-treated CD patients. We found correlations between CD and patient family history of gastrointestinal neoplasms. Our predictive model ranked 25 000 variables for their potential as risk factors for CD relapse. Of highest relative importance were past relapses and patients’ age, as well as leukocyte, hemoglobin, and fibrinogen levels. CONCLUSION: Through NLP, we identified variables such as smoking as a risk factor and described treatment patterns with biologics in CD patients. CD relapse prediction highlighted the importance of patients’ age and some biochemistry values, though it proved highly challenging and merits the assessment of risk factors for relapse in a clinical setting.
format Online
Article
Text
id pubmed-8876385
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Lippincott Williams And Wilkins
record_format MEDLINE/PubMed
spelling pubmed-8876385 2022-03-03 Clinical characteristics and prognostic factors for Crohn’s disease relapses using natural language processing and machine learning: a pilot study Gomollón, Fernando Gisbert, Javier P. Guerra, Iván Plaza, Rocío Pajares Villarroya, Ramón Moreno Almazán, Luis López Martín, Mª Carmen Domínguez Antonaya, Mercedes Vera Mendoza, María Isabel Aparicio, Jesús Martínez, Vicente Tagarro, Ignacio Fernández-Nistal, Alonso Lumbreras, Sara Maté, Claudia Montoto, Carmen Eur J Gastroenterol Hepatol Original Articles: Gastroenterology The impact of relapses on disease burden in Crohn’s disease (CD) warrants searching for predictive factors to anticipate relapses. This requires analysis of large datasets, including elusive free-text annotations from electronic health records. This study aims to describe clinical characteristics and treatment with biologics of CD patients and generate a data-driven predictive model for relapse using natural language processing (NLP) and machine learning (ML). METHODS: We performed a multicenter, retrospective study using a previously validated corpus of CD patient data from eight hospitals of the Spanish National Healthcare Network from 1 January 2014 to 31 December 2018 using NLP. Predictive models were created with ML algorithms, namely, logistic regression, decision trees, and random forests. RESULTS: CD phenotype, analyzed in 5938 CD patients, was predominantly inflammatory, and tobacco smoking appeared as a risk factor, confirming previous clinical studies. We also documented treatments, treatment switches, and time to discontinuation in biologics-treated CD patients. We found correlations between CD and patient family history of gastrointestinal neoplasms. Our predictive model ranked 25 000 variables for their potential as risk factors for CD relapse. Of highest relative importance were past relapses and patients’ age, as well as leukocyte, hemoglobin, and fibrinogen levels.
CONCLUSION: Through NLP, we identified variables such as smoking as a risk factor and described treatment patterns with biologics in CD patients. CD relapse prediction highlighted the importance of patients’ age and some biochemistry values, though it proved highly challenging and merits the assessment of risk factors for relapse in a clinical setting. Lippincott Williams And Wilkins 2021-12-02 2022-04 /pmc/articles/PMC8876385/ /pubmed/34882644 http://dx.doi.org/10.1097/MEG.0000000000002317 Text en Copyright © 2021 The Author(s). Published by Wolters Kluwer Health, Inc. https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution-Non Commercial-No Derivatives License 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/) (CC BY-NC-ND), where it is permissible to download and share the work provided it is properly cited. The work cannot be changed in any way or used commercially without permission from the journal.
title Clinical characteristics and prognostic factors for Crohn’s disease relapses using natural language processing and machine learning: a pilot study
topic Original Articles: Gastroenterology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8876385/
https://www.ncbi.nlm.nih.gov/pubmed/34882644
http://dx.doi.org/10.1097/MEG.0000000000002317