Improving lung cancer risk stratification leveraging whole transcriptome RNA sequencing and machine learning across multiple cohorts

Bibliographic Details
Main authors: Choi, Yoonha, Qu, Jianghan, Wu, Shuyang, Hao, Yangyang, Zhang, Jiarui, Ning, Jianchang, Yang, Xinwu, Lofaro, Lori, Pankratz, Daniel G., Babiarz, Joshua, Walsh, P. Sean, Billatos, Ehab, Lenburg, Marc E., Kennedy, Giulia C., McAuliffe, Jon, Huang, Jing
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7579926/
https://www.ncbi.nlm.nih.gov/pubmed/33087128
http://dx.doi.org/10.1186/s12920-020-00782-1
_version_ 1783598692942479360
author Choi, Yoonha
Qu, Jianghan
Wu, Shuyang
Hao, Yangyang
Zhang, Jiarui
Ning, Jianchang
Yang, Xinwu
Lofaro, Lori
Pankratz, Daniel G.
Babiarz, Joshua
Walsh, P. Sean
Billatos, Ehab
Lenburg, Marc E.
Kennedy, Giulia C.
McAuliffe, Jon
Huang, Jing
author_sort Choi, Yoonha
collection PubMed
description BACKGROUND: Bronchoscopy for suspected lung cancer has low diagnostic sensitivity, so many results are inconclusive. The Bronchial Genomic Classifier (BGC) was developed to help with patient management by identifying those with low risk of lung cancer when bronchoscopy is inconclusive. The BGC was trained and validated on patients in the Airway Epithelial Gene Expression in the Diagnosis of Lung Cancer (AEGIS) trials. A modern patient cohort, the BGC Registry, showed differences in key clinical factors from the AEGIS cohorts, with less smoking history, smaller nodules and older age. Additionally, we discovered interfering factors (inhaled medication and sample collection timing) that impacted gene expression and potentially disguised genomic cancer signals.
METHODS: In this study, we leveraged multiple cohorts and next-generation sequencing technology to develop a robust Genomic Sequencing Classifier (GSC). To address demographic composition shift and interfering factors, we combined three algorithmic strategies: 1) an ensemble of clinical-dominant and genomic-dominant models; 2) hierarchical regression models in which the main effects of clinical variables were regressed out before the genomic effects were fitted; and 3) targeted placement of genomic and clinical interaction terms to stabilize the effect of interfering factors. The final GSC model uses 1232 genes and four clinical covariates: age, pack-years, inhaled medication use, and specimen collection timing.
RESULTS: In the validation set (N = 412), the GSC down-classified low and intermediate pre-test risk subjects to very low and low post-test risk with a specificity of 45% (95% CI 37–53%) and a sensitivity of 91% (95% CI 81–97%), resulting in a negative predictive value of 95% (95% CI 89–98%). Twelve percent of intermediate pre-test risk subjects were up-classified to high post-test risk with a positive predictive value of 65% (95% CI 44–82%), and 27% of high pre-test risk subjects were up-classified to very high post-test risk with a positive predictive value of 91% (95% CI 78–97%).
CONCLUSIONS: The GSC overcame the impact of interfering factors and achieved consistent performance across multiple cohorts. It demonstrated diagnostic accuracy in both down- and up-classification of cancer risk, providing physicians actionable information for many patients with inconclusive bronchoscopy.
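The abstract reports sensitivity, specificity, and negative and positive predictive values. As a reading aid, the following minimal Python sketch shows how these four quantities relate to a 2x2 table of test results, and how a negative predictive value follows from sensitivity, specificity, and cancer prevalence. The names `Counts` and `npv_from_rates` and the example counts are hypothetical and chosen only for illustration; they are not the study's data or the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """2x2 table of test result versus true disease status."""
    tp: int  # cancer, classified as not low risk
    fp: int  # benign, classified as not low risk
    tn: int  # benign, classified as low risk
    fn: int  # cancer, classified as low risk


def sensitivity(c: Counts) -> float:
    """Fraction of cancers that the test does not down-classify."""
    return c.tp / (c.tp + c.fn)


def specificity(c: Counts) -> float:
    """Fraction of benign cases that the test down-classifies."""
    return c.tn / (c.tn + c.fp)


def npv(c: Counts) -> float:
    """Fraction of down-classified subjects who are truly benign."""
    return c.tn / (c.tn + c.fn)


def ppv(c: Counts) -> float:
    """Fraction of not-down-classified subjects who truly have cancer."""
    return c.tp / (c.tp + c.fp)


def npv_from_rates(sens: float, spec: float, prevalence: float) -> float:
    """Negative predictive value from sensitivity, specificity and prevalence."""
    true_neg = spec * (1.0 - prevalence)
    false_neg = (1.0 - sens) * prevalence
    return true_neg / (true_neg + false_neg)


# Hypothetical counts for illustration only (NOT taken from the study).
example = Counts(tp=45, fp=55, tn=45, fn=5)
prevalence = (example.tp + example.fn) / (
    example.tp + example.fp + example.tn + example.fn
)

print(f"sensitivity = {sensitivity(example):.2f}")
print(f"specificity = {specificity(example):.2f}")
print(f"NPV         = {npv(example):.2f}")
print(f"PPV         = {ppv(example):.2f}")
print(f"NPV via rates and prevalence = "
      f"{npv_from_rates(sensitivity(example), specificity(example), prevalence):.2f}")
```

The sketch illustrates why a test with modest specificity can still yield a high negative predictive value: when sensitivity is high and prevalence in the tested group is moderate, few of the subjects classified as low risk turn out to have cancer.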
format Online
Article
Text
id pubmed-7579926
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7579926 2020-10-22 BMC Med Genomics Research BioMed Central 2020-10-22 /pmc/articles/PMC7579926/ /pubmed/33087128 http://dx.doi.org/10.1186/s12920-020-00782-1 Text en
© The Author(s) 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Improving lung cancer risk stratification leveraging whole transcriptome RNA sequencing and machine learning across multiple cohorts
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7579926/
https://www.ncbi.nlm.nih.gov/pubmed/33087128
http://dx.doi.org/10.1186/s12920-020-00782-1