
Copy number variation in plasma as a tool for lung cancer prediction using Extreme Gradient Boosting (XGBoost) classifier

BACKGROUND: The main cause of cancer death is lung cancer (LC) which usually presents at an advanced stage, but its early detection would increase the benefits of treatment. Blood is particularly favored in clinical research given the possibility of using it for relatively noninvasive analyses. Copy...


Bibliographic Details
Main authors: Yu, Daping, Liu, Zhidong, Su, Chongyu, Han, Yi, Duan, XinChun, Zhang, Rui, Liu, Xiaoshuang, Yang, Yang, Xu, Shaofa
Format: Online Article Text
Language: English
Published: John Wiley & Sons Australia, Ltd 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6938748/
https://www.ncbi.nlm.nih.gov/pubmed/31694073
http://dx.doi.org/10.1111/1759-7714.13204
author Yu, Daping
Liu, Zhidong
Su, Chongyu
Han, Yi
Duan, XinChun
Zhang, Rui
Liu, Xiaoshuang
Yang, Yang
Xu, Shaofa
collection PubMed
description BACKGROUND: The main cause of cancer death is lung cancer (LC), which usually presents at an advanced stage, but its early detection would increase the benefits of treatment. Blood is particularly favored in clinical research given the possibility of using it for relatively noninvasive analyses. Copy number variation (CNV) is a common genetic change in tumor genomes, and many studies have indicated that CNV‐derived cell‐free DNA (cfDNA) from plasma could be feasible as a biomarker for cancer diagnosis. METHODS: In this study, we determined the possibility of using chromosomal arm‐level CNV from cfDNA as a biomarker for lung cancer diagnosis in a small cohort of 40 patients and 41 healthy controls. Arm‐level CNV distributions were analyzed based on z score, and the machine‐learning algorithm Extreme Gradient Boosting (XGBoost) was applied for cancer prediction. RESULTS: The results showed that amplifications tended to emerge on chromosomes 3q, 8q, 12p, and 7q. Deletions were frequently detected on chromosomes 22q, 3p, 5q, 16q, 10q, and 15q. Upon applying a trained XGBoost classifier, specificity and sensitivity of 100% were finally achieved in the test group (12 patients and 13 healthy controls). In addition, five‐fold cross‐validation proved the stability of the model. Finally, our results suggested that the integration of four arm‐level CNVs and the concentration of cfDNA into the trained XGBoost classifier provides a potential method for detecting lung cancer. CONCLUSION: Our results suggested that the integration of four arm‐level CNVs and the concentration of cfDNA into the trained XGBoost classifier provides a potential method for detecting lung cancer. KEY POINTS: Significant findings of the study: Healthy individuals have different arm‐level CNV profiles from cancer patients.
Amplifications tend to emerge on chromosomes 3q, 8q, 12p, and 7q, and deletions tend to emerge on chromosomes 22q, 3p, 5q, 16q, 10q, and 15q. What this study adds: cfDNA concentration and arms 10q, 3q, 8q, 3p, and 22q are key features for prediction. The trained XGBoost classifier is a potential method for lung cancer detection.
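The z‐score analysis mentioned in METHODS can be illustrated with a minimal sketch. The abstract does not give the formula, the input representation, or any cutoff, so everything below is an assumption made for illustration: each sample is a hypothetical dictionary mapping a chromosome arm to a normalized read fraction, the healthy controls serve as the reference, and the cutoff of 3 is a placeholder rather than the authors' value. The XGBoost classification step is not shown.

```python
from statistics import mean, stdev


def arm_z_scores(sample, controls):
    """Return {arm: z}, where z = (sample value - control mean) / control SD.

    `sample` and every entry of `controls` are hypothetical dicts that map
    a chromosome arm (e.g. "3q") to a normalized read fraction.
    """
    z = {}
    for arm, value in sample.items():
        reference = [control[arm] for control in controls]
        z[arm] = (value - mean(reference)) / stdev(reference)
    return z


def call_arms(z_scores, threshold=3.0):
    """Label each arm by comparing its z score with an assumed cutoff."""
    calls = {}
    for arm, z in z_scores.items():
        if z > threshold:
            calls[arm] = "amplification"
        elif z < -threshold:
            calls[arm] = "deletion"
        else:
            calls[arm] = "neutral"
    return calls


if __name__ == "__main__":
    # Made-up numbers, not data from the study.
    controls = [
        {"3q": 1.00, "22q": 0.50},
        {"3q": 1.02, "22q": 0.52},
        {"3q": 0.98, "22q": 0.48},
    ]
    patient = {"3q": 1.10, "22q": 0.40}
    print(call_arms(arm_z_scores(patient, controls)))
```

In a pipeline like the one the abstract outlines, per‐arm values of this kind, together with cfDNA concentration, would be the features passed to the classifier.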
format Online
Article
Text
id pubmed-6938748
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher John Wiley & Sons Australia, Ltd
record_format MEDLINE/PubMed
spelling pubmed-69387482020-01-06 Copy number variation in plasma as a tool for lung cancer prediction using Extreme Gradient Boosting (XGBoost) classifier Yu, Daping Liu, Zhidong Su, Chongyu Han, Yi Duan, XinChun Zhang, Rui Liu, Xiaoshuang Yang, Yang Xu, Shaofa Thorac Cancer Original Articles BACKGROUND: The main cause of cancer death is lung cancer (LC) which usually presents at an advanced stage, but its early detection would increase the benefits of treatment. Blood is particularly favored in clinical research given the possibility of using it for relatively noninvasive analyses. Copy number variation (CNV) is a common genetic change in tumor genomes, and many studies have indicated that CNV‐derived cell‐free DNA (cfDNA) from plasma could be feasible as a biomarker for cancer diagnosis. METHODS: In this study, we determined the possibility of using chromosomal arm‐level CNV from cfDNA as a biomarker for lung cancer diagnosis in a small cohort of 40 patients and 41 healthy controls. Arm‐level CNV distributions were analyzed based on z score, and the machine‐learning algorithm Extreme Gradient Boosting (XGBoost) was applied for cancer prediction. RESULTS: The results showed that amplifications tended to emerge on chromosomes 3q, 8q, 12p, and 7q. Deletions were frequently detected on chromosomes 22q, 3p, 5q, 16q, 10q, and 15q. Upon applying a trained XGBoost classifier, specificity and sensitivity of 100% were finally achieved in the test group (12 patients and 13 healthy controls). In addition, five‐fold cross‐validation proved the stability of the model. Finally, our results suggested that the integration of four arm‐level CNVs and the concentration of cfDNA into the trained XGBoost classifier provides a potential method for detecting lung cancer. 
CONCLUSION: Our results suggested that the integration of four arm‐level CNVs and the concentration of cfDNA into the trained XGBoost classifier provides a potential method for detecting lung cancer. KEY POINTS: Significant findings of the study: Healthy individuals have different arm‐level CNV profiles from cancer patients. Amplifications tend to emerge on chromosomes 3q, 8q, 12p, and 7q, and deletions tend to emerge on chromosomes 22q, 3p, 5q, 16q, 10q, and 15q. What this study adds: cfDNA concentration and arms 10q, 3q, 8q, 3p, and 22q are key features for prediction. The trained XGBoost classifier is a potential method for lung cancer detection. John Wiley & Sons Australia, Ltd 2019-11-06 2020-01 /pmc/articles/PMC6938748/ /pubmed/31694073 http://dx.doi.org/10.1111/1759-7714.13204 Text en © 2019 The Authors. Thoracic Cancer published by China Lung Oncology Group and John Wiley & Sons Australia, Ltd. This is an open access article under the terms of the http://creativecommons.org/licenses/by/4.0/ License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Copy number variation in plasma as a tool for lung cancer prediction using Extreme Gradient Boosting (XGBoost) classifier
topic Original Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6938748/
https://www.ncbi.nlm.nih.gov/pubmed/31694073
http://dx.doi.org/10.1111/1759-7714.13204