Multiple genome pattern analysis and signature gene identification for the Caucasian lung adenocarcinoma patients with different tobacco exposure patterns

BACKGROUND: When considering therapies for lung adenocarcinoma (LUAD) patients, the carcinogenic mechanisms in smokers are believed to differ from those in patients who have never smoked. The rising trend in the proportion of nonsmokers in LUAD urgently requires the understanding of such differences at a molecu...

Bibliographic Details
Main Authors: Dong, Yan-mei, Qin, Li-da, Tong, Yi-fan, He, Qi-en, Wang, Ling, Song, Kai
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2020
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6995662/
https://www.ncbi.nlm.nih.gov/pubmed/32030321
http://dx.doi.org/10.7717/peerj.8349
author Dong, Yan-mei
Qin, Li-da
Tong, Yi-fan
He, Qi-en
Wang, Ling
Song, Kai
collection PubMed
description BACKGROUND: When considering therapies for lung adenocarcinoma (LUAD) patients, the carcinogenic mechanisms in smokers are believed to differ from those in patients who have never smoked. The rising proportion of nonsmokers among LUAD patients makes it urgent to understand these differences at the molecular level for the development of precision medicine.
METHODS: Three independent LUAD tumor sample sets (TCGA, SPORE and EDRN) were used. Genome patterns of gene expression (GE), copy number variation (CNV) and methylation (ME) were reviewed to discover how they differ between smokers and nonsmokers. Tobacco-related signature genes distinguishing these two groups of LUAD patients were identified using the GE, ME and CNV values of the whole genome. To do this, a novel iterative multi-step selection method based on the partial least squares (PLS) algorithm was proposed to overcome the high dimensionality and high noise inherent in the data. This method evaluates the importance of genes according to their statistical differences, their biological functions and their contributions to the tobacco exposure classification model. The kernel partial least squares (KPLS) method was used to further optimize the accuracies of the classification models.
RESULTS: Forty-three, forty-eight and seventy-five genes were identified as GE, ME and CNV signatures, respectively, to distinguish smokers from nonsmokers. Using only the gene expression values of the 43 GE signature genes, the ME values of the 48 ME signature genes or the copy numbers of the 75 CNV signature genes, the classification accuracies on the TCGA training dataset and on the SPORE/EDRN independent validation datasets all exceed 76%. More importantly, the focal amplicon in Telomerase Reverse Transcriptase in nonsmokers, the broad deletion in ChrY in male nonsmokers and the greater amplification of MDM2 in female nonsmokers may explain why nonsmokers of both genders tend to develop LUAD. These pattern analysis results may have a clear biological interpretation in terms of the molecular mechanisms of tumorigenesis. Meanwhile, the identified signature genes may serve as potential drug targets for the precision medicine of LUAD.
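The abstract describes the selection method only at a high level, so the following is a minimal sketch of one way an iterative PLS-based gene filter could look, not the authors' implementation. It makes several assumptions that do not come from the record: genes are ranked by VIP (variable importance in projection) scores from a plain single-response PLS fitted with NIPALS, half of the remaining genes are dropped in each round, and the data are synthetic. The function names `pls1_vip` and `iterative_select` and the `gene_*` labels are hypothetical. The statistical and biological filtering steps and the KPLS classifier mentioned in the abstract are not covered.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def centered(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def pls1_vip(gene_columns, labels, n_components=2):
    """Fit single-response PLS (NIPALS) and return one VIP score per gene.

    gene_columns: list of genes, each a list of per-sample values.
    labels: per-sample class labels, e.g. +1 (smoker) and -1 (nonsmoker).
    """
    p, n = len(gene_columns), len(labels)
    X = [centered(col) for col in gene_columns]  # copies, inputs stay untouched
    y = centered(labels)
    weights, explained = [], []
    for _ in range(n_components):
        # Weight vector: covariance of each gene with the response, normalised.
        w = [dot(col, y) for col in X]
        norm = math.sqrt(dot(w, w))
        if norm == 0.0:
            break
        w = [wj / norm for wj in w]
        # Sample scores on this component.
        t = [sum(w[j] * X[j][i] for j in range(p)) for i in range(n)]
        tt = dot(t, t)
        if tt == 0.0:
            break
        q = dot(t, y) / tt
        # Deflate X and y so the next component explains what is left.
        for j in range(p):
            load = dot(X[j], t) / tt
            X[j] = [x - load * ti for x, ti in zip(X[j], t)]
        y = [yi - q * ti for yi, ti in zip(y, t)]
        weights.append(w)
        explained.append(q * q * tt)  # response variance explained by component
    total = sum(explained)
    if total == 0.0:
        return [0.0] * p
    return [
        math.sqrt(p * sum(e * w[j] ** 2 for e, w in zip(explained, weights)) / total)
        for j in range(p)
    ]


def iterative_select(gene_columns, labels, names, keep=3, drop_fraction=0.5):
    """Refit PLS repeatedly, discarding the lowest-VIP genes each round."""
    idx = list(range(len(gene_columns)))
    while len(idx) > keep:
        vip = pls1_vip([gene_columns[j] for j in idx], labels)
        order = sorted(range(len(idx)), key=lambda k: vip[k], reverse=True)
        n_next = max(keep, int(len(idx) * (1 - drop_fraction)))
        idx = [idx[k] for k in order[:n_next]]
    return [names[j] for j in idx]


# Synthetic data (assumption): 60 samples, 30 genes, only the first 3 informative.
random.seed(0)
labels = [1.0] * 30 + [-1.0] * 30
names = [f"gene_{j}" for j in range(30)]
columns = [
    [(2.0 * lab if j < 3 else 0.0) + random.gauss(0.0, 1.0) for lab in labels]
    for j in range(30)
]
vip_all = pls1_vip(columns, labels)
selected = iterative_select(columns, labels, names, keep=3)
print(sorted(selected))
```

Refitting after each round matters because VIP scores are relative to the genes currently in the model, so removing noisy genes can change the ranking of the remaining ones.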
format Online
Article
Text
id pubmed-6995662
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-6995662 2020-02-06 PeerJ Bioinformatics PeerJ Inc. 2020-01-30 /pmc/articles/PMC6995662/ /pubmed/32030321 http://dx.doi.org/10.7717/peerj.8349 Text en © 2020 Dong et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Multiple genome pattern analysis and signature gene identification for the Caucasian lung adenocarcinoma patients with different tobacco exposure patterns
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6995662/
https://www.ncbi.nlm.nih.gov/pubmed/32030321
http://dx.doi.org/10.7717/peerj.8349