Multiple genome pattern analysis and signature gene identification for the Caucasian lung adenocarcinoma patients with different tobacco exposure patterns
BACKGROUND: When considering therapies for lung adenocarcinoma (LUAD) patients, the carcinogenic mechanisms of smokers are believed to differ from those who have never smoked. The rising trend in the proportion of nonsmokers in LUAD urgently requires the understanding of such differences at a molecu...
Main authors: | Dong, Yan-mei; Qin, Li-da; Tong, Yi-fan; He, Qi-en; Wang, Ling; Song, Kai |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2020 |
Subjects: | Bioinformatics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6995662/ https://www.ncbi.nlm.nih.gov/pubmed/32030321 http://dx.doi.org/10.7717/peerj.8349 |
_version_ | 1783493418811392000 |
---|---|
author | Dong, Yan-mei; Qin, Li-da; Tong, Yi-fan; He, Qi-en; Wang, Ling; Song, Kai |
author_sort | Dong, Yan-mei |
collection | PubMed |
description | BACKGROUND: When considering therapies for lung adenocarcinoma (LUAD) patients, the carcinogenic mechanisms of smokers are believed to differ from those who have never smoked. The rising trend in the proportion of nonsmokers in LUAD urgently requires the understanding of such differences at a molecular level for the development of precision medicine. METHODS: Three independent LUAD tumor sample sets (TCGA, SPORE and EDRN) were used. Genome patterns of expression (GE), copy number variation (CNV) and methylation (ME) were reviewed to discover the differences between them for both smokers and nonsmokers. Tobacco-related signature genes distinguishing these two groups of LUAD were identified using the GE, ME and CNV values of the whole genome. To do this, a novel iterative multi-step selection method based on the partial least squares (PLS) algorithm was proposed to overcome the high variable dimension and high noise inherent in the data. This method can thoroughly evaluate the importance of genes according to their statistical differences, biological functions and contributions to the tobacco exposure classification model. The kernel partial least squares (KPLS) method was used to further optimize the accuracies of the classification models. RESULTS: Forty-three, forty-eight and seventy-five genes were identified as GE, ME and CNV signatures, respectively, to distinguish smokers from nonsmokers. Using only the gene expression values of these 43 GE signature genes, ME values of the 48 ME signature genes or copy numbers of the 75 CNV signature genes, the accuracies of TCGA training and SPORE/EDRN independent validation datasets all exceed 76%. More importantly, the focal amplicon in Telomerase Reverse Transcriptase in nonsmokers, the broad deletion in ChrY in male nonsmokers and the greater amplification of MDM2 in female nonsmokers may explain why nonsmokers of both genders tend to suffer from LUAD. These pattern analysis results may have clear biological interpretation in the molecular mechanism of tumorigenesis. Meanwhile, the identified signature genes may serve as potential drug targets for the precision medicine of LUAD. |
format | Online Article Text |
id | pubmed-6995662 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-6995662 2020-02-06 PeerJ Bioinformatics PeerJ Inc. 2020-01-30 /pmc/articles/PMC6995662/ /pubmed/32030321 http://dx.doi.org/10.7717/peerj.8349 Text en © 2020 Dong et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited. |
title | Multiple genome pattern analysis and signature gene identification for the Caucasian lung adenocarcinoma patients with different tobacco exposure patterns |
topic | Bioinformatics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6995662/ https://www.ncbi.nlm.nih.gov/pubmed/32030321 http://dx.doi.org/10.7717/peerj.8349 |
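
The abstract in this record describes an iterative, multi-step gene selection built on partial least squares (PLS) but gives no implementation details. The following is only a rough sketch of the general idea, not the authors' method: it assumes a single-response PLS fitted by NIPALS, ranks variables by the commonly used variable importance in projection (VIP) score, and repeatedly discards the lowest-ranked fraction. The function names `pls1_vip` and `iterative_select`, the `drop_frac` parameter, and the synthetic data are all hypothetical and chosen for illustration.

```python
import numpy as np


def pls1_vip(X, y, n_components=2):
    """Fit a single-response PLS (NIPALS) and return one VIP score per column of X."""
    Xr = X - X.mean(axis=0)          # centered predictors, deflated in the loop
    yr = y - y.mean()                # centered response, deflated in the loop
    n_features = X.shape[1]
    W = np.zeros((n_features, n_components))
    ss = np.zeros(n_components)      # response variation explained by each component
    for a in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)       # unit-length weight vector
        t = Xr @ w                   # component scores
        tt = t @ t
        p = Xr.T @ t / tt            # X loadings
        q = (yr @ t) / tt            # y loading
        Xr = Xr - np.outer(t, p)     # deflate X
        yr = yr - q * t              # deflate y
        W[:, a] = w
        ss[a] = q * q * tt
    # VIP_j = sqrt(p * sum_a(ss_a * w_ja^2) / sum_a(ss_a))
    return np.sqrt(n_features * ((W ** 2) @ ss) / ss.sum())


def iterative_select(X, y, n_keep, drop_frac=0.2, n_components=2):
    """Refit PLS repeatedly, dropping the lowest-VIP columns until n_keep remain."""
    kept = np.arange(X.shape[1])
    while kept.size > n_keep:
        vip = pls1_vip(X[:, kept], y, n_components)
        n_next = max(n_keep, int(kept.size * (1 - drop_frac)))
        order = np.argsort(vip)[::-1]            # highest VIP first
        kept = np.sort(kept[order[:n_next]])
    return kept


# Synthetic stand-in for an expression matrix: 200 samples x 200 "genes",
# where only the first five columns differ between the two label groups.
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200).astype(float)
X = rng.normal(size=(200, 200))
X[:, :5] += 2.0 * labels[:, None]

selected = iterative_select(X, labels, n_keep=10)
print(selected)
```

Refitting after each elimination round lets the ranking adapt as noisy variables are removed, which is one plausible reading of "iterative" in the abstract. The statistical-difference and biological-function criteria the abstract mentions, and the kernel PLS step, are not modeled here.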