A robust tool for discriminative analysis and feature selection in paired samples impacts the identification of the genes essential for reprogramming lung tissue to adenocarcinoma

BACKGROUND: Lung cancer is the leading cause of cancer deaths in the world. The most common type of lung cancer is lung adenocarcinoma (AC). The genetic mechanisms of the early stages and lung AC progression steps are poorly understood. There is currently no clinically applicable gene test for the e...

Bibliographic Details
Main authors: Toh, Swee Heng; Prathipati, Philip; Motakis, Efthimios; Kwoh, Chee Keong; Yenamandra, Surya Pavan; Kuznetsov, Vladimir A
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3377915/
https://www.ncbi.nlm.nih.gov/pubmed/22369099
http://dx.doi.org/10.1186/1471-2164-12-S3-S24
author Toh, Swee Heng
Prathipati, Philip
Motakis, Efthimios
Kwoh, Chee Keong
Yenamandra, Surya Pavan
Kuznetsov, Vladimir A
collection PubMed
description BACKGROUND: Lung cancer is the leading cause of cancer deaths in the world. The most common type of lung cancer is lung adenocarcinoma (AC). The genetic mechanisms underlying the early stages and progression of lung AC are poorly understood. There is currently no clinically applicable gene test for early diagnosis of AC or for assessing its aggressiveness. Among the major reasons for the lack of reliable diagnostic biomarkers are the extraordinary heterogeneity of the cancer cells, complex and poorly understood interactions of the AC cells with adjacent tissue and the immune system, gene variation across patient cohorts, measurement variability, small sample sizes and sub-optimal analytical methods. We suggest that gene expression profiling of primary tumours and adjacent tissues (PT-AT), handled with a rational statistical and bioinformatics strategy of biomarker prediction and validation, could provide significant progress in the identification of clinical biomarkers of AC. To minimise sample-to-sample variability, studies should be designed around repeated multivariate measurements in the same object (organ or tissue, e.g. PT-AT in lung) across patients, but prediction and validation on the genome scale with a small sample size is a major methodological challenge.
RESULTS: To analyse PT-AT relationships efficiently in the statistical modelling, we propose an Extreme Class Discrimination (ECD) feature selection method that identifies a subset of the most discriminative variables (e.g. expressed genes). Our method consists of a paired Cross-normalization (CN) step followed by a modified sign Wilcoxon test with multivariate adjustment, carried out for each variable. Using an Affymetrix U133A microarray paired dataset of 27 AC patients, we examined the global reprogramming of the transcriptome in human lung AC tissue versus normal lung tissue; it is associated with about 2,300 genes that discriminate the tissues with 100% accuracy. Cluster analysis applied to these genes resulted in four distinct gene groups, which we classified as associated with (i) up-regulated genes of the mitotic cell cycle in lung AC, (ii) silenced/suppressed genes specific for normal lung tissue, (iii) cell communication and cell motility and (iv) immune system features. Genes related to mutagenesis, specific lung cancers, the early stage of AC development, tumour aggressiveness, and metabolic pathway alterations and adaptations of cancer cells are strongly enriched in the AC PT-AT discriminative gene set. Two AC diagnostic biomarkers, SPP1 and CENPA, were successfully validated on an RT-PCR tissue array. The ECD method was systematically compared with several alternative methods and performed better; it was also validated by comparing the predicted gene set with a literature meta-signature.
CONCLUSIONS: We developed a method that identifies and selects highly discriminative variables from high-dimensional data spaces of potential biomarkers, based on a statistical analysis of paired samples when the number of samples is small. This method provides superior selection in comparison to conventional methods and can be widely used in different applications. Our method revealed at least 2,300 patho-biologically essential genes associated with the global transcriptional reprogramming of human lung epithelium cells and with lung AC aggressiveness. This gene set includes many previously published AC biomarkers, reflecting inherent disease complexity, and specifies the mechanisms of carcinogenesis in lung AC. SPP1, CENPA and many other PT-AT discriminative genes could be considered as prospective diagnostic and prognostic biomarkers of lung AC.
format Online
Article
Text
id pubmed-3377915
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3377915 2012-06-20 BMC Genomics Proceedings BioMed Central 2011-11-30 /pmc/articles/PMC3377915/ /pubmed/22369099 http://dx.doi.org/10.1186/1471-2164-12-S3-S24 Text en
Copyright ©2011 Toh et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A robust tool for discriminative analysis and feature selection in paired samples impacts the identification of the genes essential for reprogramming lung tissue to adenocarcinoma
topic Proceedings