
Identifying Significant Features in Cancer Methylation Data Using Gene Pathway Segmentation

In order to provide the most effective therapy for cancer, it is important to be able to diagnose whether a patient’s cancer will respond to a proposed treatment. Methylation profiling could contain information from which such predictions could be made. Currently, hypothesis testing is used to deter...

Bibliographic Details
Main authors:	Hira, Zena M., Gillies, Duncan F.
Format:	Online Article Text
Language:	English
Published:	Libertas Academica 2016
Subjects:	Methodology
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5030825/ https://www.ncbi.nlm.nih.gov/pubmed/27688706 http://dx.doi.org/10.4137/CIN.S39859

_version_	1782454744186355712
author	Hira, Zena M. Gillies, Duncan F.
author_facet	Hira, Zena M. Gillies, Duncan F.
author_sort	Hira, Zena M.
collection	PubMed
description	In order to provide the most effective therapy for cancer, it is important to be able to diagnose whether a patient’s cancer will respond to a proposed treatment. Methylation profiling could contain information from which such predictions could be made. Currently, hypothesis testing is used to determine whether possible biomarkers for cancer progression produce statistically significant results. However, this approach requires the identification of individual genes, or sets of genes, as candidate hypotheses, and with the increasing size of modern microarrays, this task is becoming progressively harder. Exhaustive testing of small sets of genes is computationally infeasible, and so hypothesis generation depends either on the use of established biological knowledge or on heuristic methods. As an alternative, machine learning methods can be used to identify groups of genes that are acting together within sets of cancer data and associate their behaviors with cancer progression. These methods have the advantage of being multivariate and unbiased but unfortunately also rapidly become computationally infeasible as the number of gene probes and datasets increases. To address this problem, we have investigated a way of utilizing prior knowledge to segment microarray datasets in such a way that machine learning can be used to identify candidate sets of genes for hypothesis testing. A methylation dataset is divided into subsets, where each subset contains only the probes that relate to a known gene pathway. Each of these pathway subsets is used independently for classification. The classification method is AdaBoost with decision trees as weak classifiers. Since each pathway subset contains a relatively small number of gene probes, it is possible to train and test its classification accuracy quickly and determine whether it has valuable diagnostic information. Finally, genes from successful pathway subsets can be combined to create a classifier of high accuracy.
format	Online Article Text
id	pubmed-5030825
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	Libertas Academica
record_format	MEDLINE/PubMed
spelling	pubmed-5030825 2016-09-29 Identifying Significant Features in Cancer Methylation Data Using Gene Pathway Segmentation Hira, Zena M. Gillies, Duncan F. Cancer Inform Methodology In order to provide the most effective therapy for cancer, it is important to be able to diagnose whether a patient’s cancer will respond to a proposed treatment. Methylation profiling could contain information from which such predictions could be made. Currently, hypothesis testing is used to determine whether possible biomarkers for cancer progression produce statistically significant results. However, this approach requires the identification of individual genes, or sets of genes, as candidate hypotheses, and with the increasing size of modern microarrays, this task is becoming progressively harder. Exhaustive testing of small sets of genes is computationally infeasible, and so hypothesis generation depends either on the use of established biological knowledge or on heuristic methods. As an alternative, machine learning methods can be used to identify groups of genes that are acting together within sets of cancer data and associate their behaviors with cancer progression. These methods have the advantage of being multivariate and unbiased but unfortunately also rapidly become computationally infeasible as the number of gene probes and datasets increases. To address this problem, we have investigated a way of utilizing prior knowledge to segment microarray datasets in such a way that machine learning can be used to identify candidate sets of genes for hypothesis testing. A methylation dataset is divided into subsets, where each subset contains only the probes that relate to a known gene pathway. Each of these pathway subsets is used independently for classification. The classification method is AdaBoost with decision trees as weak classifiers. Since each pathway subset contains a relatively small number of gene probes, it is possible to train and test its classification accuracy quickly and determine whether it has valuable diagnostic information. Finally, genes from successful pathway subsets can be combined to create a classifier of high accuracy. Libertas Academica 2016-09-20 /pmc/articles/PMC5030825/ /pubmed/27688706 http://dx.doi.org/10.4137/CIN.S39859 Text en © 2016 the author(s), publisher and licensee Libertas Academica Ltd. This is an open-access article distributed under the terms of the Creative Commons CC-BY-NC 3.0 License.
spellingShingle	Methodology Hira, Zena M. Gillies, Duncan F. Identifying Significant Features in Cancer Methylation Data Using Gene Pathway Segmentation
title	Identifying Significant Features in Cancer Methylation Data Using Gene Pathway Segmentation
title_full	Identifying Significant Features in Cancer Methylation Data Using Gene Pathway Segmentation
title_fullStr	Identifying Significant Features in Cancer Methylation Data Using Gene Pathway Segmentation
title_full_unstemmed	Identifying Significant Features in Cancer Methylation Data Using Gene Pathway Segmentation
title_short	Identifying Significant Features in Cancer Methylation Data Using Gene Pathway Segmentation
title_sort	identifying significant features in cancer methylation data using gene pathway segmentation
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5030825/ https://www.ncbi.nlm.nih.gov/pubmed/27688706 http://dx.doi.org/10.4137/CIN.S39859
work_keys_str_mv	AT hirazenam identifyingsignificantfeaturesincancermethylationdatausinggenepathwaysegmentation AT gilliesduncanf identifyingsignificantfeaturesincancermethylationdatausinggenepathwaysegmentation
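
The procedure summarized in the description field (split the probes by pathway, classify each pathway subset independently with AdaBoost, then pool the probes of the successful pathways) can be illustrated with a small sketch. This is not the authors' implementation: the data are synthetic, the pathway names and probe indices are hypothetical, the weak classifier is simplified to a depth-1 decision tree (a stump), and the 0.8 accuracy cut-off for a "successful" pathway is an assumption made only for this example.

```python
import math
import random

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) of the best depth-1 split."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        # Candidate thresholds are midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            for sign in (1, -1):
                err = sum(wi for wi, row, yi in zip(w, X, y)
                          if (sign if row[j] > thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, row):
    _, j, thr, sign = stump
    return sign if row[j] > thr else -sign

def adaboost_fit(X, y, rounds=10):
    """Discrete AdaBoost with labels in {-1, +1}; returns [(alpha, stump), ...]."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = min(max(stump[0], 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        # Up-weight misclassified samples, then renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, row))
             for wi, yi, row in zip(w, y, X)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def adaboost_predict(model, row):
    score = sum(alpha * stump_predict(s, row) for alpha, s in model)
    return 1 if score >= 0 else -1

def subset(X, cols):
    """Keep only the probe columns that belong to one pathway."""
    return [[row[c] for c in cols] for row in X]

def pathway_accuracy(X_tr, y_tr, X_te, y_te, cols):
    model = adaboost_fit(subset(X_tr, cols), y_tr)
    preds = [adaboost_predict(model, r) for r in subset(X_te, cols)]
    return sum(p == t for p, t in zip(preds, y_te)) / len(y_te)

# Synthetic "methylation" matrix: probes 0-1 carry signal, probes 2-5 are noise.
rng = random.Random(0)
labels = [1 if i % 2 == 0 else -1 for i in range(60)]
data = []
for yi in labels:
    center = 0.7 if yi == 1 else 0.3
    informative = [center + rng.uniform(-0.05, 0.05) for _ in range(2)]
    data.append(informative + [rng.random() for _ in range(4)])

# Hypothetical pathway-to-probe mapping (the prior knowledge used for segmentation).
pathways = {"pathway_A": [0, 1], "pathway_B": [2, 3], "pathway_C": [4, 5]}
X_tr, y_tr, X_te, y_te = data[:40], labels[:40], data[40:], labels[40:]

# Score every pathway subset independently.
acc = {name: pathway_accuracy(X_tr, y_tr, X_te, y_te, cols)
       for name, cols in pathways.items()}

# Pool the probes of pathways that pass the (assumed) accuracy cut-off.
selected = [c for name, cols in pathways.items() if acc[name] >= 0.8 for c in cols]
combined_acc = pathway_accuracy(X_tr, y_tr, X_te, y_te, selected)
print(acc, combined_acc)
```

Because each pathway subset holds only a few probes, the exhaustive stump search stays cheap, which is the property the description relies on for fast per-pathway screening.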
