Cargando…

Phenotype prediction based on genome-wide DNA methylation data

BACKGROUND: DNA methylation (DNAm) has important regulatory roles in many biological processes and diseases. It is the only epigenetic mark with a clear mechanism of mitotic inheritance and the only one easily available on a genome scale. Aberrant cytosine-phosphate-guanine (CpG) methylation has bee...

Descripción completa

Detalles Bibliográficos
Autor principal:	Wilhelm, Thomas
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2014
Materias:	Methodology Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4073816/ https://www.ncbi.nlm.nih.gov/pubmed/24934728 http://dx.doi.org/10.1186/1471-2105-15-193

_version_	1782323147207344128
author	Wilhelm, Thomas
author_facet	Wilhelm, Thomas
author_sort	Wilhelm, Thomas
collection	PubMed
description	BACKGROUND: DNA methylation (DNAm) has important regulatory roles in many biological processes and diseases. It is the only epigenetic mark with a clear mechanism of mitotic inheritance and the only one easily available on a genome scale. Aberrant cytosine-phosphate-guanine (CpG) methylation has been discussed in the context of disease aetiology, especially cancer. CpG hypermethylation of promoter regions is often associated with silencing of tumour suppressor genes and hypomethylation with activation of oncogenes. Supervised principal component analysis (SPCA) is a popular machine learning method. However, in a recent application to phenotype prediction from DNAm data SPCA was inferior to the specific method EVORA. RESULTS: We present Model-Selection-SPCA (MS-SPCA), an enhanced version of SPCA. MS-SPCA applies several models that perform well in the training data to the test data and selects the very best models for final prediction based on parameters of the test data. We have applied MS-SPCA for phenotype prediction from genome-wide DNAm data. CpGs used for prediction are selected based on the quantification of three features of their methylation (average methylation difference, methylation variation difference and methylation-age-correlation). We analysed four independent case–control datasets that correspond to different stages of cervical cancer: (i) cases currently cytologically normal, but will later develop neoplastic transformations, (ii, iii) cases showing neoplastic transformations and (iv) cases with confirmed cancer. The first dataset was split into several smaller case–control datasets (samples either Human Papilloma Virus (HPV) positive or negative). We demonstrate that cytology normal HPV+ and HPV- samples contain DNAm patterns which are associated with later neoplastic transformations. We present evidence that DNAm patterns exist in cytology normal HPV- samples that (i) predispose to neoplastic transformations after HPV infection and (ii) predispose to HPV infection itself. MS-SPCA performs significantly better than EVORA. CONCLUSIONS: MS-SPCA can be applied to many classification problems. Additional improvements could include usage of more than one principal component (PC), with automatic selection of the optimal number of PCs. We expect that MS-SPCA will be useful for analysing recent larger DNAm data to predict future neoplastic transformations.
format	Online Article Text
id	pubmed-4073816
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-40738162014-07-01 Phenotype prediction based on genome-wide DNA methylation data Wilhelm, Thomas BMC Bioinformatics Methodology Article BACKGROUND: DNA methylation (DNAm) has important regulatory roles in many biological processes and diseases. It is the only epigenetic mark with a clear mechanism of mitotic inheritance and the only one easily available on a genome scale. Aberrant cytosine-phosphate-guanine (CpG) methylation has been discussed in the context of disease aetiology, especially cancer. CpG hypermethylation of promoter regions is often associated with silencing of tumour suppressor genes and hypomethylation with activation of oncogenes. Supervised principal component analysis (SPCA) is a popular machine learning method. However, in a recent application to phenotype prediction from DNAm data SPCA was inferior to the specific method EVORA. RESULTS: We present Model-Selection-SPCA (MS-SPCA), an enhanced version of SPCA. MS-SPCA applies several models that perform well in the training data to the test data and selects the very best models for final prediction based on parameters of the test data. We have applied MS-SPCA for phenotype prediction from genome-wide DNAm data. CpGs used for prediction are selected based on the quantification of three features of their methylation (average methylation difference, methylation variation difference and methylation-age-correlation). We analysed four independent case–control datasets that correspond to different stages of cervical cancer: (i) cases currently cytologically normal, but will later develop neoplastic transformations, (ii, iii) cases showing neoplastic transformations and (iv) cases with confirmed cancer. The first dataset was split into several smaller case–control datasets (samples either Human Papilloma Virus (HPV) positive or negative). We demonstrate that cytology normal HPV+ and HPV- samples contain DNAm patterns which are associated with later neoplastic transformations. We present evidence that DNAm patterns exist in cytology normal HPV- samples that (i) predispose to neoplastic transformations after HPV infection and (ii) predispose to HPV infection itself. MS-SPCA performs significantly better than EVORA. CONCLUSIONS: MS-SPCA can be applied to many classification problems. Additional improvements could include usage of more than one principal component (PC), with automatic selection of the optimal number of PCs. We expect that MS-SPCA will be useful for analysing recent larger DNAm data to predict future neoplastic transformations. BioMed Central 2014-06-17 /pmc/articles/PMC4073816/ /pubmed/24934728 http://dx.doi.org/10.1186/1471-2105-15-193 Text en Copyright © 2014 Wilhelm; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Methodology Article Wilhelm, Thomas Phenotype prediction based on genome-wide DNA methylation data
title	Phenotype prediction based on genome-wide DNA methylation data
title_full	Phenotype prediction based on genome-wide DNA methylation data
title_fullStr	Phenotype prediction based on genome-wide DNA methylation data
title_full_unstemmed	Phenotype prediction based on genome-wide DNA methylation data
title_short	Phenotype prediction based on genome-wide DNA methylation data
title_sort	phenotype prediction based on genome-wide dna methylation data
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4073816/ https://www.ncbi.nlm.nih.gov/pubmed/24934728 http://dx.doi.org/10.1186/1471-2105-15-193
work_keys_str_mv	AT wilhelmthomas phenotypepredictionbasedongenomewidednamethylationdata

Phenotype prediction based on genome-wide DNA methylation data

Ejemplares similares