
DNA methylation loci identification for pan-cancer early-stage diagnosis and prognosis using a new distributed parallel partial least squares method

Aberrant methylation is an early detectable event in many tumors, which makes it very promising for pan-cancer early-stage diagnosis and prognosis. To efficiently analyze large pan-cancer methylation datasets and to overcome the co-methylation phenomenon, a MapReduce-based distributed and parallel-...


Bibliographic Details
Main authors: He, Qi-en, Zhu, Jun-xuan, Wang, Li-yan, Ding, En-ci, Song, Kai
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9626520/
https://www.ncbi.nlm.nih.gov/pubmed/36338981
http://dx.doi.org/10.3389/fgene.2022.940214
author He, Qi-en
Zhu, Jun-xuan
Wang, Li-yan
Ding, En-ci
Song, Kai
collection PubMed
description Aberrant methylation is an early detectable event in many tumors, which makes it very promising for pan-cancer early-stage diagnosis and prognosis. To analyze large pan-cancer methylation datasets efficiently and to overcome the co-methylation phenomenon, a MapReduce-based distributed and parallel partial least squares approach was proposed. The large-scale, high-dimensional methylation data were first decomposed into distributed blocks according to genomic location. A distributed, parallel data-processing strategy was built on the MapReduce framework, and latent variables were then extracted for each distributed block. A set of pan-cancer signatures was further identified from the gene expression profiles using a differential co-expression network followed by statistical tests. In total, 15 TCGA datasets were used for training and 3 GEO datasets for testing to verify the method. As a result, 22,000 methylation loci were selected as potentially highly related to early-stage pan-cancer diagnosis. Of these, 67 methylation loci were further identified as pan-cancer signatures when their gene expression was also considered. Survival analysis and pathway enrichment analysis of these loci suggest not only that they may serve as potential drug targets but also that the proposed method may serve as a uniform framework for signature identification with big data.
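The block-wise strategy described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the fixed-width `block_size` binning, the toy beta values, and the use of only the first PLS1 component are all assumptions made for illustration. It runs sequentially and uses only the standard library. Because each block is processed independently, the per-block step is what a MapReduce framework would distribute across workers.

```python
from collections import defaultdict
from math import sqrt


def map_loci_to_blocks(loci, block_size):
    """Map step: key each locus index by (chromosome, position // block_size)."""
    blocks = defaultdict(list)
    for index, (chrom, position) in enumerate(loci):
        blocks[(chrom, position // block_size)].append(index)
    return dict(blocks)


def first_pls_component(X, y):
    """First PLS1 latent variable: weights w proportional to X^T y, scores t = X w."""
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Center the predictors and the response before computing covariances.
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    yc = [value - y_mean for value in y]
    w = [sum(Xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    norm = sqrt(sum(v * v for v in w))
    if norm == 0.0:
        # No covariance with the response: return zero weights and zero scores.
        return w, [0.0] * n
    w = [v / norm for v in w]
    t = [sum(Xc[i][j] * w[j] for j in range(p)) for i in range(n)]
    return w, t


def blockwise_latent_variables(X, y, loci, block_size):
    """Per-block step: extract one latent variable for every genomic block."""
    latent = {}
    for key, columns in map_loci_to_blocks(loci, block_size).items():
        X_block = [[row[j] for j in columns] for row in X]
        latent[key] = first_pls_component(X_block, y)
    return latent


# Hypothetical toy data: 4 samples (2 normal, 2 tumor) x 4 loci of beta values.
loci = [("chr1", 100), ("chr1", 150), ("chr1", 5200), ("chr2", 300)]
X = [
    [0.10, 0.20, 0.50, 0.90],
    [0.20, 0.10, 0.40, 0.80],
    [0.80, 0.90, 0.50, 0.20],
    [0.90, 0.80, 0.60, 0.10],
]
y = [0, 0, 1, 1]
latent = blockwise_latent_variables(X, y, loci, block_size=1000)
for key in sorted(latent):
    weights, scores = latent[key]
    print(key, [round(v, 3) for v in weights], [round(v, 3) for v in scores])
```

Grouping loci by position keeps neighbouring, co-methylated loci in the same block, so each block is summarized by a single latent score per sample rather than by many correlated columns.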
format Online
Article
Text
id pubmed-9626520
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9626520 2022-11-03 Front Genet Genetics Frontiers Media S.A. 2022-10-19 /pmc/articles/PMC9626520/ /pubmed/36338981 http://dx.doi.org/10.3389/fgene.2022.940214 Text en Copyright © 2022 He, Zhu, Wang, Ding and Song.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title DNA methylation loci identification for pan-cancer early-stage diagnosis and prognosis using a new distributed parallel partial least squares method
topic Genetics