DNA methylation loci identification for pan-cancer early-stage diagnosis and prognosis using a new distributed parallel partial least squares method
Main authors: | He, Qi-en; Zhu, Jun-xuan; Wang, Li-yan; Ding, En-ci; Song, Kai |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2022 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9626520/ https://www.ncbi.nlm.nih.gov/pubmed/36338981 http://dx.doi.org/10.3389/fgene.2022.940214 |
author | He, Qi-en; Zhu, Jun-xuan; Wang, Li-yan; Ding, En-ci; Song, Kai |
collection | PubMed |
description | Aberrant methylation is among the events that can be detected early in many tumors, which makes it very promising for pan-cancer early-stage diagnosis and prognosis. To analyze large pan-cancer methylation datasets efficiently and to overcome the co-methylation phenomenon, a MapReduce-based, distributed and parallel partial least squares (PLS) approach was proposed. The large-scale, high-dimensional methylation data were first decomposed into distributed blocks according to their genomic locations. A distributed, parallel data-processing strategy based on the MapReduce framework was then used to extract latent variables from each block. A set of pan-cancer signatures was further identified from the corresponding gene expression profiles using a differential co-expression network followed by statistical tests. In total, 15 TCGA datasets and 3 GEO datasets were used as training and testing data, respectively, to validate the method. As a result, 22,000 candidate methylation loci were selected as highly associated with early-stage pan-cancer diagnosis. Of these, 67 loci were further identified as pan-cancer signatures when their gene expression was also taken into account. Survival and pathway enrichment analyses of these loci indicate not only that they may serve as potential drug targets, but also that the proposed method may serve as a uniform framework for signature identification in big data. |
id | pubmed-9626520 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
rights | Front Genet (Genetics section), Frontiers Media S.A., published 2022-10-19. Copyright © 2022 He, Zhu, Wang, Ding and Song. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
topic | Genetics |
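The block-wise strategy summarized in the abstract (split loci into blocks by genomic location, extract a latent variable from each block in parallel, then combine the results) can be illustrated with a minimal single-machine sketch. It is not the authors' MapReduce implementation. It assumes one PLS1 component per block (the first NIPALS step, weights proportional to the block's covariance with the label), a fixed genomic window (`window`) that defines blocks, a thread pool standing in for distributed map tasks, and a hypothetical selection rule that ranks loci by absolute PLS weight. All function names (`partition_by_location`, `block_latent_variable`, `distributed_pls`, `rank_loci`) are illustrative only.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def partition_by_location(loci, window):
    """Hypothetical map-side keying: group locus indices into blocks
    keyed by (chromosome, position // window)."""
    blocks = defaultdict(list)
    for idx, (chrom, pos) in enumerate(loci):
        blocks[(chrom, pos // window)].append(idx)
    return dict(blocks)


def center(values):
    """Subtract the mean from a list of numbers."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def block_latent_variable(X, y, cols):
    """First PLS1 component of one block: weights w proportional to Xb^T y,
    scores t = Xb w, where Xb (the block's columns) and y are mean-centered."""
    n = len(X)
    yc = center(y)
    # Column-major copy of the centered block
    block = [center([X[i][j] for i in range(n)]) for j in cols]
    w = [sum(col[i] * yc[i] for i in range(n)) for col in block]
    norm = math.sqrt(sum(wj * wj for wj in w))
    if norm == 0.0:
        # Block has no covariance with y: return a zero latent variable
        return [0.0] * n, w
    w = [wj / norm for wj in w]
    t = [sum(block[k][i] * w[k] for k in range(len(cols))) for i in range(n)]
    return t, w


def distributed_pls(X, y, loci, window, workers=4):
    """Map: one PLS1 component per genomic block, computed in parallel.
    Reduce: collect (locus indices, scores, weights) keyed by block."""
    blocks = partition_by_location(loci, window)
    keys = sorted(blocks)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda k: block_latent_variable(X, y, blocks[k]), keys))
    return {k: (blocks[k], t, w) for k, (t, w) in zip(keys, results)}


def rank_loci(block_results):
    """Hypothetical selection rule: rank loci by |PLS weight| within their block."""
    scored = []
    for cols, _t, w in block_results.values():
        scored.extend(zip(cols, (abs(wj) for wj in w)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy beta values: 4 samples x 3 loci; y = 1 marks tumor samples
    X = [[0.1, 0.2, 0.5], [0.2, 0.1, 0.5], [0.8, 0.9, 0.5], [0.9, 0.8, 0.5]]
    y = [0, 0, 1, 1]
    loci = [("chr1", 100), ("chr1", 150), ("chr2", 100)]
    print(rank_loci(distributed_pls(X, y, loci, window=1000)))
```

Keying blocks by genomic position keeps neighboring, likely co-methylated loci together, so their shared signal is summarized by one latent variable rather than treated as many independent predictors. On a real cluster, each block would presumably be handled by a separate map task and the per-block results merged in the reduce step.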