
Identification of Methylation Signatures and Rules for Sarcoma Subtypes by Machine Learning Methods

Sarcoma, the second most common type of solid tumor in children and adolescents, has a wide variety of subtypes that are often not properly diagnosed at an early stage, leading to late metastases and causing serious loss of life and property to patients and families. It exhibits a high degree of heteroge...


Bibliographic Details
Main authors: Ren, Jingxin; Zhou, XianChao; Guo, Wei; Feng, KaiYan; Huang, Tao; Cai, Yu-Dong
Format: Online Article Text
Language: English
Published: Hindawi 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9812612/
https://www.ncbi.nlm.nih.gov/pubmed/36619306
http://dx.doi.org/10.1155/2022/5297235
author Ren, Jingxin
Zhou, XianChao
Guo, Wei
Feng, KaiYan
Huang, Tao
Cai, Yu-Dong
collection PubMed
description Sarcoma, the second most common type of solid tumor in children and adolescents, has a wide variety of subtypes that are often not properly diagnosed at an early stage, leading to late metastases and causing serious loss of life and property to patients and families. It exhibits a high degree of heterogeneity at the cellular, molecular, and epigenetic levels, and DNA methylation has been proposed to play a role in the diagnosis of sarcoma subtypes. This study therefore aimed to find potential biomarkers at the DNA methylation level that distinguish different sarcoma subtypes. A machine learning workflow was designed to analyse sarcoma samples, each of which was represented by a large number of methylation sites. Irrelevant sites were removed using the Boruta method, and the remaining sites related to the target variables were kept for further analysis. Afterward, three feature ranking methods (LASSO, LightGBM, and MCFS) were adopted to rank these features, and six classification models were constructed by combining incremental feature selection with two classification algorithms, decision tree (DT) and random forest (RF). The RF models outperformed the DT models under all three ranking conditions. The specific expression of genes annotated from highly correlated methylation site features, such as PRKAR1B, INPP5A, and GLI3, has been reported in publications to be associated with sarcoma. Moreover, the quantitative rules obtained by the decision tree algorithm helped to clarify the essential differences between sarcoma types and to classify sarcoma subtypes, providing a new means of clinical identification and of determining new therapeutic targets.
format Online
Article
Text
id pubmed-9812612
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-9812612 2023-01-05 Biomed Res Int Research Article
Hindawi 2022-12-28 /pmc/articles/PMC9812612/ /pubmed/36619306 http://dx.doi.org/10.1155/2022/5297235 Text en Copyright © 2022 Jingxin Ren et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identification of Methylation Signatures and Rules for Sarcoma Subtypes by Machine Learning Methods
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9812612/
https://www.ncbi.nlm.nih.gov/pubmed/36619306
http://dx.doi.org/10.1155/2022/5297235
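The abstract in this record describes ranking methylation sites and then building classifiers by incremental feature selection (IFS). The following is a minimal sketch of the IFS step only, not the authors' implementation. Everything in it is an assumption made for illustration: the toy data generator, the names `make_toy_samples`, `nearest_centroid_accuracy`, and `incremental_feature_selection`, the fixed ranking `ranked_sites`, and the nearest-centroid classifier, which stands in for the decision tree and random forest used in the study so that the example needs only the Python standard library.

```python
import random
from statistics import mean


def make_toy_samples(n_per_class, seed, n_noise=4):
    """Hypothetical data: 3 subtypes, 2 informative sites, n_noise random sites."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            informative = [
                0.2 + 0.3 * label + rng.gauss(0, 0.03),
                0.8 - 0.3 * label + rng.gauss(0, 0.03),
            ]
            noise = [rng.random() for _ in range(n_noise)]
            X.append(informative + noise)
            y.append(label)
    return X, y


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, features):
    """Accuracy of a nearest-centroid classifier restricted to `features`.

    Stand-in classifier (assumption); the study used decision tree and random forest.
    """
    centroids = {}
    for label in set(y_train):
        rows = [x for x, lab in zip(X_train, y_train) if lab == label]
        centroids[label] = [mean(r[f] for r in rows) for f in features]
    correct = 0
    for x, lab in zip(X_test, y_test):
        point = [x[f] for f in features]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == lab
    return correct / len(y_test)


def incremental_feature_selection(ranked, score_fn, step=1):
    """Score the top-k ranked features for k = step, 2*step, ... and pick the best k."""
    scores = {}
    for k in range(step, len(ranked) + 1, step):
        scores[k] = score_fn(ranked[:k])
    # On ties, prefer the smaller feature subset.
    best_k = max(scores, key=lambda k: (scores[k], -k))
    return best_k, scores


X_train, y_train = make_toy_samples(n_per_class=20, seed=0)
X_test, y_test = make_toy_samples(n_per_class=10, seed=1)

# Hypothetical ranking, as if produced by one of the feature ranking methods.
ranked_sites = [0, 1, 2, 3, 4, 5]

best_k, scores = incremental_feature_selection(
    ranked_sites,
    lambda feats: nearest_centroid_accuracy(X_train, y_train, X_test, y_test, feats),
)
print("best number of sites:", best_k)
print("accuracy by subset size:", scores)
```

In this sketch, each candidate subset is the top k entries of the ranking, and the subset with the highest held-out accuracy is kept. Repeating the loop once per ranking and per classifier would give one model for each ranking and classifier pair, which is how three rankings and two classifiers yield the six models mentioned in the abstract.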