
Evaluating DNA Methylation, Gene Expression, Somatic Mutation, and Their Combinations in Inferring Tumor Tissue-of-Origin

Carcinoma of unknown primary (CUP) is a type of metastatic cancer whose primary tumor site cannot be identified. CUP accounts for approximately 5% of cancer cases in the United States and usually has an unfavorable prognosis, making it a serious threat to public health. Traditional methods to identif...


Bibliographic Details
Main authors: Liu, Haiyan, Qiu, Chun, Wang, Bo, Bing, Pingping, Tian, Geng, Zhang, Xueliang, Ma, Jun, He, Bingsheng, Yang, Jialiang
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Cell and Developmental Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8126648/
https://www.ncbi.nlm.nih.gov/pubmed/34012960
http://dx.doi.org/10.3389/fcell.2021.619330
author Liu, Haiyan
Qiu, Chun
Wang, Bo
Bing, Pingping
Tian, Geng
Zhang, Xueliang
Ma, Jun
He, Bingsheng
Yang, Jialiang
collection PubMed
description Carcinoma of unknown primary (CUP) is a type of metastatic cancer whose primary tumor site cannot be identified. CUP accounts for approximately 5% of cancer cases in the United States and usually has an unfavorable prognosis, making it a serious threat to public health. Traditional methods for identifying the tissue-of-origin (TOO) of CUP, such as immunohistochemistry, can resolve only around 20% of CUP patients. In recent years, a growing number of studies have suggested that the problem can be addressed by integrating machine learning techniques with large biomedical datasets covering multiple types of biomarkers, including epigenetic (such as DNA methylation), genetic, and gene expression profiles. Different biomarkers play different roles in cancer research; for example, genomic mutations in a patient's tumor can point to specific anticancer drugs for treatment, while DNA methylation and copy number variation can reveal tumor tissue of origin and molecular classification. However, there has been no systematic comparison of which biomarker is better at identifying the cancer type and site of origin. In addition, it might be possible to further improve inference accuracy by integrating multiple types of biomarkers. In this study, we used primary tumor data rather than metastatic tumor data. Although the use of primary tumors may introduce some bias into our classification model, their tissues of origin are known. In addition, previous studies have suggested that CUP prediction models built from primary tumors can efficiently predict the TOO of metastatic cancers (Lal et al., 2013; Brachtel et al., 2016). We systematically compared the performance of three types of biomarkers (DNA methylation, gene expression profile, and somatic mutation) as well as their combinations in inferring the TOO of CUP patients.
First, we downloaded the gene expression profile, somatic mutation, and DNA methylation data of 7,224 tumor samples across 21 common cancer types from The Cancer Genome Atlas (TCGA) and generated seven different feature matrices from their various combinations. Second, we performed feature selection using the Pearson correlation method. The selected features for each matrix were used to build an XGBoost multi-label classification model to infer cancer TOO; XGBoost has proven effective in a few previous studies. The performance of each biomarker and combination was compared using 10-fold cross-validation. Our results showed that TOO tracing accuracy was highest with the gene expression profile, followed by DNA methylation, while somatic mutation performed the worst. We also found that simply combining multiple biomarkers does little to improve prediction accuracy.
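The workflow described in the abstract (seven feature matrices, Pearson-based feature selection, 10-fold cross-validation) can be illustrated with a minimal standard-library sketch. It is not the authors' code. All function names are hypothetical. The seven matrices are assumed to be the non-empty subsets of the three biomarker types, the labels are assumed to be encoded as a 0/1 indicator for one cancer type (the record does not say how the Pearson method was applied to the 21 classes), and the folds are plain contiguous splits. The XGBoost training step is deliberately left out.

```python
from itertools import combinations
from math import sqrt

BIOMARKERS = ("methylation", "expression", "mutation")


def biomarker_combinations(names=BIOMARKERS):
    """Return every non-empty subset of the biomarker types (2^3 - 1 = 7)."""
    return [combo
            for r in range(1, len(names) + 1)
            for combo in combinations(names, r)]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no signal
    return cov / sqrt(vx * vy)


def rank_features(columns, labels, top_k):
    """Keep the top_k feature names ranked by |Pearson r| against the labels.

    `columns` maps a feature name to its per-sample values; `labels` is an
    assumed 0/1 indicator for a single cancer type.
    """
    scores = {name: abs(pearson(col, labels)) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


def kfold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous, near-equal folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        stop = start + size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start or i >= stop]
        yield train, test
        start = stop


if __name__ == "__main__":
    for combo in biomarker_combinations():
        print("+".join(combo))
```

In a real comparison, each of the seven combinations would get its own feature matrix, `rank_features` would be run on the training portion of every fold, and a classifier would be fitted and scored on the held-out portion. Shuffled or stratified folds would likely be preferable to contiguous ones when samples are grouped by cancer type.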
format Online
Article
Text
id pubmed-8126648
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8126648 2021-05-18 Front Cell Dev Biol Cell and Developmental Biology Frontiers Media S.A. 2021-05-03 /pmc/articles/PMC8126648/ /pubmed/34012960 http://dx.doi.org/10.3389/fcell.2021.619330 Text en Copyright © 2021 Liu, Qiu, Wang, Bing, Tian, Zhang, Ma, He and Yang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Evaluating DNA Methylation, Gene Expression, Somatic Mutation, and Their Combinations in Inferring Tumor Tissue-of-Origin
topic Cell and Developmental Biology