Improving Model Performance on the Stratification of Breast Cancer Patients by Integrating Multiscale Genomic Features

How to accurately stratify patients based on genomic data is a hot topic in clinical cancer research. With the development of next-generation sequencing technology, a growing number of genomic feature types, such as mRNA expression level, can be used to distinguish cancer patients. Previous stu...

Bibliographic Details
Main Authors: Hao, Yingyi; He, Li; Zhou, Yifan; Zhao, Yiru; Li, Menglong; Jing, Runyu; Wen, Zhining
Format: Online Article Text
Language: English
Published: Hindawi 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7471833/
https://www.ncbi.nlm.nih.gov/pubmed/32908867
http://dx.doi.org/10.1155/2020/1475368
_version_ 1783578852077862912
author Hao, Yingyi
He, Li
Zhou, Yifan
Zhao, Yiru
Li, Menglong
Jing, Runyu
Wen, Zhining
collection PubMed
description How to accurately stratify patients based on genomic data is a hot topic in clinical cancer research. With the development of next-generation sequencing technology, a growing number of genomic feature types, such as mRNA expression level, can be used to distinguish cancer patients. Previous studies commonly stratified patients using a single type of genomic feature, which can reflect only one aspect of the cancer. Multiscale genomic features provide more information and may be helpful for clinical prediction. In addition, most conventional machine learning algorithms construct models from a handcrafted gene set, which is generally selected by a statistical method with an arbitrary cut-off, e.g., p value < 0.05. The genes in such a set are not necessarily related to the cancer and can make the model unreliable. Therefore, in our study, we thoroughly investigated the performance of different machine learning methods in stratifying breast cancer patients with a single type of genomic feature. We then proposed a strategy that takes into account the degree of correlation between genes and cancer patients to identify features from mRNAs and microRNAs, and evaluated the performance of models built on the resulting combined multiscale genomic features. The results showed that, compared with models constructed with a single type of feature, the models using the multiscale genomic features generated by our proposed method achieved better performance in stratifying the ER status of breast cancer patients. Moreover, gene set enrichment analysis showed that the identified multiscale genomic features were closely related to the cancer, indicating that our proposed strategy reflects the biological relevance of the genes to breast cancer. In conclusion, modelling with multiscale genomic features closely related to the cancer can not only guarantee the prediction performance of the models but also effectively provide candidate genes for interpreting the mechanisms of cancer.
format Online
Article
Text
id pubmed-7471833
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-7471833 2020-09-08 Improving Model Performance on the Stratification of Breast Cancer Patients by Integrating Multiscale Genomic Features Hao, Yingyi He, Li Zhou, Yifan Zhao, Yiru Li, Menglong Jing, Runyu Wen, Zhining Biomed Res Int Research Article Hindawi 2020-08-25 /pmc/articles/PMC7471833/ /pubmed/32908867 http://dx.doi.org/10.1155/2020/1475368 Text en Copyright © 2020 Yingyi Hao et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Improving Model Performance on the Stratification of Breast Cancer Patients by Integrating Multiscale Genomic Features
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7471833/
https://www.ncbi.nlm.nih.gov/pubmed/32908867
http://dx.doi.org/10.1155/2020/1475368