Clinical feature-related single-base substitution sequence signatures identified with an unsupervised machine learning approach
BACKGROUND: Mutation processes leave different signatures in genes. For single-base substitutions, previous studies have suggested that mutation signatures are not only reflected in mutation bases but also in neighboring bases. However, because of the lack of a method to identify features of long se...
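Main authors: | Ji, Hongchen; Li, Junjie; Zhang, Qiong; Yang, Jingyue; Duan, Juanli; Wang, Xiaowen; Ma, Ben; Zhang, Zhuochao; Pan, Wei; Zhang, Hongmei |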
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8686331/ https://www.ncbi.nlm.nih.gov/pubmed/34930241 http://dx.doi.org/10.1186/s12920-021-01144-1 |
_version_ | 1784617996182880256 |
---|---|
author | Ji, Hongchen; Li, Junjie; Zhang, Qiong; Yang, Jingyue; Duan, Juanli; Wang, Xiaowen; Ma, Ben; Zhang, Zhuochao; Pan, Wei; Zhang, Hongmei |
author_sort | Ji, Hongchen |
collection | PubMed |
description | BACKGROUND: Mutation processes leave different signatures in genes. For single-base substitutions (SBSs), previous studies have suggested that mutation signatures are reflected not only in the mutated bases but also in neighboring bases. However, because no method has been available to identify features of long sequences next to mutated bases, the understanding of how flanking sequences influence mutation signatures is limited. METHODS: We constructed a long short-term memory and self-organizing map (LSTM-SOM) unsupervised neural network. By extracting mutated sequence features via the LSTM and clustering similar features with the SOM, single-base substitutions in The Cancer Genome Atlas database were clustered according to both their mutation site and flanking sequences. The relationship between mutation sequence signatures and clinical features was then analyzed. Finally, we clustered patients into different classes according to the composition of their mutation sequence signatures using the K-means method and then studied the differences in clinical features and survival between classes. RESULTS: Ten classes of mutant sequence signatures (mutation blots, MBs) were obtained from 2,141,527 single-base substitutions via the LSTM-SOM machine learning approach. Different features in mutated bases and flanking sequences were revealed among MBs. MBs reflect both the site and pathological features of cancers. MBs were related to clinical features, including age, sex, and cancer stage. The class of an MB in a given gene was associated with survival. Finally, patients were clustered into 7 classes according to their MB composition. Significant differences in survival and clinical features were observed among the patient classes. CONCLUSIONS: We provided a method for analyzing the characteristics of mutant sequences. The results of this study showed that flanking sequences, together with mutated bases, shape the signatures of SBSs. MBs were shown to be related to the clinical features and survival of cancer patients. The composition of MBs is a feasible predictive factor of clinical prognosis. Further study of the mechanisms that link MBs to cancer characteristics is suggested. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12920-021-01144-1. |
format | Online Article Text |
id | pubmed-8686331 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8686331 2021-12-20 Clinical feature-related single-base substitution sequence signatures identified with an unsupervised machine learning approach. BMC Med Genomics. Research Article. BioMed Central 2021-12-20 /pmc/articles/PMC8686331/ /pubmed/34930241 http://dx.doi.org/10.1186/s12920-021-01144-1 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Clinical feature-related single-base substitution sequence signatures identified with an unsupervised machine learning approach |
topic | Research Article |
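
The abstract's last METHODS step (grouping patients by the composition of their mutation blots with K-means) can be illustrated roughly as follows. This is only a minimal sketch and not the authors' code: the toy patients, the function names, the deterministic seeding with the first k points, and k=2 are all assumptions made for illustration (the abstract reports ten MB classes and seven patient classes, and does not describe the K-means settings).

```python
from collections import Counter

N_CLASSES = 10  # the abstract reports ten MB classes


def composition(mb_labels, n_classes=N_CLASSES):
    """Fraction of one patient's substitutions that fall in each MB class."""
    counts = Counter(mb_labels)
    total = len(mb_labels)
    return [counts.get(c, 0) / total for c in range(n_classes)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm.

    Assumption: centroids are seeded with the first k points so the
    example is deterministic; real analyses usually use random or
    k-means++ seeding with several restarts.
    """
    centroids = [list(p) for p in points[:k]]
    assignment = None
    for _ in range(n_iter):
        # Assign each point to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centroids


# Hypothetical toy data: patient id -> MB class index of each substitution.
patients = {
    "P1": [0, 0, 0, 1],
    "P2": [7, 7, 8, 7],
    "P3": [0, 0, 1, 1],
    "P4": [8, 8, 7, 7],
}
vectors = [composition(labels) for labels in patients.values()]
assignment, centroids = kmeans(vectors, k=2)
patient_class = dict(zip(patients, assignment))
```

In this toy input, P1 and P3 are dominated by MB classes 0 and 1, while P2 and P4 are dominated by classes 7 and 8, so the two pairs end up in separate patient classes. Normalizing counts to fractions means patients are compared by the mix of MB classes rather than by their total mutation count.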