Clinical feature-related single-base substitution sequence signatures identified with an unsupervised machine learning approach

BACKGROUND: Mutation processes leave different signatures in genes. For single-base substitutions, previous studies have suggested that mutation signatures are not only reflected in mutation bases but also in neighboring bases. However, because of the lack of a method to identify features of long se...

Bibliographic Details
Main Authors: Ji, Hongchen, Li, Junjie, Zhang, Qiong, Yang, Jingyue, Duan, Juanli, Wang, Xiaowen, Ma, Ben, Zhang, Zhuochao, Pan, Wei, Zhang, Hongmei
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8686331/
https://www.ncbi.nlm.nih.gov/pubmed/34930241
http://dx.doi.org/10.1186/s12920-021-01144-1
_version_ 1784617996182880256
author Ji, Hongchen
Li, Junjie
Zhang, Qiong
Yang, Jingyue
Duan, Juanli
Wang, Xiaowen
Ma, Ben
Zhang, Zhuochao
Pan, Wei
Zhang, Hongmei
author_sort Ji, Hongchen
collection PubMed
description BACKGROUND: Mutation processes leave different signatures in genes. For single-base substitutions (SBSs), previous studies have suggested that mutation signatures are reflected not only in mutation bases but also in neighboring bases. However, because of the lack of a method to identify features of long sequences next to mutation bases, the understanding of how flanking sequences influence mutation signatures is limited.
METHODS: We constructed a long short-term memory-self organizing map (LSTM-SOM) unsupervised neural network. By extracting mutated sequence features via the LSTM and clustering similar features with the SOM, single-base substitutions in The Cancer Genome Atlas database were clustered according to both their mutation site and flanking sequences. The relationship between mutation sequence signatures and clinical features was then analyzed. Finally, we clustered patients into different classes according to the composition of the mutation sequence signatures by the K-means method and then studied the differences in clinical features and survival between classes.
RESULTS: Ten classes of mutant sequence signatures (mutation blots, MBs) were obtained from 2,141,527 single-base substitutions via the LSTM-SOM machine learning approach. Different features in mutation bases and flanking sequences were revealed among MBs. MBs reflect both the site and pathological features of cancers. MBs were related to clinical features, including age, sex, and cancer stage. The class of an MB in a given gene was associated with survival. Finally, patients were clustered into 7 classes according to the MB composition. Significant differences in survival and clinical features were observed among different patient classes.
CONCLUSIONS: We provided a method for analyzing the characteristics of mutant sequences. The results of this study showed that flanking sequences, together with mutation bases, shape the signatures of SBSs. MBs were shown to be related to clinical features and survival of cancer patients. The composition of MBs is a feasible predictive factor of clinical prognosis. Further study of the mechanism by which MBs relate to cancer characteristics is suggested.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12920-021-01144-1.
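The final step described in the abstract, grouping patients by the composition of their MBs with K-means, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that "composition" means the fraction of a patient's substitutions assigned to each of the ten MB classes, that distances are Euclidean, and that each patient has at least one substitution. The patient identifiers, the toy labels, the choice of k=2 (the study reports 7 patient classes), and the seeding of centroids from the first k patients are all hypothetical choices made for illustration.

```python
from collections import Counter

N_MB = 10  # the abstract reports ten MB classes


def composition(mb_labels, n_classes=N_MB):
    """Fraction of one patient's substitutions in each MB class (assumed definition)."""
    counts = Counter(mb_labels)
    total = len(mb_labels)  # assumes at least one substitution per patient
    return [counts.get(k, 0) / total for k in range(n_classes)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's algorithm; the first k points seed the centroids (illustrative)."""
    centroids = [list(p) for p in points[:k]]
    assign = None
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        new_assign = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_assign == assign:
            break  # assignments stopped changing
        assign = new_assign
        # Move each centroid to the mean of its current members.
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


# Hypothetical toy data: MB class index (0-9) of each substitution per patient.
patients = {
    "P1": [0, 0, 0, 1, 0, 0],
    "P2": [0, 0, 1, 0, 0],
    "P3": [7, 7, 8, 7, 7, 7],
    "P4": [7, 8, 7, 7],
}

vectors = [composition(m) for m in patients.values()]
labels, centroids = kmeans(vectors, k=2)
print(dict(zip(patients, labels)))
```

Normalizing counts to fractions keeps patients with many substitutions from dominating the distance computation. A real analysis would more likely use an established K-means implementation with several random restarts rather than this fixed seeding.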
format Online
Article
Text
id pubmed-8686331
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8686331 2021-12-20 BMC Med Genomics Research Article BioMed Central 2021-12-20 /pmc/articles/PMC8686331/ /pubmed/34930241 http://dx.doi.org/10.1186/s12920-021-01144-1 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Clinical feature-related single-base substitution sequence signatures identified with an unsupervised machine learning approach
topic Research Article