SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features Learning from a Language Model

Many inorganic and organic compounds can bind DNA and form complexes, and drug-related molecules are an important class among them. Changes in chromatin accessibility not only directly affect drug–DNA interactions but can also promote or inhibit the expression of critical genes associated...

Bibliographic Details
Main authors:	Zhang, Yikang; Chu, Xiaomin; Jiang, Yelu; Wu, Hongjie; Quan, Lijun
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9028922/ https://www.ncbi.nlm.nih.gov/pubmed/35456374 http://dx.doi.org/10.3390/genes13040568

author	Zhang, Yikang; Chu, Xiaomin; Jiang, Yelu; Wu, Hongjie; Quan, Lijun
collection	PubMed
description	Many inorganic and organic compounds can bind DNA and form complexes, and drug-related molecules are an important class among them. Changes in chromatin accessibility not only directly affect drug–DNA interactions but can also promote or inhibit the expression of critical genes associated with drug resistance by affecting the DNA-binding capacity of transcription factors (TFs) and transcriptional regulators. However, the experimental techniques for measuring chromatin accessibility are expensive and time-consuming. In recent years, several kinds of computational methods have been proposed to identify accessible regions of the genome, but existing models mostly ignore the contextual information provided by the bases in gene sequences. To address these issues, we propose SemanticCAP. It introduces a gene language model that models the context of gene sequences and can therefore provide an effective representation of a given site in a gene sequence. We merged the features provided by the gene language model into our chromatin accessibility model and designed two methods, SFA and SFC, to make the feature fusion smoother. On public benchmarks, our model outperformed DeepSEA, gkm-SVM, and k-mer, with maximum improvements of 1.25% in auROC and 2.41% in auPRC.
format	Online Article Text
id	pubmed-9028922
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9028922 2022-04-23 SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features Learning from a Language Model. Zhang, Yikang; Chu, Xiaomin; Jiang, Yelu; Wu, Hongjie; Quan, Lijun. Genes (Basel). Article. MDPI 2022-03-23 /pmc/articles/PMC9028922/ /pubmed/35456374 http://dx.doi.org/10.3390/genes13040568 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	SemanticCAP: Chromatin Accessibility Prediction Enhanced by Features Learning from a Language Model
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9028922/ https://www.ncbi.nlm.nih.gov/pubmed/35456374 http://dx.doi.org/10.3390/genes13040568
