Identifying Transcription Factors That Prefer Binding to Methylated DNA Using Reduced G-Gap Dipeptide Composition

Bibliographic Details
Main authors: Nguyen, Quang H., Tran, Hoang V., Nguyen, Binh P., Do, Trang T. T.
Format: Online Article Text
Language: English
Published: American Chemical Society 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9475634/
https://www.ncbi.nlm.nih.gov/pubmed/36119976
http://dx.doi.org/10.1021/acsomega.2c03696
author Nguyen, Quang H.
Tran, Hoang V.
Nguyen, Binh P.
Do, Trang T. T.
collection PubMed
description Transcription factors (TFs) play an important role in gene expression and in the regulation of 3D genome conformation. TFs can bind to specific DNA fragments called enhancers and promoters. Some TFs bind to promoter DNA fragments near the transcription initiation site and form complexes that allow polymerase enzymes to bind and initiate transcription. Previous studies showed that DNA methylation can inhibit or prevent TFs from binding to DNA fragments. However, recent studies have found that some TFs can bind to methylated DNA fragments. Identifying these TFs is an important stepping stone toward a better understanding of cellular gene expression mechanisms. Because experimental methods are often time-consuming and labor-intensive, developing computational methods is essential. In this study, we propose two machine learning methods for two problems: (1) identifying TFs and (2) identifying TFs that prefer binding to methylated DNA targets (TFPMs). For the TF identification problem, the proposed method uses the position-specific scoring matrix for data representation and a deep convolutional neural network for modeling. This method achieved 90.56% sensitivity, 83.96% specificity, and an area under the receiver operating characteristic curve (AUC) of 0.9596 on an independent test set. For the TFPM identification problem, we propose to use the reduced g-gap dipeptide composition for data representation and the support vector machine algorithm for modeling. This method achieved 82.61% sensitivity, 64.86% specificity, and an AUC of 0.8486 on another independent test set. These results are higher than those reported by other studies on the same problems.
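As a rough illustration of the data representation named in the description, the following Python sketch computes a reduced g-gap dipeptide composition for a protein sequence. It is a minimal sketch under stated assumptions, not the authors' implementation: the six-group reduced alphabet (REDUCED_GROUPS) and the function names are hypothetical, and the record does not state which amino acid clustering or gap values the study used.

```python
from itertools import product

# ASSUMPTION: an illustrative six-group reduced amino acid alphabet.
# The clustering actually used in the study is not given in this record.
REDUCED_GROUPS = {
    "A": "AVLIMC",  # aliphatic / hydrophobic
    "F": "FWYH",    # aromatic
    "S": "STNQ",    # polar, uncharged
    "K": "KR",      # positively charged
    "D": "DE",      # negatively charged
    "G": "GP",      # glycine and proline
}
RESIDUE_TO_GROUP = {
    aa: label for label, members in REDUCED_GROUPS.items() for aa in members
}
GROUP_LABELS = sorted(REDUCED_GROUPS)


def reduce_sequence(seq):
    """Map each residue to its group label; unknown symbols are dropped."""
    return "".join(
        RESIDUE_TO_GROUP[aa] for aa in seq.upper() if aa in RESIDUE_TO_GROUP
    )


def g_gap_dipeptide_composition(seq, g=0):
    """Return frequencies of reduced-alphabet residue pairs separated by g
    positions (g=0 gives adjacent pairs), in a fixed pair order."""
    reduced = reduce_sequence(seq)
    pairs = ["".join(p) for p in product(GROUP_LABELS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = len(reduced) - g - 1  # number of pairs with gap g
    if total <= 0:
        return [0.0] * len(pairs)
    for i in range(total):
        counts[reduced[i] + reduced[i + g + 1]] += 1
    return [counts[p] / total for p in pairs]


if __name__ == "__main__":
    features = g_gap_dipeptide_composition("MKRAVLDEGPSTFWY", g=1)
    print(len(features))  # 36 features for a six-group alphabet
```

With a six-group alphabet, each sequence becomes a 36-dimensional vector instead of the 400 dimensions of the full 20-letter dipeptide composition. Such a vector could then be passed to a support vector machine classifier.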
format Online
Article
Text
id pubmed-9475634
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-9475634 2022-09-16 Identifying Transcription Factors That Prefer Binding to Methylated DNA Using Reduced G-Gap Dipeptide Composition Nguyen, Quang H. Tran, Hoang V. Nguyen, Binh P. Do, Trang T. T. ACS Omega
American Chemical Society 2022-08-30 /pmc/articles/PMC9475634/ /pubmed/36119976 http://dx.doi.org/10.1021/acsomega.2c03696 Text en © 2022 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works.
title Identifying Transcription Factors That Prefer Binding to Methylated DNA Using Reduced G-Gap Dipeptide Composition