Identifying Transcription Factors That Prefer Binding to Methylated DNA Using Reduced G-Gap Dipeptide Composition
Transcription factors (TFs) play an important role in gene expression and regulation of 3D genome conformation. TFs can bind to specific DNA fragments called enhancers and promoters. Some TFs bind to promoter DNA fragments that are near the transcription initiation sit...
Main authors: | Nguyen, Quang H.; Tran, Hoang V.; Nguyen, Binh P.; Do, Trang T. T. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9475634/ https://www.ncbi.nlm.nih.gov/pubmed/36119976 http://dx.doi.org/10.1021/acsomega.2c03696 |
_version_ | 1784789952777682944 |
---|---|
author | Nguyen, Quang H. Tran, Hoang V. Nguyen, Binh P. Do, Trang T. T. |
author_facet | Nguyen, Quang H. Tran, Hoang V. Nguyen, Binh P. Do, Trang T. T. |
author_sort | Nguyen, Quang H. |
collection | PubMed |
description | Transcription factors (TFs) play an important role in gene expression and regulation of 3D genome conformation. TFs can bind to specific DNA fragments called enhancers and promoters. Some TFs bind to promoter DNA fragments that are near the transcription initiation site and form complexes that allow polymerase enzymes to bind and initiate transcription. Previous studies showed that methylated DNA could inhibit or prevent TFs from binding to DNA fragments. However, recent studies have found that some TFs can bind to methylated DNA fragments. Identifying these TFs is an important stepping stone to a better understanding of cellular gene expression mechanisms. Because experimental methods are often time-consuming and labor-intensive, developing computational methods is essential. In this study, we propose two machine learning methods for two problems: (1) identifying TFs and (2) identifying TFs that prefer binding to methylated DNA targets (TFPMs). For the TF identification problem, the proposed method uses the position-specific scoring matrix for data representation and a deep convolutional neural network for modeling. This method achieved 90.56% sensitivity, 83.96% specificity, and an area under the receiver operating characteristic curve (AUC) of 0.9596 on an independent test set. For the TFPM identification problem, we propose using the reduced g-gap dipeptide composition for data representation and the support vector machine algorithm for modeling. This method achieved 82.61% sensitivity, 64.86% specificity, and an AUC of 0.8486 on another independent test set. These results are higher than those reported by other studies on the same problems. |
format | Online Article Text |
id | pubmed-9475634 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-9475634 2022-09-16 Identifying Transcription Factors That Prefer Binding to Methylated DNA Using Reduced G-Gap Dipeptide Composition Nguyen, Quang H. Tran, Hoang V. Nguyen, Binh P. Do, Trang T. T. ACS Omega Transcription factors (TFs) play an important role in gene expression and regulation of 3D genome conformation. TFs can bind to specific DNA fragments called enhancers and promoters. Some TFs bind to promoter DNA fragments that are near the transcription initiation site and form complexes that allow polymerase enzymes to bind and initiate transcription. Previous studies showed that methylated DNA could inhibit or prevent TFs from binding to DNA fragments. However, recent studies have found that some TFs can bind to methylated DNA fragments. Identifying these TFs is an important stepping stone to a better understanding of cellular gene expression mechanisms. Because experimental methods are often time-consuming and labor-intensive, developing computational methods is essential. In this study, we propose two machine learning methods for two problems: (1) identifying TFs and (2) identifying TFs that prefer binding to methylated DNA targets (TFPMs). For the TF identification problem, the proposed method uses the position-specific scoring matrix for data representation and a deep convolutional neural network for modeling. This method achieved 90.56% sensitivity, 83.96% specificity, and an area under the receiver operating characteristic curve (AUC) of 0.9596 on an independent test set. For the TFPM identification problem, we propose using the reduced g-gap dipeptide composition for data representation and the support vector machine algorithm for modeling. This method achieved 82.61% sensitivity, 64.86% specificity, and an AUC of 0.8486 on another independent test set. These results are higher than those reported by other studies on the same problems. American Chemical Society 2022-08-30 /pmc/articles/PMC9475634/ /pubmed/36119976 http://dx.doi.org/10.1021/acsomega.2c03696 Text en © 2022 The Authors. Published by American Chemical Society https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Nguyen, Quang H. Tran, Hoang V. Nguyen, Binh P. Do, Trang T. T. Identifying Transcription Factors That Prefer Binding to Methylated DNA Using Reduced G-Gap Dipeptide Composition |
title | Identifying Transcription Factors That Prefer Binding to Methylated DNA Using Reduced G-Gap Dipeptide Composition |
title_full | Identifying Transcription Factors That Prefer Binding to Methylated DNA Using Reduced G-Gap Dipeptide Composition |
title_fullStr | Identifying Transcription Factors That Prefer Binding to Methylated DNA Using Reduced G-Gap Dipeptide Composition |
title_full_unstemmed | Identifying Transcription Factors That Prefer Binding to Methylated DNA Using Reduced G-Gap Dipeptide Composition |
title_short | Identifying Transcription Factors That Prefer Binding to Methylated DNA Using Reduced G-Gap Dipeptide Composition |
title_sort | identifying transcription factors that prefer binding to methylated dna using reduced g-gap dipeptide composition |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9475634/ https://www.ncbi.nlm.nih.gov/pubmed/36119976 http://dx.doi.org/10.1021/acsomega.2c03696 |
work_keys_str_mv | AT nguyenquangh identifyingtranscriptionfactorsthatpreferbindingtomethylateddnausingreducedggapdipeptidecomposition AT tranhoangv identifyingtranscriptionfactorsthatpreferbindingtomethylateddnausingreducedggapdipeptidecomposition AT nguyenbinhp identifyingtranscriptionfactorsthatpreferbindingtomethylateddnausingreducedggapdipeptidecomposition AT dotrangtt identifyingtranscriptionfactorsthatpreferbindingtomethylateddnausingreducedggapdipeptidecomposition |
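
The description field above names the reduced g-gap dipeptide composition as the data representation for the TFPM problem. The following is a minimal sketch of how such a feature vector might be computed, not the authors' implementation. The record does not state which amino acid grouping or gap value the paper uses, so the six-group alphabet in `GROUPS`, the function name, and the handling of unknown residues are all assumptions made for illustration.

```python
from itertools import product

# Hypothetical reduced alphabet: the 20 standard amino acids merged into
# six groups by rough physicochemical similarity. The grouping used in the
# paper is not given in this record, so this one is only illustrative.
GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]
RESIDUE_TO_GROUP = {
    aa: idx for idx, members in enumerate(GROUPS) for aa in members
}


def reduced_ggap_dipeptide_composition(sequence, gap):
    """Return normalized frequencies of group pairs separated by `gap` residues.

    The result has len(GROUPS) ** 2 entries, ordered by (first group,
    second group). With gap=0 this is the ordinary adjacent dipeptide
    composition over the reduced alphabet.
    """
    n_groups = len(GROUPS)
    # Map residues to group indices. Unknown symbols (for example 'X') are
    # dropped, which shifts later positions; this is a simplification.
    reduced = [
        RESIDUE_TO_GROUP[aa] for aa in sequence.upper() if aa in RESIDUE_TO_GROUP
    ]
    pairs = list(product(range(n_groups), repeat=2))
    counts = {pair: 0 for pair in pairs}
    # Number of residue pairs (i, i + gap + 1) that fit in the sequence.
    n_pairs = len(reduced) - gap - 1
    for i in range(max(n_pairs, 0)):
        counts[(reduced[i], reduced[i + gap + 1])] += 1
    # Avoid division by zero for sequences too short to contain any pair.
    total = max(n_pairs, 1)
    return [counts[pair] / total for pair in pairs]
```

A vector produced this way could then be passed to a support vector machine classifier, as the description indicates. Reducing the alphabet shrinks the feature space from 400 dipeptides to 36 group pairs in this sketch, which is the usual motivation for the reduction when training data are limited.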