Discovering protein–DNA binding sequence patterns using association rule mining

Protein–DNA bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) play an essential role in transcriptional regulation. Over the past decades, significant efforts have been made to study the principles for protein–DNA bindings. However, it is considered that the...

Bibliographic Details
Main Authors: Leung, Kwong-Sak, Wong, Ka-Chun, Chan, Tak-Ming, Wong, Man-Hon, Lee, Kin-Hong, Lau, Chi-Kong, Tsui, Stephen K. W.
Format: Text
Language: English
Published: Oxford University Press 2010
Subjects: Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2965231/
https://www.ncbi.nlm.nih.gov/pubmed/20529874
http://dx.doi.org/10.1093/nar/gkq500
author Leung, Kwong-Sak
Wong, Ka-Chun
Chan, Tak-Ming
Wong, Man-Hon
Lee, Kin-Hong
Lau, Chi-Kong
Tsui, Stephen K. W.
author_facet Leung, Kwong-Sak
Wong, Ka-Chun
Chan, Tak-Ming
Wong, Man-Hon
Lee, Kin-Hong
Lau, Chi-Kong
Tsui, Stephen K. W.
author_sort Leung, Kwong-Sak
collection PubMed
description Protein–DNA bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) play an essential role in transcriptional regulation. Over the past decades, significant efforts have been made to study the principles of protein–DNA binding. However, it is considered that there are no simple one-to-one rules between amino acids and nucleotides, and many methods impose complicated features beyond sequence patterns. Protein–DNA bindings are formed from associated amino acid and nucleotide sequence pairs, which determine many functional characteristics. It is therefore desirable to investigate associated sequence patterns between TFs and TFBSs. With increasing computational power, the availability of massive experimental databases on DNA and proteins, and mature data mining techniques, we propose a framework to discover associated TF–TFBS binding sequence patterns from TRANSFAC in the most explicit and interpretable form. The framework is based on association rule mining with the Apriori algorithm. The patterns found are evaluated by quantitative measurements at several levels on TRANSFAC. With further independent verification from the literature, the Protein Data Bank and homology modeling, there is strong evidence that the patterns discovered reveal real TF–TFBS bindings across different TFs and TFBSs, which can provide further knowledge for better understanding TF–TFBS bindings.
format Text
id pubmed-2965231
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-2965231 2010-10-28 Discovering protein–DNA binding sequence patterns using association rule mining Leung, Kwong-Sak Wong, Ka-Chun Chan, Tak-Ming Wong, Man-Hon Lee, Kin-Hong Lau, Chi-Kong Tsui, Stephen K. W. Nucleic Acids Res Computational Biology Oxford University Press 2010-10 2010-06-06 /pmc/articles/PMC2965231/ /pubmed/20529874 http://dx.doi.org/10.1093/nar/gkq500 Text en © The Author(s) 2010. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/2.5 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Computational Biology
Leung, Kwong-Sak
Wong, Ka-Chun
Chan, Tak-Ming
Wong, Man-Hon
Lee, Kin-Hong
Lau, Chi-Kong
Tsui, Stephen K. W.
Discovering protein–DNA binding sequence patterns using association rule mining
title Discovering protein–DNA binding sequence patterns using association rule mining
topic Computational Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2965231/
https://www.ncbi.nlm.nih.gov/pubmed/20529874
http://dx.doi.org/10.1093/nar/gkq500
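
The description in this record states that the framework is based on association rule mining with the Apriori algorithm. As a rough illustration of that general technique only, the sketch below mines frequent itemsets and rules from a handful of toy transactions. It is not the authors' implementation: the item labels (such as "TF:KRR" and "DNA:TGAC"), the thresholds, and the function names apriori and rules are all hypothetical, and the data is invented rather than taken from TRANSFAC.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for ante in combinations(sorted(itemset), r):
                ante = frozenset(ante)
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already in the dictionary.
                conf = sup / frequent[ante]
                if conf >= min_confidence:
                    found.append((ante, itemset - ante, sup, conf))
    return found


# Hypothetical toy data: each transaction pairs short TF-side and
# DNA-side sequence tokens observed together. Not real TRANSFAC entries.
transactions = [
    {"TF:KRR", "DNA:TGAC"},
    {"TF:KRR", "DNA:TGAC"},
    {"TF:KRR", "DNA:TGAC", "TF:CGK"},
    {"TF:CGK", "DNA:GGTC"},
    {"TF:KRR", "DNA:GGTC"},
]

frequent = apriori(transactions, min_support=0.4)
found = rules(frequent, min_confidence=0.7)
for ante, cons, sup, conf in found:
    print(sorted(ante), "->", sorted(cons), f"support={sup:.2f} confidence={conf:.2f}")
```

The prune step is what distinguishes Apriori from brute-force enumeration: a candidate is counted against the data only if all of its smaller subsets already met the support threshold.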