Discovering protein–DNA binding sequence patterns using association rule mining
Protein–DNA bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) play an essential role in transcriptional regulation. Over the past decades, significant efforts have been made to study the principles for protein–DNA bindings. However, it is considered that the...
Main authors: | Leung, Kwong-Sak; Wong, Ka-Chun; Chan, Tak-Ming; Wong, Man-Hon; Lee, Kin-Hong; Lau, Chi-Kong; Tsui, Stephen K. W. |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2010 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2965231/ https://www.ncbi.nlm.nih.gov/pubmed/20529874 http://dx.doi.org/10.1093/nar/gkq500 |
author | Leung, Kwong-Sak Wong, Ka-Chun Chan, Tak-Ming Wong, Man-Hon Lee, Kin-Hong Lau, Chi-Kong Tsui, Stephen K. W. |
collection | PubMed |
description | Protein–DNA bindings between transcription factors (TFs) and transcription factor binding sites (TFBSs) play an essential role in transcriptional regulation. Over the past decades, significant efforts have been made to study the principles of protein–DNA binding. However, it is generally held that there are no simple one-to-one rules between amino acids and nucleotides, and many methods impose complicated features beyond sequence patterns. Protein–DNA bindings are formed from associated amino acid and nucleotide sequence pairs, which determine many functional characteristics. Therefore, it is desirable to investigate associated sequence patterns between TFs and TFBSs. Given increasing computational power, the availability of massive experimental databases on DNA and proteins, and mature data mining techniques, we propose a framework to discover associated TF–TFBS binding sequence patterns from TRANSFAC in the most explicit and interpretable form. The framework is based on association rule mining with the Apriori algorithm. The patterns found are evaluated by quantitative measurements at several levels on TRANSFAC. With further independent verification from the literature, the Protein Data Bank and homology modeling, there is strong evidence that the discovered patterns reveal real TF–TFBS bindings across different TFs and TFBSs, which can drive further work toward a better understanding of TF–TFBS bindings. |
id | pubmed-2965231 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Nucleic Acids Res, 2010-10 (online 2010-06-06) |
rights | © The Author(s) 2010. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Discovering protein–DNA binding sequence patterns using association rule mining |
topic | Computational Biology |
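The record states only that the framework uses association rule mining with the Apriori algorithm on TRANSFAC. The following is a minimal Python sketch of that general technique, not the authors' implementation. It assumes each binding record has already been turned into a transaction of items prefixed `TF:` (amino-acid k-mers from the factor) and `BS:` (nucleotide k-mers from the binding site); the function names, prefixes, thresholds and toy data are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Level-wise Apriori search (hypothetical helper).

    Returns {frozenset(itemset): support}, where support is the fraction of
    transactions containing every item of the itemset.
    """
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(itemset):
        return sum(1 for t in sets if itemset <= t) / n

    # Level 1: keep single items that meet the support threshold.
    current = {frozenset([item]) for t in sets for item in t}
    current = {c for c in current if support(c) >= min_support}
    frequent = {c: support(c) for c in current}

    k = 2
    while current:
        # Join step: unions of two frequent (k-1)-itemsets with exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update((c, support(c)) for c in current)
        k += 1
    return frequent


def tf_to_tfbs_rules(frequent, min_confidence):
    """Turn frequent itemsets into TF -> TFBS rules (hypothetical helper).

    confidence(LHS -> RHS) = support(LHS and RHS) / support(LHS). LHS is always
    a key of `frequent`, because every subset of a frequent itemset is frequent.
    """
    rules = []
    for itemset, supp in frequent.items():
        lhs = frozenset(i for i in itemset if i.startswith("TF:"))
        rhs = itemset - lhs
        if lhs and rhs:
            confidence = supp / frequent[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, supp, confidence))
    # Deterministic order: highest confidence first, then by item names.
    rules.sort(key=lambda r: (-r[3], sorted(r[0]), sorted(r[1])))
    return rules


# Hypothetical toy transactions mixing TF k-mers ("TF:") and TFBS k-mers ("BS:").
TOY_TRANSACTIONS = [
    {"TF:RKR", "BS:GCC"},
    {"TF:RKR", "BS:GCC", "BS:TGA"},
    {"TF:RKR", "BS:TGA"},
    {"TF:HLE", "BS:TGA"},
]

if __name__ == "__main__":
    freq = apriori(TOY_TRANSACTIONS, min_support=0.5)
    for lhs, rhs, supp, conf in tf_to_tfbs_rules(freq, min_confidence=0.6):
        print(f"{sorted(lhs)} -> {sorted(rhs)} support={supp:.2f} confidence={conf:.2f}")
    # ['TF:RKR'] -> ['BS:GCC'] support=0.50 confidence=0.67
    # ['TF:RKR'] -> ['BS:TGA'] support=0.50 confidence=0.67
```

The sketch rescans every transaction for every candidate to stay short; practical Apriori implementations count all candidates of a level in one pass. The k-mer encoding, support and confidence thresholds here are placeholders, not the values used in the paper.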