DNA Motif Recognition Modeling from Protein Sequences

Although the existing works on DNA motif discovery on DNA sequences are plethoric, mechanistic knowledge to infer DNA motifs from protein sequences across multiple DNA-binding domain families without conducting any wet-lab experiments is still lacking. Therefore, the k-spectrum recognition modeling...

Bibliographic Details
Main author: Wong, Ka-Chun
Format: Online Article Text
Language: English
Published: Elsevier 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6153143/
https://www.ncbi.nlm.nih.gov/pubmed/30267681
http://dx.doi.org/10.1016/j.isci.2018.09.003
author Wong, Ka-Chun
collection PubMed
description Although the existing works on DNA motif discovery on DNA sequences are plethoric, mechanistic knowledge to infer DNA motifs from protein sequences across multiple DNA-binding domain families without conducting any wet-lab experiments is still lacking. Therefore, the k-spectrum recognition modeling is proposed to address the issues at the highest possible resolutions. The k-spectrum model can capture DNA motif patterns from protein sequences at the resolution in which local sequence context and nucleotide dependency can be taken into account completely. Multiple evaluation metrics are adopted and measured on millions of k-mer binding intensities from 92 proteins across 5 DNA-binding families (i.e., bHLH, bZIP, ETS, Forkhead, and Homeodomain), demonstrating its competitive edges. In addition, it not only can contribute to DNA motif recognition modeling but also can help prioritize the observed or even unobserved binding of single nucleotide variants on transcription factor binding sites in a genome-wide manner.
format Online
Article
Text
id pubmed-6153143
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-6153143 2018-09-25 DNA Motif Recognition Modeling from Protein Sequences Wong, Ka-Chun iScience Article Elsevier 2018-09-10 /pmc/articles/PMC6153143/ /pubmed/30267681 http://dx.doi.org/10.1016/j.isci.2018.09.003 Text en © 2018 The Author(s) http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).