The folded k-spectrum kernel: A machine learning approach to detecting transcription factor binding sites with gapped nucleotide dependencies

Bibliographic Details
Main authors: Elmas, Abdulkadir; Wang, Xiaodong; Dresch, Jacqueline M.
Format: Online article (text)
Language: English
Published: Public Library of Science, 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5628859/
https://www.ncbi.nlm.nih.gov/pubmed/28982128
http://dx.doi.org/10.1371/journal.pone.0185570
Collection: PubMed
Abstract: Understanding the molecular machinery involved in transcriptional regulation is central to improving our knowledge of an organism’s development, disease, and evolution. The building blocks of this complex molecular machinery are an organism’s genomic DNA sequence and transcription factor proteins. Despite the vast amount of sequence data now available for many model organisms, predicting where transcription factors bind, often referred to as ‘motif detection’, is still incredibly challenging. In this study, we develop a novel bioinformatic approach to binding site prediction. We do this by extending pre-existing SVM approaches in an unbiased way to include all possible gapped k-mers, representing different combinations of complex nucleotide dependencies within binding sites. Through a rigorous set of cross-validation experiments, we show the advantages of this new approach over existing SVM approaches. We also demonstrate the effectiveness of our new approach by reporting on its improved performance on a set of 127 genomic regions known to regulate gene expression along the anterior-posterior axis in early Drosophila embryos.
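The abstract describes the method only at a high level. As a hedged illustration, and not the authors' folded k-spectrum kernel (whose exact definition is not given in this record), the Python sketch below contrasts the standard k-spectrum kernel, which compares two sequences by the inner product of their contiguous k-mer counts, with a simple gapped variant in which exactly m of the k positions in each window are treated as wildcards. The function names, the wildcard symbol `-`, and the fixed-m gap scheme are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def kmer_spectrum(seq: str, k: int) -> Counter:
    """Count every contiguous k-mer in seq (standard k-spectrum feature map)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def gapped_kmer_spectrum(seq: str, k: int, m: int) -> Counter:
    """Count k-mers in which exactly m of the k positions are wildcards ('-').

    Hypothetical illustration of gapped k-mer features, not the authors'
    folded kernel. Each length-k window yields C(k, m) gapped patterns.
    """
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Every choice of m wildcard positions within the window.
        for gaps in combinations(range(k), m):
            pattern = "".join("-" if j in gaps else c for j, c in enumerate(window))
            counts[pattern] += 1
    return counts


def spectrum_kernel(a: Counter, b: Counter) -> int:
    """Inner product of two sparse k-mer count vectors."""
    # Iterate over the smaller map; a Counter returns 0 for absent keys.
    if len(a) > len(b):
        a, b = b, a
    return sum(count * b[key] for key, count in a.items())
```

For example, `spectrum_kernel(kmer_spectrum("ACGTACGT", 3), kmer_spectrum("ACGAACGT", 3))` evaluates to 6, while the gapped version with k = 3 and m = 1 evaluates to 22, because wildcard patterns let the two sequences share features around their single mismatch. A matrix of such kernel values could be supplied to an SVM that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`.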
ID: pubmed-5628859
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article), published 2017-10-05
License: © 2017 Elmas et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.