PNImodeler: web server for inferring protein-binding nucleotides from sequence data

Bibliographic details
Main authors: Im, Jinyong; Tuvshinjargal, Narankhuu; Park, Byungkyu; Lee, Wook; Huang, De-Shuang; Han, Kyungsook
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4331809/
https://www.ncbi.nlm.nih.gov/pubmed/25708089
http://dx.doi.org/10.1186/1471-2164-16-S3-S6
Collection: PubMed

Description:
BACKGROUND: Interactions between DNA and proteins are essential to many biological processes, such as transcriptional regulation and DNA replication. With the increased availability of structures of protein-DNA complexes, several computational studies have been conducted to predict DNA-binding sites in proteins. However, little attempt has been made to predict protein-binding sites in DNA.

RESULTS: From an extensive analysis of protein-DNA complexes, we identified powerful features of DNA and protein sequences that can be used to predict protein-binding sites in DNA sequences. We developed two support vector machine (SVM) models that predict protein-binding nucleotides from DNA and/or protein sequences. One SVM model, which used DNA sequence data alone, achieved a sensitivity of 73.4%, a specificity of 64.8%, an accuracy of 68.9% and a correlation coefficient of 0.382 on a test dataset that was not used in training. The other SVM model, which used both DNA and protein sequences, achieved a sensitivity of 67.6%, a specificity of 74.3%, an accuracy of 71.4% and a correlation coefficient of 0.418.

CONCLUSIONS: Predicting binding sites in double-stranded DNA is a more difficult task than predicting binding sites in single-stranded molecules. Our study showed that protein-binding sites in double-stranded DNA molecules can be predicted with accuracy comparable to that achieved for single-stranded molecules. Our study also demonstrated that using both DNA and protein sequences resulted in better prediction performance than using DNA sequence data alone. The SVM models and datasets constructed in this study are available at http://bclab.inha.ac.kr/pnimodeler.
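The abstract reports each model with four evaluation metrics. The sketch below shows how these metrics are conventionally computed from per-nucleotide confusion-matrix counts for a binary binding/non-binding classifier. It assumes that "correlation coefficient" means the Matthews correlation coefficient (MCC), which is common in binding-site prediction but is not stated in this record. `ConfusionCounts` and `evaluate` are hypothetical names, and the counts in the usage example are invented for illustration, not taken from the paper's test data.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Per-nucleotide counts for a binary binding / non-binding classifier (hypothetical type)."""
    tp: int  # binding nucleotides predicted as binding
    tn: int  # non-binding nucleotides predicted as non-binding
    fp: int  # non-binding nucleotides predicted as binding
    fn: int  # binding nucleotides predicted as non-binding


def evaluate(c: ConfusionCounts) -> dict[str, float]:
    """Return sensitivity, specificity, accuracy and MCC.

    Assumes both classes occur in the test set (tp + fn > 0 and tn + fp > 0);
    otherwise sensitivity or specificity is undefined and a ZeroDivisionError is raised.
    """
    total = c.tp + c.tn + c.fp + c.fn
    sensitivity = c.tp / (c.tp + c.fn)  # fraction of binding nucleotides recovered
    specificity = c.tn / (c.tn + c.fp)  # fraction of non-binding nucleotides rejected
    accuracy = (c.tp + c.tn) / total    # fraction of all nucleotides classified correctly
    # Assumption: "correlation coefficient" is the Matthews correlation coefficient.
    denom = math.sqrt((c.tp + c.fp) * (c.tp + c.fn) * (c.tn + c.fp) * (c.tn + c.fn))
    mcc = (c.tp * c.tn - c.fp * c.fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Hypothetical, class-balanced counts chosen only to illustrate the formulas.
    counts = ConfusionCounts(tp=734, tn=648, fp=352, fn=266)
    for name, value in evaluate(counts).items():
        print(f"{name}: {value:.3f}")
```

With balanced hypothetical counts (1,000 binding and 1,000 non-binding nucleotides), accuracy is simply the mean of sensitivity and specificity. On a test set with a different class ratio, accuracy is weighted toward the rate of the larger class, which is one reason sensitivity, specificity, accuracy and a correlation coefficient are reported together.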
Record ID: pubmed-4331809
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Source: BMC Genomics, Proceedings; BioMed Central, 2015-01-29
Copyright © 2015 Im et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Topic: Proceedings