An SVM-based method for assessment of transcription factor-DNA complex models

BACKGROUND: Atomic details of protein-DNA complexes can provide insightful information for better understanding of the function and binding specificity of DNA binding proteins. In addition to experimental methods for solving protein-DNA complex structures, protein-DNA docking can be used to predict...

Bibliographic Details
Main authors: Corona, Rosario I.; Sudarshan, Sanjana; Aluru, Srinivas; Guo, Jun-tao
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6302363/
https://www.ncbi.nlm.nih.gov/pubmed/30577740
http://dx.doi.org/10.1186/s12859-018-2538-y
author Corona, Rosario I.
Sudarshan, Sanjana
Aluru, Srinivas
Guo, Jun-tao
collection PubMed
description BACKGROUND: Atomic details of protein-DNA complexes can provide insightful information for better understanding of the function and binding specificity of DNA binding proteins. In addition to experimental methods for solving protein-DNA complex structures, protein-DNA docking can be used to predict native or near-native complex models. A docking program typically generates a large number of complex conformations and predicts the complex model(s) based on interaction energies between protein and DNA. However, the prediction accuracy is hampered by current approaches to model assessment, especially when docking simulations fail to produce any near-native models.
RESULTS: We present here a Support Vector Machine (SVM)-based approach for quality assessment of the predicted transcription factor (TF)-DNA complex models. Besides a knowledge-based protein-DNA interaction potential DDNA3, we applied several structural features that have been shown to play important roles in binding specificity between transcription factors and DNA molecules to quality assessment of complex models. To address the issue of unbalanced positive and negative cases in the training dataset, we applied hard-negative mining, an iterative training process that selects an initial training dataset by combining all of the positive cases and a random sample from the negative cases. Results show that the SVM model greatly improves prediction accuracy (84.2%) over two knowledge-based protein-DNA interaction potentials, orientation potential (60.8%) and DDNA3 (68.4%). The improvement is achieved through reducing the number of false positive predictions, especially for the hard docking cases, in which a docking algorithm fails to produce any near-native complex models.
CONCLUSIONS: A learning-based SVM scoring model with structural features for specific protein-DNA binding and an atomic-level protein-DNA interaction potential DDNA3 significantly improves prediction accuracy of complex models by successfully identifying cases without near-native structural models.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2538-y) contains supplementary material, which is available to authorized users.
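The hard-negative mining step described in the abstract can be illustrated with a minimal sketch. The abstract only states that the initial training set combines all positive cases with a random sample of the negative cases. Everything else here is an assumption made for illustration: the two toy features, the Pegasos-style linear SVM solver, the rule of adding misclassified pool negatives in each round, and the names `train_linear_svm`, `decision`, and `hard_negative_mining`. It is not the authors' implementation, features, or kernel.

```python
import random

def train_linear_svm(X, y, epochs=50, lam=0.01, seed=0):
    """Linear SVM via Pegasos-style subgradient descent on the hinge loss.
    (Assumed solver for illustration; the record does not name one.)"""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    t = 0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            margin = y[i] * (sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            w = [(1.0 - eta * lam) * wj for wj in w]  # L2 shrinkage
            if margin < 1.0:  # hinge loss is active for this sample
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
                b += eta * y[i]
    return w, b

def decision(model, x):
    """Signed score; a positive value means 'predicted positive'."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def hard_negative_mining(positives, negatives, n_initial, max_rounds=5, seed=0):
    """Start from all positives plus a random negative sample, then
    repeatedly add pool negatives that the current model misclassifies."""
    rng = random.Random(seed)
    order = list(range(len(negatives)))
    rng.shuffle(order)
    selected = set(order[:n_initial])  # initial random negative sample
    history = []                       # negatives used in each round
    model = None
    for round_no in range(max_rounds):
        neg_train = [negatives[i] for i in sorted(selected)]
        X = positives + neg_train
        y = [1] * len(positives) + [-1] * len(neg_train)
        model = train_linear_svm(X, y, seed=seed)
        history.append(len(selected))
        # Hard negatives: unused negatives scored as positive (false positives).
        hard = [i for i in range(len(negatives))
                if i not in selected and decision(model, negatives[i]) > 0]
        if not hard or round_no == max_rounds - 1:
            break
        selected.update(hard)
    return model, sorted(selected), history

# Toy, synthetic data standing in for per-model feature vectors (hypothetical).
data_rng = random.Random(1)
positives = [[data_rng.gauss(1.5, 1.0), data_rng.gauss(1.5, 1.0)] for _ in range(20)]
negatives = [[data_rng.gauss(-1.5, 1.5), data_rng.gauss(-1.5, 1.5)] for _ in range(200)]

model, selected, history = hard_negative_mining(positives, negatives, n_initial=20)
print("negatives in the training set per round:", history)
```

Starting from a balanced subset keeps the majority class from dominating the first model, and later rounds spend the training budget only on the negatives that the model currently gets wrong.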
format Online
Article
Text
id pubmed-6302363
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6302363 2018-12-31 An SVM-based method for assessment of transcription factor-DNA complex models Corona, Rosario I. Sudarshan, Sanjana Aluru, Srinivas Guo, Jun-tao BMC Bioinformatics Research BioMed Central 2018-12-21 /pmc/articles/PMC6302363/ /pubmed/30577740 http://dx.doi.org/10.1186/s12859-018-2538-y Text en
© The Author(s). 2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title An SVM-based method for assessment of transcription factor-DNA complex models
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6302363/
https://www.ncbi.nlm.nih.gov/pubmed/30577740
http://dx.doi.org/10.1186/s12859-018-2538-y