An SVM-based method for assessment of transcription factor-DNA complex models
BACKGROUND: Atomic details of protein-DNA complexes can provide insightful information for better understanding of the function and binding specificity of DNA binding proteins. In addition to experimental methods for solving protein-DNA complex structures, protein-DNA docking can be used to predict...
Main Authors: | Corona, Rosario I.; Sudarshan, Sanjana; Aluru, Srinivas; Guo, Jun-tao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6302363/ https://www.ncbi.nlm.nih.gov/pubmed/30577740 http://dx.doi.org/10.1186/s12859-018-2538-y |
_version_ | 1783381961395404800 |
---|---|
author | Corona, Rosario I.; Sudarshan, Sanjana; Aluru, Srinivas; Guo, Jun-tao |
author_facet | Corona, Rosario I.; Sudarshan, Sanjana; Aluru, Srinivas; Guo, Jun-tao |
author_sort | Corona, Rosario I. |
collection | PubMed |
description | BACKGROUND: Atomic details of protein-DNA complexes can provide insightful information for better understanding of the function and binding specificity of DNA binding proteins. In addition to experimental methods for solving protein-DNA complex structures, protein-DNA docking can be used to predict native or near-native complex models. A docking program typically generates a large number of complex conformations and predicts the complex model(s) based on interaction energies between protein and DNA. However, the prediction accuracy is hampered by current approaches to model assessment, especially when docking simulations fail to produce any near-native models. RESULTS: We present here a Support Vector Machine (SVM)-based approach for quality assessment of the predicted transcription factor (TF)-DNA complex models. Besides a knowledge-based protein-DNA interaction potential DDNA3, we applied several structural features that have been shown to play important roles in binding specificity between transcription factors and DNA molecules to quality assessment of complex models. To address the issue of unbalanced positive and negative cases in the training dataset, we applied hard-negative mining, an iterative training process that selects an initial training dataset by combining all of the positive cases and a random sample from the negative cases. Results show that the SVM model greatly improves prediction accuracy (84.2%) over two knowledge-based protein-DNA interaction potentials, orientation potential (60.8%) and DDNA3 (68.4%). The improvement is achieved through reducing the number of false positive predictions, especially for the hard docking cases, in which a docking algorithm fails to produce any near-native complex models. CONCLUSIONS: A learning-based SVM scoring model with structural features for specific protein-DNA binding and an atomic-level protein-DNA interaction potential DDNA3 significantly improves prediction accuracy of complex models by successfully identifying cases without near-native structural models. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2538-y) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6302363 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6302363 2018-12-31 An SVM-based method for assessment of transcription factor-DNA complex models Corona, Rosario I.; Sudarshan, Sanjana; Aluru, Srinivas; Guo, Jun-tao BMC Bioinformatics Research BioMed Central 2018-12-21 /pmc/articles/PMC6302363/ /pubmed/30577740 http://dx.doi.org/10.1186/s12859-018-2538-y Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | An SVM-based method for assessment of transcription factor-DNA complex models |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6302363/ https://www.ncbi.nlm.nih.gov/pubmed/30577740 http://dx.doi.org/10.1186/s12859-018-2538-y |
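The hard-negative mining procedure named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract only states that the initial training set combines all positive cases with a random sample of negatives; the later rounds shown here (adding negatives that the current model wrongly scores as positive, then retraining) follow the usual form of hard-negative mining and are an assumption. The linear SVM trained by sub-gradient descent, the toy two-dimensional feature vectors, and all function and parameter names are hypothetical stand-ins; the record does not specify the kernel or the exact feature encoding.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, epochs=100, lr=0.01, lam=0.01, seed=0):
    """Fit a linear SVM (labels +1/-1) by stochastic sub-gradient descent on the hinge loss."""
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            margin = y[i] * (dot(w, X[i]) + b)
            # The L2 penalty shrinks w on every step.
            w = [wj - lr * lam * wj for wj in w]
            # The hinge term contributes only for points inside the margin.
            if margin < 1:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b

def fit_on(positives, negatives, neg_idx):
    """Train on all positives plus the selected subset of negatives."""
    X = positives + [negatives[i] for i in neg_idx]
    y = [1] * len(positives) + [-1] * len(neg_idx)
    return train_linear_svm(X, y)

def hard_negative_mining(positives, negatives, n_initial, max_rounds=5, seed=0):
    """Assumed loop: start from a random negative sample, then add false positives and retrain."""
    rng = random.Random(seed)
    pool = list(range(len(negatives)))
    rng.shuffle(pool)
    chosen = set(pool[:n_initial])          # initial random sample of negatives
    model = fit_on(positives, negatives, sorted(chosen))
    for _ in range(max_rounds):
        w, b = model
        # Hard negatives: unused negatives the current model scores as positive.
        hard = [i for i in range(len(negatives))
                if i not in chosen and dot(w, negatives[i]) + b > 0]
        if not hard:
            break
        chosen.update(hard)
        model = fit_on(positives, negatives, sorted(chosen))
    return model, sorted(chosen)

# Toy, imbalanced data: 10 positives and 200 negatives (hypothetical features).
rng = random.Random(1)
positives = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(10)]
negatives = [[rng.gauss(-1, 1.5), rng.gauss(-1, 1.5)] for _ in range(200)]
model, train_neg = hard_negative_mining(positives, negatives, n_initial=10)
print(len(train_neg), "of", len(negatives), "negatives used for training")
```

Under these assumptions the training set stays far smaller than the full negative pool while still covering the negatives the classifier finds hardest, which is the usual motivation for the technique on unbalanced data.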