An SVM-based method for assessment of transcription factor-DNA complex models

BACKGROUND: Atomic details of protein-DNA complexes can provide insight into the function and binding specificity of DNA-binding proteins. In addition to experimental methods for solving protein-DNA complex structures, protein-DNA docking can be used to predict...

Bibliographic Details
Main authors:	Corona, Rosario I., Sudarshan, Sanjana, Aluru, Srinivas, Guo, Jun-tao
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2018
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6302363/ https://www.ncbi.nlm.nih.gov/pubmed/30577740 http://dx.doi.org/10.1186/s12859-018-2538-y

_version_	1783381961395404800
author	Corona, Rosario I. Sudarshan, Sanjana Aluru, Srinivas Guo, Jun-tao
author_facet	Corona, Rosario I. Sudarshan, Sanjana Aluru, Srinivas Guo, Jun-tao
author_sort	Corona, Rosario I.
collection	PubMed
description	BACKGROUND: Atomic details of protein-DNA complexes can provide insight into the function and binding specificity of DNA-binding proteins. In addition to experimental methods for solving protein-DNA complex structures, protein-DNA docking can be used to predict native or near-native complex models. A docking program typically generates a large number of complex conformations and predicts the complex model(s) based on interaction energies between protein and DNA. However, the prediction accuracy is hampered by current approaches to model assessment, especially when docking simulations fail to produce any near-native models.
RESULTS: We present here a Support Vector Machine (SVM)-based approach for quality assessment of predicted transcription factor (TF)-DNA complex models. In addition to DDNA3, a knowledge-based protein-DNA interaction potential, we applied to model quality assessment several structural features that have been shown to play important roles in binding specificity between transcription factors and DNA molecules. To address the imbalance between positive and negative cases in the training dataset, we applied hard-negative mining, an iterative training process that selects an initial training dataset by combining all of the positive cases and a random sample from the negative cases. Results show that the SVM model greatly improves prediction accuracy (84.2%) over two knowledge-based protein-DNA interaction potentials, the orientation potential (60.8%) and DDNA3 (68.4%). The improvement is achieved by reducing the number of false positive predictions, especially for the hard docking cases, in which a docking algorithm fails to produce any near-native complex models.
CONCLUSIONS: A learning-based SVM scoring model that combines structural features for specific protein-DNA binding with the atomic-level protein-DNA interaction potential DDNA3 significantly improves prediction accuracy of complex models by successfully identifying cases without near-native structural models.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2538-y) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6302363
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-6302363 2018-12-31 An SVM-based method for assessment of transcription factor-DNA complex models Corona, Rosario I. Sudarshan, Sanjana Aluru, Srinivas Guo, Jun-tao BMC Bioinformatics Research BioMed Central 2018-12-21 /pmc/articles/PMC6302363/ /pubmed/30577740 http://dx.doi.org/10.1186/s12859-018-2538-y Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Corona, Rosario I. Sudarshan, Sanjana Aluru, Srinivas Guo, Jun-tao An SVM-based method for assessment of transcription factor-DNA complex models
title	An SVM-based method for assessment of transcription factor-DNA complex models
title_sort	svm-based method for assessment of transcription factor-dna complex models
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6302363/ https://www.ncbi.nlm.nih.gov/pubmed/30577740 http://dx.doi.org/10.1186/s12859-018-2538-y
work_keys_str_mv	AT coronarosarioi ansvmbasedmethodforassessmentoftranscriptionfactordnacomplexmodels AT sudarshansanjana ansvmbasedmethodforassessmentoftranscriptionfactordnacomplexmodels AT alurusrinivas ansvmbasedmethodforassessmentoftranscriptionfactordnacomplexmodels AT guojuntao ansvmbasedmethodforassessmentoftranscriptionfactordnacomplexmodels AT coronarosarioi svmbasedmethodforassessmentoftranscriptionfactordnacomplexmodels AT sudarshansanjana svmbasedmethodforassessmentoftranscriptionfactordnacomplexmodels AT alurusrinivas svmbasedmethodforassessmentoftranscriptionfactordnacomplexmodels AT guojuntao svmbasedmethodforassessmentoftranscriptionfactordnacomplexmodels
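
The description field above mentions hard-negative mining as the way the authors handle the imbalance between positive and negative training cases. The record gives no implementation details, so the following is only a minimal sketch of the general technique, not the authors' code. Everything in it is an assumption made for illustration: the two-dimensional synthetic features, the function names, the simple linear SVM trained by subgradient descent on the hinge loss, and the rule that a "hard" negative is any unused negative the current model scores as positive.

```python
import random


def decision(w, b, x):
    """Signed score of a linear model; a positive score predicts the positive class."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(X, y, epochs=100, lr=0.01, lam=0.01, seed=0):
    """Linear SVM via subgradient descent on the L2-regularized hinge loss.

    Labels must be +1 or -1. This stands in for whatever SVM the paper used.
    """
    rng = random.Random(seed)
    w, b = [0.0] * len(X[0]), 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            violated = y[i] * decision(w, b, X[i]) < 1
            # Shrink w (L2 penalty), then step toward the sample if its margin is violated.
            w = [wj * (1 - lr * lam) for wj in w]
            if violated:
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def hard_negative_mining(pos, neg, n_initial, rounds=5, seed=0):
    """Train on all positives plus a growing set of negatives.

    Starts from a random sample of negatives, then repeatedly adds the
    negatives that the current model misclassifies and retrains.
    Returns the final weights, bias, and indices of the negatives used.
    """
    rng = random.Random(seed)
    order = list(range(len(neg)))
    rng.shuffle(order)
    selected = set(order[:n_initial])  # initial random sample of negatives

    def fit():
        idx = sorted(selected)
        X = pos + [neg[i] for i in idx]
        y = [1] * len(pos) + [-1] * len(idx)
        return train_linear_svm(X, y, seed=seed)

    w, b = fit()
    for _ in range(rounds):
        # Hard negatives: negatives not yet in training that score as positive.
        hard = [i for i in range(len(neg))
                if i not in selected and decision(w, b, neg[i]) > 0]
        if not hard:
            break
        selected.update(hard)
        w, b = fit()  # the returned model is always trained on the final set
    return w, b, sorted(selected)


# Synthetic, purely illustrative data: 20 positives and 400 negatives.
data_rng = random.Random(1)
pos = [[data_rng.gauss(2, 0.5), data_rng.gauss(2, 0.5)] for _ in range(20)]
neg = [[data_rng.gauss(-1, 1.5), data_rng.gauss(-1, 1.5)] for _ in range(400)]
model_w, model_b, used = hard_negative_mining(pos, neg, n_initial=20)
print(f"negatives used for training: {len(used)} of {len(neg)}")
```

Starting with as many negatives as positives keeps the first training set balanced; each round then enlarges it only with the negatives the model currently gets wrong, so the final training set stays much smaller than the full negative pool when most negatives are easy.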
