An ensemble micro neural network approach for elucidating interactions between zinc finger proteins and their target DNA
BACKGROUND: The ability to engineer zinc finger proteins that bind to a DNA sequence of choice is essential for targeted genome editing. Experimental techniques and molecular docking have been successful in predicting protein-DNA interactions; however, they are highly time and resource...
Main Authors: | Dutta, Shayoni; Madan, Spandan; Parikh, Harsh; Sundar, Durai |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5260015/ https://www.ncbi.nlm.nih.gov/pubmed/28155662 http://dx.doi.org/10.1186/s12864-016-3323-9 |
_version_ | 1782499324204154880 |
---|---|
author | Dutta, Shayoni Madan, Spandan Parikh, Harsh Sundar, Durai |
author_facet | Dutta, Shayoni Madan, Spandan Parikh, Harsh Sundar, Durai |
author_sort | Dutta, Shayoni |
collection | PubMed |
description | BACKGROUND: The ability to engineer zinc finger proteins that bind to a DNA sequence of choice is essential for targeted genome editing. Experimental techniques and molecular docking have been successful in predicting protein-DNA interactions; however, they are highly time and resource intensive. Here, we present a novel algorithm designed for high-throughput prediction of the optimal zinc finger protein for 9 bp DNA sequences of choice. In accordance with the principles of information theory, a subset identified by K-means clustering was used as a representative of the space of all possible 9 bp DNA sequences. The modeling and simulation results obtained from this subset, assuming a synergistic mode of binding, were used to train an ensemble micro neural network. The synergistic mode of binding is the closest to the DNA-protein binding seen in nature and gives much higher quality predictions, but the time and resources required increase exponentially as a trade-off. Our algorithm is inspired by an ensemble machine learning approach and incorporates the predictions made by 100 parallel neural networks, each with a different hidden layer architecture designed to pick up different features from the training dataset, to predict optimal zinc finger proteins for any 9 bp target DNA. RESULTS: The model gave an average sequence identity of 83% for the testing dataset. The BLAST e-values are well within the statistical confidence interval of E-05 for 100% of the testing samples. The geometric mean and median of the BLAST e-values were found to be 1.70E-12 and 7.00E-12, respectively. For final validation of the approach, we compared our predictions against optimal ZFPs reported in the literature for a set of experimentally studied DNA sequences. The accuracy, measured as the average string identity between our predictions and the optimal zinc finger protein reported in the literature for a 9 bp DNA target, was found to be as high as 81% for DNA targets with the consensus sequence GCNGNNGCN. Moreover, the average string identity of our predictions for a catalogue of over 100 9 bp DNA targets for which the optimal zinc finger protein has been reported in the literature was found to be 71%. CONCLUSIONS: Validation with experimental data shows that our tool is capable of domain adaptation and thus scales well, with high accuracy, to datasets other than the training set. As synergistic binding comes the closest to the ideal mode of binding, our algorithm predicts biologically relevant results in agreement with the experimental data present in the literature. While disjointed attempts to approach this problem synergistically have been reported in the literature, there is no work covering the whole sample space. Our algorithm allows zinc finger proteins to be designed for DNA targets of the user’s choice, opening up new frontiers in the field of targeted genome editing. This algorithm is also available as an easy-to-use web server, ZifNN, at http://web.iitd.ac.in/~sundar/ZifNN/. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-3323-9) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5260015 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5260015 2017-01-26 An ensemble micro neural network approach for elucidating interactions between zinc finger proteins and their target DNA Dutta, Shayoni Madan, Spandan Parikh, Harsh Sundar, Durai BMC Genomics Research BioMed Central 2016-12-22 /pmc/articles/PMC5260015/ /pubmed/28155662 http://dx.doi.org/10.1186/s12864-016-3323-9 Text en © The Author(s). 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Dutta, Shayoni Madan, Spandan Parikh, Harsh Sundar, Durai An ensemble micro neural network approach for elucidating interactions between zinc finger proteins and their target DNA |
title | An ensemble micro neural network approach for elucidating interactions between zinc finger proteins and their target DNA |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5260015/ https://www.ncbi.nlm.nih.gov/pubmed/28155662 http://dx.doi.org/10.1186/s12864-016-3323-9 |
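
The evaluation measures named in the description (average string identity, the GCNGNNGCN consensus, and the geometric mean and median of BLAST e-values) can be illustrated with a minimal sketch. This is not the authors' code: the function names are hypothetical, identity is assumed here to be a simple position-by-position comparison of equal-length strings (the record does not say how the paper computes it), `N` is assumed to mean any of A, C, G or T, and all sequences and e-values below are made up for illustration.

```python
import re
from statistics import geometric_mean, median


def percent_identity(predicted: str, reference: str) -> float:
    """Percentage of positions at which two equal-length strings agree.

    Assumption: position-wise comparison with no alignment or gaps.
    """
    if len(predicted) != len(reference):
        raise ValueError("sequences must have equal length")
    if not reference:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)


def matches_consensus(dna: str, consensus: str = "GCNGNNGCN") -> bool:
    """True if dna fits the consensus, treating N as any of A, C, G, T."""
    pattern = consensus.upper().replace("N", "[ACGT]")
    return re.fullmatch(pattern, dna.upper()) is not None


def summarize_evalues(evalues: list[float]) -> tuple[float, float]:
    """Return (geometric mean, median) of a list of positive e-values."""
    return geometric_mean(evalues), median(evalues)


if __name__ == "__main__":
    # Hypothetical helix strings and e-values, not data from the paper.
    print(percent_identity("RSDHLTT", "RSDELTR"))
    print(matches_consensus("GCAGTTGCC"))
    print(summarize_evalues([1e-12, 1e-11, 1e-10]))
```

The geometric mean is the natural summary for e-values because they span many orders of magnitude; an arithmetic mean would be dominated by the few largest values.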