
Prediction of novel mouse TLR9 agonists using a random forest approach

BACKGROUND: Toll-like receptor 9 is a key innate immune receptor involved in detecting infectious diseases and cancer. TLR9 activates the innate immune system following the recognition of single-stranded DNA oligonucleotides (ODN) containing unmethylated cytosine-guanine (CpG) motifs. Due to the con...


Bibliographic Details
Main Authors: Khanna, Varun; Li, Lei; Fung, Johnson; Ranganathan, Shoba; Petrovsky, Nikolai
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6924143/
https://www.ncbi.nlm.nih.gov/pubmed/31856726
http://dx.doi.org/10.1186/s12860-019-0241-0
author Khanna, Varun
Li, Lei
Fung, Johnson
Ranganathan, Shoba
Petrovsky, Nikolai
collection PubMed
description BACKGROUND: Toll-like receptor 9 (TLR9) is a key innate immune receptor involved in detecting infectious diseases and cancer. TLR9 activates the innate immune system following the recognition of single-stranded DNA oligonucleotides (ODNs) containing unmethylated cytosine-guanine (CpG) motifs. Because ODNs contain a considerable number of rotatable bonds, high-throughput in silico screening of CpG ODNs for potential TLR9 activity via traditional structure-based virtual screening is challenging. In the current study, we present a machine-learning-based method for predicting novel mouse TLR9 (mTLR9) agonists from features including the count and position of motifs, the distance between motifs, and graphically derived features such as the radius of gyration and moment of inertia. We employed an in-house, experimentally validated dataset of 396 single-stranded synthetic ODNs to compare the results of five machine learning algorithms. Since the dataset was highly imbalanced, we used an ensemble learning approach based on repeated random down-sampling.
RESULTS: Using in-house experimental TLR9 activity data, we found that the random forest algorithm outperformed the other algorithms on our dataset for TLR9 activity prediction. Therefore, we developed a cross-validated ensemble classifier of 20 random forest models. The average Matthews correlation coefficient and balanced accuracy of our ensemble classifier on test samples were 0.61 and 80.0%, respectively, with a maximum balanced accuracy and Matthews correlation coefficient of 87.0% and 0.75, respectively. We confirmed that common sequence motifs including ‘CC’, ‘GG’, ‘AG’, ‘CCCG’ and ‘CGGC’ were overrepresented in mTLR9 agonists. Predictions on 6000 randomly generated ODNs were ranked, and the top 100 ODNs were synthesized and experimentally tested for activity in an mTLR9 reporter cell assay; 91 of the 100 selected ODNs showed high activity, confirming the accuracy of the model in predicting mTLR9 activity.
CONCLUSION: We combined repeated random down-sampling with random forest to overcome the class imbalance problem and achieved promising results. Overall, we showed that the random forest algorithm outperformed other machine learning algorithms including support vector machines, shrinkage discriminant analysis, gradient boosting machine and neural networks. Due to its predictive performance and simplicity, the random forest technique is a useful method for prediction of mTLR9 ODN agonists.
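The abstract names three ingredients that can be illustrated compactly: motif count and position features, repeated random down-sampling to balance classes, and evaluation by Matthews correlation coefficient (MCC) and balanced accuracy. The following is a minimal sketch under stated assumptions, not the authors' implementation. All function names, the example sequence, the seed, and the majority-vote aggregation are hypothetical; the abstract does not say how the 20 models were combined, and the random forest itself is left out so that the block needs only the Python standard library.

```python
import math
import random
from collections import Counter


def motif_features(seq, motif):
    """Count overlapping occurrences of `motif` in `seq` and record the first position (-1 if absent)."""
    positions = [i for i in range(len(seq) - len(motif) + 1) if seq.startswith(motif, i)]
    return {"count": len(positions), "first_pos": positions[0] if positions else -1}


def downsample(labels, rng):
    """Return sorted indices of a balanced subset: every class is randomly cut to the minority-class size."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(members) for members in by_class.values())
    idx = []
    for members in by_class.values():
        idx.extend(rng.sample(members, n_min))
    return sorted(idx)


def repeated_downsampling(labels, n_models=20, seed=0):
    """Draw one balanced index subset per ensemble member (20 mirrors the abstract; seed is an assumption)."""
    rng = random.Random(seed)
    return [downsample(labels, rng) for _ in range(n_models)]


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample (assumed rule; ties go to the label seen first)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


def mcc_and_balanced_accuracy(y_true, y_pred):
    """Compute MCC and balanced accuracy for binary labels coded as 1 (active) and 0 (inactive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return mcc, (sensitivity + specificity) / 2


if __name__ == "__main__":
    # Toy data only: 3 "active" and 9 "inactive" labels, not the 396-ODN dataset.
    toy_labels = [1] * 3 + [0] * 9
    subsets = repeated_downsampling(toy_labels, n_models=20, seed=1)
    print(len(subsets), len(subsets[0]))
    print(motif_features("AACGTTCGAA", "CG"))
    print(mcc_and_balanced_accuracy([1, 1, 0, 0], [1, 0, 0, 0]))
```

In a fuller pipeline, each index subset would be used to train one classifier (a random forest in the study), and the per-model predictions on held-out ODNs would be combined and then scored with the metric function. Balanced accuracy is the mean of sensitivity and specificity, which is why it suits an imbalanced dataset better than plain accuracy.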
format Online
Article
Text
id pubmed-6924143
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6924143 2019-12-30 Prediction of novel mouse TLR9 agonists using a random forest approach Khanna, Varun; Li, Lei; Fung, Johnson; Ranganathan, Shoba; Petrovsky, Nikolai BMC Mol Cell Biol Research BioMed Central 2019-12-20 /pmc/articles/PMC6924143/ /pubmed/31856726 http://dx.doi.org/10.1186/s12860-019-0241-0 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Prediction of novel mouse TLR9 agonists using a random forest approach
topic Research