
Identification of infectious disease-associated host genes using machine learning techniques

BACKGROUND: With the global spread of multidrug resistance in pathogenic microbes, infectious diseases have emerged as a key public health concern in recent times. Identification of host genes associated with infectious diseases will improve our understanding of the mechanisms behind their developme...


Bibliographic Details
Main Authors: Barman, Ranjan Kumar; Mukhopadhyay, Anirban; Maulik, Ujjwal; Das, Santasabuj
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6935192/
https://www.ncbi.nlm.nih.gov/pubmed/31881961
http://dx.doi.org/10.1186/s12859-019-3317-0
_version_ 1783483538303090688
author Barman, Ranjan Kumar
Mukhopadhyay, Anirban
Maulik, Ujjwal
Das, Santasabuj
author_facet Barman, Ranjan Kumar
Mukhopadhyay, Anirban
Maulik, Ujjwal
Das, Santasabuj
author_sort Barman, Ranjan Kumar
collection PubMed
description BACKGROUND: With the global spread of multidrug resistance in pathogenic microbes, infectious diseases have emerged as a key public health concern in recent times. Identification of host genes associated with infectious diseases will improve our understanding of the mechanisms behind their development and help to identify novel therapeutic targets. RESULTS: We developed a machine learning-based classification approach to identify infectious disease-associated host genes by integrating sequence and protein interaction network features. Among the different methods, a Deep Neural Network (DNN) model with 16 selected features for pseudo-amino acid composition (PAAC) and network properties achieved the highest accuracy of 86.33%, with a sensitivity of 85.61% and a specificity of 86.57%. The DNN classifier also attained an accuracy of 83.33% on a blind dataset and a sensitivity of 83.1% on an independent dataset. Furthermore, to predict unknown infectious disease-associated host genes, we applied the proposed DNN model to all reviewed proteins from the database. Seventy-six out of 100 highly predicted infectious disease-associated genes from our study were also found in experimentally verified human-pathogen protein-protein interactions (PPIs). Finally, we validated the highly predicted infectious disease-associated genes by disease and gene ontology enrichment analysis and found that many of them are shared by one or more of the other diseases, such as cancer, metabolic and immune-related diseases. CONCLUSIONS: To the best of our knowledge, this is the first computational method to identify infectious disease-associated host genes. The proposed method will help large-scale prediction of host genes associated with infectious diseases. However, our results indicated that for small datasets, an advanced DNN-based method does not offer a significant advantage over simpler supervised machine learning techniques, such as Support Vector Machine (SVM) or Random Forest (RF), for the prediction of infectious disease-associated host genes. The significant overlap of infectious diseases with cancer and metabolic diseases in disease and gene ontology enrichment analysis suggests that these diseases perturb the functions of the same cellular signaling pathways and may be treated by drugs that tend to reverse these perturbations. Moreover, identification of novel candidate genes associated with infectious diseases would help us to explain disease pathogenesis further and develop novel therapeutics.
format Online
Article
Text
id pubmed-6935192
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6935192 2019-12-30 Identification of infectious disease-associated host genes using machine learning techniques Barman, Ranjan Kumar Mukhopadhyay, Anirban Maulik, Ujjwal Das, Santasabuj BMC Bioinformatics Research Article BioMed Central 2019-12-27 /pmc/articles/PMC6935192/ /pubmed/31881961 http://dx.doi.org/10.1186/s12859-019-3317-0 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Barman, Ranjan Kumar
Mukhopadhyay, Anirban
Maulik, Ujjwal
Das, Santasabuj
Identification of infectious disease-associated host genes using machine learning techniques
title Identification of infectious disease-associated host genes using machine learning techniques
title_full Identification of infectious disease-associated host genes using machine learning techniques
title_fullStr Identification of infectious disease-associated host genes using machine learning techniques
title_full_unstemmed Identification of infectious disease-associated host genes using machine learning techniques
title_short Identification of infectious disease-associated host genes using machine learning techniques
title_sort identification of infectious disease-associated host genes using machine learning techniques
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6935192/
https://www.ncbi.nlm.nih.gov/pubmed/31881961
http://dx.doi.org/10.1186/s12859-019-3317-0
work_keys_str_mv AT barmanranjankumar identificationofinfectiousdiseaseassociatedhostgenesusingmachinelearningtechniques
AT mukhopadhyayanirban identificationofinfectiousdiseaseassociatedhostgenesusingmachinelearningtechniques
AT maulikujjwal identificationofinfectiousdiseaseassociatedhostgenesusingmachinelearningtechniques
AT dassantasabuj identificationofinfectiousdiseaseassociatedhostgenesusingmachinelearningtechniques