Predicting host dependency factors of pathogens in Drosophila melanogaster using machine learning

Pathogens causing infections, and particularly when invading the host cells, require the host cell machinery for efficient regeneration and proliferation during infection. For their life cycle, host proteins are needed and these Host Dependency Factors (HDF) may serve as therapeutic targets. Several...

Bibliographic Details
Main Authors: Aromolaran, Olufemi; Beder, Thomas; Adedeji, Eunice; Ajamma, Yvonne; Oyelade, Jelili; Adebiyi, Ezekiel; Koenig, Rainer
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2021
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8385402/
https://www.ncbi.nlm.nih.gov/pubmed/34471501
http://dx.doi.org/10.1016/j.csbj.2021.08.010
_version_ 1783742085864620032
author Aromolaran, Olufemi
Beder, Thomas
Adedeji, Eunice
Ajamma, Yvonne
Oyelade, Jelili
Adebiyi, Ezekiel
Koenig, Rainer
collection PubMed
description Pathogens causing infections, and particularly when invading the host cells, require the host cell machinery for efficient regeneration and proliferation during infection. For their life cycle, host proteins are needed and these Host Dependency Factors (HDF) may serve as therapeutic targets. Several attempts have approached screening for HDF producing large lists of potential HDF with, however, only marginal overlap. To get consistency into the data of these experimental studies, we developed a machine learning pipeline. As a case study, we used publicly available lists of experimentally derived HDF from twelve different screening studies based on gene perturbation in Drosophila melanogaster cells or in vivo upon bacterial or protozoan infection. A total of 50,334 gene features were generated from diverse categories including their functional annotations, topology attributes in protein interaction networks, nucleotide and protein sequence features, homology properties and subcellular localization. Cross-validation revealed an excellent prediction performance. All feature categories contributed to the model. Predicted and experimentally derived HDF showed a good consistency when investigating their common cellular processes and function. Cellular processes and molecular function of these genes were highly enriched in membrane trafficking, particularly in the trans-Golgi network, cell cycle and the Rab GTPase binding family. Using our machine learning approach, we show that HDF in organisms can be predicted with high accuracy evidencing their common investigated characteristics. We elucidated cellular processes which are utilized by invading pathogens during infection. Finally, we provide a list of 208 novel HDF proposed for future experimental studies.
format Online
Article
Text
id pubmed-8385402
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-8385402 2021-08-31 Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2021-08-09 Text en © 2021 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Predicting host dependency factors of pathogens in Drosophila melanogaster using machine learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8385402/
https://www.ncbi.nlm.nih.gov/pubmed/34471501
http://dx.doi.org/10.1016/j.csbj.2021.08.010