Essential gene prediction in Drosophila melanogaster using machine learning approaches based on sequence and functional features

Genes are termed essential if their loss of function compromises viability or results in a profound loss of fitness. On the genome scale, these genes can be determined experimentally using RNAi or knockout screens, but this is very resource intensive. Computational methods for essential gene...

Bibliographic Details
Main authors: Aromolaran, Olufemi; Beder, Thomas; Oswald, Marcus; Oyelade, Jelili; Adebiyi, Ezekiel; Koenig, Rainer
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7096750/
https://www.ncbi.nlm.nih.gov/pubmed/32257045
http://dx.doi.org/10.1016/j.csbj.2020.02.022
author Aromolaran, Olufemi
Beder, Thomas
Oswald, Marcus
Oyelade, Jelili
Adebiyi, Ezekiel
Koenig, Rainer
collection PubMed
description Genes are termed essential if their loss of function compromises viability or results in a profound loss of fitness. On the genome scale, these genes can be determined experimentally using RNAi or knockout screens, but this is very resource intensive. Computational methods for essential gene prediction can overcome this drawback, particularly when intrinsic features (e.g. from the protein sequence) as well as extrinsic features (e.g. from transcription profiles) are considered. In this work, we employed machine learning to predict essential genes in Drosophila melanogaster. A total of 27,340 features were generated from a wide range of sources comprising nucleotide and protein sequences, gene networks, protein-protein interactions, evolutionary conservation and functional annotations. Employing cross-validation, we obtained an excellent prediction performance. In D. melanogaster, the best model achieved an ROC-AUC of 0.90, a PR-AUC of 0.30 and an F1 score of 0.34. Our approach considerably outperformed a benchmark method in which only features derived from the protein sequences were used (P < 0.001). Investigating which features contributed to this success, we found that all categories of features contributed, most prominently network topological, functional and sequence-based features. To evaluate our approach, we performed the same workflow for essential gene prediction in human and achieved ROC-AUC = 0.97, PR-AUC = 0.73, and F1 = 0.64. In summary, this study shows that our well-elaborated assembly of features, covering a broad range of intrinsic and extrinsic gene and protein features, enabled intelligent systems to predict gene essentiality in an organism well.
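The three metrics quoted in the description respond differently to class imbalance, which is one common reason an ROC-AUC near 0.9 can sit alongside a much lower PR-AUC and F1 score. The sketch below is only an illustration on made-up toy data (100 hypothetical genes, 2 of them labelled essential). The labels, scores, the 0.5 threshold and the function names are all assumptions, not the authors' data or code, and this record does not state whether imbalance explains the reported values.

```python
# Toy illustration only: labels and scores are invented, not taken from the study.

def roc_auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean precision at the rank of each positive (a common PR-AUC estimate).

    Assumes at least one positive and no tied scores.
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


def f1_score(labels, scores, threshold=0.5):
    """Harmonic mean of precision and recall at a fixed score threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# 2 hypothetical essential genes, 5 high-scoring false alarms, 93 clear negatives.
labels = [1, 1] + [0] * 5 + [0] * 93
scores = [0.80, 0.60] + [0.95, 0.90, 0.85, 0.70, 0.65] + [0.01 + 0.005 * i for i in range(93)]

print(f"ROC-AUC = {roc_auc(labels, scores):.2f}")            # ROC-AUC = 0.96
print(f"PR-AUC  = {average_precision(labels, scores):.2f}")  # PR-AUC  = 0.27
print(f"F1      = {f1_score(labels, scores):.2f}")           # F1      = 0.44
```

In this toy case only five negatives outrank a positive, so almost all positive/negative pairs are ordered correctly and ROC-AUC stays high. Those same five false alarms outnumber the two true positives, which pulls precision, and with it PR-AUC and F1, down.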
format Online
Article
Text
id pubmed-7096750
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-7096750 2020-03-31 Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2020-03-10 /pmc/articles/PMC7096750/ /pubmed/32257045 http://dx.doi.org/10.1016/j.csbj.2020.02.022 Text en © 2020 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Essential gene prediction in Drosophila melanogaster using machine learning approaches based on sequence and functional features
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7096750/
https://www.ncbi.nlm.nih.gov/pubmed/32257045
http://dx.doi.org/10.1016/j.csbj.2020.02.022