Identifying essential genes across eukaryotes by machine learning

Bibliographic Details
Main Authors: Beder, Thomas; Aromolaran, Olufemi; Dönitz, Jürgen; Tapanelli, Sofia; Adedeji, Eunice O; Adebiyi, Ezekiel; Bucher, Gregor; Koenig, Rainer
Format: Online article, text
Language: English
Published: Oxford University Press 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8634067/
https://www.ncbi.nlm.nih.gov/pubmed/34859210
http://dx.doi.org/10.1093/nargab/lqab110

Abstract
Identifying essential genes on a genome scale is resource intensive and has been performed for only a few eukaryotes. For less studied organisms essentiality might be predicted by gene homology. However, this approach cannot be applied to non-conserved genes. Additionally, divergent essentiality information is obtained from studying single cells or whole, multi-cellular organisms, and particularly when derived from human cell line screens and human population studies. We employed machine learning across six model eukaryotes and 60 381 genes, using 41 635 features derived from the sequence, gene function information and network topology. Within a leave-one-organism-out cross-validation, the classifiers showed high generalizability with an average accuracy close to 80% in the left-out species. As a case study, we applied the method to Tribolium castaneum and Bombyx mori and validated predictions experimentally yielding similar performances. Finally, using the classifier based on the studied model organisms enabled linking the essentiality information of human cell line screens and population studies.
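
In a leave-one-organism-out cross-validation, all genes of one species are held out, a classifier is trained on the genes of the remaining species, and accuracy is measured on the held-out species; the abstract's figure of close to 80% is the average over the left-out species. The Python sketch below illustrates only that evaluation loop. It assumes toy two-dimensional feature vectors and a nearest-centroid classifier, both hypothetical stand-ins: the organism names (orgA to orgC), the helper functions and the data are invented for illustration and are not the classifiers or the 41 635 features used by the authors.

```python
from collections import Counter


def leave_one_organism_out(samples, train_fn, predict_fn):
    """Hold out each organism in turn; return {organism: accuracy on that organism}.

    samples: list of (organism, feature_vector, label) tuples.
    """
    organisms = sorted({org for org, _, _ in samples})
    accuracies = {}
    for held_out in organisms:
        # Train only on genes from the other organisms ...
        train = [(x, y) for org, x, y in samples if org != held_out]
        # ... and evaluate only on genes from the held-out organism.
        test = [(x, y) for org, x, y in samples if org == held_out]
        model = train_fn(train)
        correct = sum(predict_fn(model, x) == y for x, y in test)
        accuracies[held_out] = correct / len(test)
    return accuracies


def train_nearest_centroid(train):
    """Hypothetical stand-in classifier: mean feature vector per label."""
    sums, counts = {}, Counter()
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] += 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict_nearest_centroid(centroids, x):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])))


# Hypothetical toy data: label 1 = essential, 0 = non-essential.
samples = [
    ("orgA", [0.9, 1.0], 1), ("orgA", [0.1, 0.0], 0),
    ("orgB", [1.0, 0.8], 1), ("orgB", [0.0, 0.2], 0),
    ("orgC", [0.8, 0.9], 1), ("orgC", [0.2, 0.1], 0),
]

per_organism = leave_one_organism_out(samples, train_nearest_centroid, predict_nearest_centroid)
# The summary metric averages the per-organism accuracies.
average_accuracy = sum(per_organism.values()) / len(per_organism)
print(per_organism, average_accuracy)
```

Swapping in a different classifier only requires new `train_fn` and `predict_fn` callables. The property that matters is that no gene from the held-out organism is seen during training, so each per-species accuracy estimates how well the model transfers to a species it was not trained on. In scikit-learn, the same split can be obtained with `LeaveOneGroupOut`, passing the organism of each gene as its group.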

Journal: NAR Genom Bioinform (NAR Genomics and Bioinformatics), Standard Article, published 2021-11-30
License: © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com