Identifying essential genes across eukaryotes by machine learning
Identifying essential genes on a genome scale is resource intensive and has been performed for only a few eukaryotes. For less studied organisms essentiality might be predicted by gene homology. However, this approach cannot be applied to non-conserved genes. Additionally, divergent essentiality inf...
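Main Authors: | Beder, Thomas; Aromolaran, Olufemi; Dönitz, Jürgen; Tapanelli, Sofia; Adedeji, Eunice O; Adebiyi, Ezekiel; Bucher, Gregor; Koenig, Rainer |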
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8634067/ https://www.ncbi.nlm.nih.gov/pubmed/34859210 http://dx.doi.org/10.1093/nargab/lqab110 |
_version_ | 1784608060879142912 |
---|---|
author | Beder, Thomas; Aromolaran, Olufemi; Dönitz, Jürgen; Tapanelli, Sofia; Adedeji, Eunice O; Adebiyi, Ezekiel; Bucher, Gregor; Koenig, Rainer |
author_facet | Beder, Thomas; Aromolaran, Olufemi; Dönitz, Jürgen; Tapanelli, Sofia; Adedeji, Eunice O; Adebiyi, Ezekiel; Bucher, Gregor; Koenig, Rainer |
author_sort | Beder, Thomas |
collection | PubMed |
description | Identifying essential genes on a genome scale is resource intensive and has been performed for only a few eukaryotes. For less studied organisms essentiality might be predicted by gene homology. However, this approach cannot be applied to non-conserved genes. Additionally, divergent essentiality information is obtained from studying single cells or whole, multi-cellular organisms, and particularly when derived from human cell line screens and human population studies. We employed machine learning across six model eukaryotes and 60 381 genes, using 41 635 features derived from the sequence, gene function information and network topology. Within a leave-one-organism-out cross-validation, the classifiers showed high generalizability with an average accuracy close to 80% in the left-out species. As a case study, we applied the method to Tribolium castaneum and Bombyx mori and validated predictions experimentally yielding similar performances. Finally, using the classifier based on the studied model organisms enabled linking the essentiality information of human cell line screens and population studies. |
format | Online Article Text |
id | pubmed-8634067 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8634067 2021-12-01 Identifying essential genes across eukaryotes by machine learning Beder, Thomas Aromolaran, Olufemi Dönitz, Jürgen Tapanelli, Sofia Adedeji, Eunice O Adebiyi, Ezekiel Bucher, Gregor Koenig, Rainer NAR Genom Bioinform Standard Article Identifying essential genes on a genome scale is resource intensive and has been performed for only a few eukaryotes. For less studied organisms essentiality might be predicted by gene homology. However, this approach cannot be applied to non-conserved genes. Additionally, divergent essentiality information is obtained from studying single cells or whole, multi-cellular organisms, and particularly when derived from human cell line screens and human population studies. We employed machine learning across six model eukaryotes and 60 381 genes, using 41 635 features derived from the sequence, gene function information and network topology. Within a leave-one-organism-out cross-validation, the classifiers showed high generalizability with an average accuracy close to 80% in the left-out species. As a case study, we applied the method to Tribolium castaneum and Bombyx mori and validated predictions experimentally yielding similar performances. Finally, using the classifier based on the studied model organisms enabled linking the essentiality information of human cell line screens and population studies. Oxford University Press 2021-11-30 /pmc/articles/PMC8634067/ /pubmed/34859210 http://dx.doi.org/10.1093/nargab/lqab110 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Standard Article Beder, Thomas Aromolaran, Olufemi Dönitz, Jürgen Tapanelli, Sofia Adedeji, Eunice O Adebiyi, Ezekiel Bucher, Gregor Koenig, Rainer Identifying essential genes across eukaryotes by machine learning |
title | Identifying essential genes across eukaryotes by machine learning |
title_full | Identifying essential genes across eukaryotes by machine learning |
title_fullStr | Identifying essential genes across eukaryotes by machine learning |
title_full_unstemmed | Identifying essential genes across eukaryotes by machine learning |
title_short | Identifying essential genes across eukaryotes by machine learning |
title_sort | identifying essential genes across eukaryotes by machine learning |
topic | Standard Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8634067/ https://www.ncbi.nlm.nih.gov/pubmed/34859210 http://dx.doi.org/10.1093/nargab/lqab110 |
work_keys_str_mv | AT bederthomas identifyingessentialgenesacrosseukaryotesbymachinelearning AT aromolaranolufemi identifyingessentialgenesacrosseukaryotesbymachinelearning AT donitzjurgen identifyingessentialgenesacrosseukaryotesbymachinelearning AT tapanellisofia identifyingessentialgenesacrosseukaryotesbymachinelearning AT adedejieuniceo identifyingessentialgenesacrosseukaryotesbymachinelearning AT adebiyiezekiel identifyingessentialgenesacrosseukaryotesbymachinelearning AT buchergregor identifyingessentialgenesacrosseukaryotesbymachinelearning AT koenigrainer identifyingessentialgenesacrosseukaryotesbymachinelearning |
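
The description field in this record mentions a leave-one-organism-out cross-validation. As a rough illustration of that evaluation scheme only, the sketch below holds out all genes of one organism per fold, trains on the remaining organisms, and scores the held-out one. Everything in it is an assumption made for illustration and is not taken from the article: the function names (`leave_one_organism_out`, `fit_threshold`, `evaluate`), the toy one-feature threshold classifier, the organism labels and the data values are all hypothetical.

```python
from collections import defaultdict


def leave_one_organism_out(organisms):
    """Yield (held_out, train_idx, test_idx), holding out one organism per fold."""
    by_org = defaultdict(list)
    for i, org in enumerate(organisms):
        by_org[org].append(i)
    for held_out in sorted(by_org):
        test_idx = by_org[held_out]
        train_idx = sorted(
            i for org, idx in by_org.items() if org != held_out for i in idx
        )
        yield held_out, train_idx, test_idx


def fit_threshold(xs, ys):
    """Toy stand-in classifier: midpoint between the two class means.

    Assumes both classes (0 and 1) occur in the training data.
    """
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def evaluate(features, labels, organisms):
    """Return the accuracy on each held-out organism."""
    accuracies = {}
    for held_out, train_idx, test_idx in leave_one_organism_out(organisms):
        threshold = fit_threshold(
            [features[i] for i in train_idx], [labels[i] for i in train_idx]
        )
        correct = sum(
            (features[i] > threshold) == bool(labels[i]) for i in test_idx
        )
        accuracies[held_out] = correct / len(test_idx)
    return accuracies


# Made-up toy data: one feature per gene, label 1 = essential, 0 = non-essential.
features = [0.9, 0.2, 0.8, 0.1, 0.7, 0.3]
labels = [1, 0, 1, 0, 1, 0]
organisms = ["org_a", "org_a", "org_b", "org_b", "org_c", "org_c"]

accuracies = evaluate(features, labels, organisms)
mean_accuracy = sum(accuracies.values()) / len(accuracies)
print(accuracies, mean_accuracy)
```

The point of grouping by organism rather than splitting genes at random is that no gene from the test species is seen during training, so the per-fold accuracy estimates how well a classifier might transfer to an unseen species.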