
An Evaluation of Machine Learning Approaches for the Prediction of Essential Genes in Eukaryotes Using Protein Sequence-Derived Features

The availability of whole-genome sequences and associated multi-omics data sets, combined with advances in gene knockout and knockdown methods, has enabled large-scale annotation and exploration of gene and protein functions in eukaryotes. Knowing which genes are essential for the survival of eukary...


Bibliographic Details
Main authors: Campos, Tulio L., Korhonen, Pasi K., Gasser, Robin B., Young, Neil D.
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6607062/
https://www.ncbi.nlm.nih.gov/pubmed/31312416
http://dx.doi.org/10.1016/j.csbj.2019.05.008
author Campos, Tulio L.
Korhonen, Pasi K.
Gasser, Robin B.
Young, Neil D.
collection PubMed
description The availability of whole-genome sequences and associated multi-omics data sets, combined with advances in gene knockout and knockdown methods, has enabled large-scale annotation and exploration of gene and protein functions in eukaryotes. Knowing which genes are essential for the survival of eukaryotic organisms is paramount for an understanding of the basic mechanisms of life, and could assist in identifying intervention targets in eukaryotic pathogens and cancer. Here, we studied essential gene orthologs among selected species of eukaryotes, and then employed a systematic machine-learning approach, using protein sequence-derived features and selection procedures, to investigate essential gene predictions within and among species. We showed that essential gene orthologs comprise small fractions of the total number of orthologs among the eukaryotic species studied. In addition, we demonstrated that machine-learning models trained with subsets of essentiality-related data performed better than random guessing of gene essentiality for a particular species. Consistent with our gene ortholog analysis, the prediction of essential genes among multiple (including distantly related) species is possible, yet challenging, suggesting that most essential genes are unique to a species. The present work provides a foundation for the expansion of genome-wide essentiality investigations in eukaryotes using machine-learning approaches.
format Online
Article
Text
id pubmed-6607062
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-6607062 2019-07-16 Comput Struct Biotechnol J Research Article
Research Network of Computational and Structural Biotechnology 2019-06-08 /pmc/articles/PMC6607062/ /pubmed/31312416 http://dx.doi.org/10.1016/j.csbj.2019.05.008 Text en © 2019 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title An Evaluation of Machine Learning Approaches for the Prediction of Essential Genes in Eukaryotes Using Protein Sequence-Derived Features
topic Research Article