Predicting gene essentiality in Caenorhabditis elegans by feature engineering and machine-learning

Bibliographic Details
Main Authors: Campos, Tulio L., Korhonen, Pasi K., Sternberg, Paul W., Gasser, Robin B., Young, Neil D.
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7251299/
https://www.ncbi.nlm.nih.gov/pubmed/32489524
http://dx.doi.org/10.1016/j.csbj.2020.05.008
author Campos, Tulio L.
Korhonen, Pasi K.
Sternberg, Paul W.
Gasser, Robin B.
Young, Neil D.
collection PubMed
description Defining genes that are essential for life has major implications for understanding critical biological processes and mechanisms. Although essential genes have been identified and characterised experimentally using functional genomic tools, it is challenging to predict such genes with confidence from molecular and phenomic data sets using computational methods. Using extensive data sets available for the model organism Caenorhabditis elegans, we constructed a machine-learning (ML)-based workflow for the prediction of essential genes on a genome-wide scale. We identified strong predictors for such genes and showed that trained ML models consistently achieve highly accurate classifications. Complementary analyses revealed an association between essential genes and chromosomal location. Our findings reveal that essential genes in C. elegans tend to be located in or near the centre of autosomal chromosomes; are positively correlated with low single-nucleotide polymorphism (SNP) densities and epigenetic markers in promoter regions; are involved in protein and nucleotide processing; are transcribed in most cells; are enriched in reproductive tissues or are targets for small RNAs bound to the Argonaute CSR-1. Based on these results, we hypothesise an interplay between epigenetic markers and small RNA pathways in the germline, with transcription-based memory; this hypothesis warrants testing. From a technical perspective, further work is needed to evaluate whether the present ML-based approach will be applicable to other metazoans (including Drosophila melanogaster) for which comprehensive data sets (i.e. genomic, transcriptomic, proteomic, variomic, epigenetic and phenomic) are available.
format Online
Article
Text
id pubmed-7251299
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-7251299 2020-06-01 Comput Struct Biotechnol J Research Article
Research Network of Computational and Structural Biotechnology 2020-05-15 /pmc/articles/PMC7251299/ /pubmed/32489524 http://dx.doi.org/10.1016/j.csbj.2020.05.008 Text en © 2020 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Predicting gene essentiality in Caenorhabditis elegans by feature engineering and machine-learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7251299/
https://www.ncbi.nlm.nih.gov/pubmed/32489524
http://dx.doi.org/10.1016/j.csbj.2020.05.008