Identifying mouse developmental essential genes using machine learning

The genes that are required for organismal survival are annotated as ‘essential genes’. Identifying all the essential genes of an animal species can reveal critical functions that are needed during the development of the organism. To inform studies on mouse development, we developed a supervised mac...

Bibliographic Details
Main authors: Tian, David, Wenlock, Stephanie, Kabir, Mitra, Tzotzos, George, Doig, Andrew J., Hentges, Kathryn E.
Format: Online Article Text
Language: English
Published: The Company of Biologists Ltd 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6307915/
https://www.ncbi.nlm.nih.gov/pubmed/30563825
http://dx.doi.org/10.1242/dmm.034546
author Tian, David
Wenlock, Stephanie
Kabir, Mitra
Tzotzos, George
Doig, Andrew J.
Hentges, Kathryn E.
author_facet Tian, David
Wenlock, Stephanie
Kabir, Mitra
Tzotzos, George
Doig, Andrew J.
Hentges, Kathryn E.
author_sort Tian, David
collection PubMed
description The genes that are required for organismal survival are annotated as ‘essential genes’. Identifying all the essential genes of an animal species can reveal critical functions that are needed during the development of the organism. To inform studies on mouse development, we developed a supervised machine learning classifier based on phenotype data from mouse knockout experiments. We used this classifier to predict the essentiality of mouse genes lacking experimental data. Validation of our predictions against a blind test set of recent mouse knockout experimental data indicated a high level of accuracy (>80%). We also validated our predictions for other mouse mutagenesis methodologies, demonstrating that the predictions are accurate for lethal phenotypes isolated in random chemical mutagenesis screens and embryonic stem cell screens. The biological functions that are enriched in essential and non-essential genes have been identified, showing that essential genes tend to encode intracellular proteins that interact with nucleic acids. The genome distribution of predicted essential and non-essential genes was analysed, demonstrating that the density of essential genes varies throughout the genome. A comparison with human essential and non-essential genes was performed, revealing conservation between human and mouse gene essentiality status. Our genome-wide predictions of mouse essential genes will be of value for the planning of mouse knockout experiments and phenotyping assays, for understanding the functional processes required during mouse development, and for the prioritisation of disease candidate genes identified in human genome and exome sequence datasets.
format Online
Article
Text
id pubmed-6307915
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher The Company of Biologists Ltd
record_format MEDLINE/PubMed
spelling pubmed-6307915 2018-12-28 Identifying mouse developmental essential genes using machine learning Tian, David Wenlock, Stephanie Kabir, Mitra Tzotzos, George Doig, Andrew J. Hentges, Kathryn E. Dis Model Mech Resource Article The genes that are required for organismal survival are annotated as ‘essential genes’. Identifying all the essential genes of an animal species can reveal critical functions that are needed during the development of the organism. To inform studies on mouse development, we developed a supervised machine learning classifier based on phenotype data from mouse knockout experiments. We used this classifier to predict the essentiality of mouse genes lacking experimental data. Validation of our predictions against a blind test set of recent mouse knockout experimental data indicated a high level of accuracy (>80%). We also validated our predictions for other mouse mutagenesis methodologies, demonstrating that the predictions are accurate for lethal phenotypes isolated in random chemical mutagenesis screens and embryonic stem cell screens. The biological functions that are enriched in essential and non-essential genes have been identified, showing that essential genes tend to encode intracellular proteins that interact with nucleic acids. The genome distribution of predicted essential and non-essential genes was analysed, demonstrating that the density of essential genes varies throughout the genome. A comparison with human essential and non-essential genes was performed, revealing conservation between human and mouse gene essentiality status. Our genome-wide predictions of mouse essential genes will be of value for the planning of mouse knockout experiments and phenotyping assays, for understanding the functional processes required during mouse development, and for the prioritisation of disease candidate genes identified in human genome and exome sequence datasets.
The Company of Biologists Ltd 2018-12-01 2018-12-13 /pmc/articles/PMC6307915/ /pubmed/30563825 http://dx.doi.org/10.1242/dmm.034546 Text en © 2018. Published by The Company of Biologists Ltd http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.
spellingShingle Resource Article
Tian, David
Wenlock, Stephanie
Kabir, Mitra
Tzotzos, George
Doig, Andrew J.
Hentges, Kathryn E.
Identifying mouse developmental essential genes using machine learning
title Identifying mouse developmental essential genes using machine learning
topic Resource Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6307915/
https://www.ncbi.nlm.nih.gov/pubmed/30563825
http://dx.doi.org/10.1242/dmm.034546