
Towards the identification of essential genes using targeted genome sequencing and comparative analysis

Bibliographic Details
Main authors:	Gustafson, Adam M; Snitkin, Evan S; Parker, Stephen CJ; DeLisi, Charles; Kasif, Simon
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1624830/ https://www.ncbi.nlm.nih.gov/pubmed/17052348 http://dx.doi.org/10.1186/1471-2164-7-265

author	Gustafson, Adam M; Snitkin, Evan S; Parker, Stephen CJ; DeLisi, Charles; Kasif, Simon
author_sort	Gustafson, Adam M
collection	PubMed
description	BACKGROUND: The identification of genes essential for survival is of theoretical importance in the understanding of the minimal requirements for cellular life, and of practical importance in the identification of potential drug targets in novel pathogens. With the great time and expense required for experimental studies aimed at constructing a catalog of essential genes in a given organism, a computational approach which could identify essential genes with high accuracy would be of great value.
RESULTS: We gathered numerous features which could be generated automatically from genome sequence data and assessed their relationship to essentiality, and subsequently utilized machine learning to construct an integrated classifier of essential genes in both S. cerevisiae and E. coli. When looking at single features, phyletic retention, a measure of the number of organisms an ortholog is present in, was the most predictive of essentiality. Furthermore, during construction of our phyletic retention feature we for the first time explored the evolutionary relationship among the set of organisms in which the presence of a gene is most predictive of essentiality. We found that in both E. coli and S. cerevisiae the optimal sets always contain host-associated organisms with small genomes which are closely related to the reference. Using five optimally selected organisms, we were able to improve predictive accuracy as compared to using all available sequenced organisms. We hypothesize the predictive power of these genomes is a consequence of the process of reductive evolution, by which many parasites and symbionts evolved their gene content. In addition, essentiality is measured in rich media, a condition which resembles the environments of these organisms in their hosts where many nutrients are provided. Finally, we demonstrate that integration of our most highly predictive features using a probabilistic classifier resulted in accuracies surpassing any individual feature.
CONCLUSION: Using features obtainable directly from sequence data, we were able to construct a classifier which can predict essential genes with high accuracy. Furthermore, our analysis of the set of genomes in which the presence of a gene is most predictive of essentiality may suggest ways in which targeted sequencing can be used in the identification of essential genes. In summary, the methods presented here can aid in the reduction of time and money invested in essential gene identification by targeting those genes for experimentation which are predicted as being essential with a high probability.
format	Text
id	pubmed-1624830
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1624830 2006-10-26 Towards the identification of essential genes using targeted genome sequencing and comparative analysis. Gustafson, Adam M; Snitkin, Evan S; Parker, Stephen CJ; DeLisi, Charles; Kasif, Simon. BMC Genomics. Research Article. BioMed Central 2006-10-19 /pmc/articles/PMC1624830/ /pubmed/17052348 http://dx.doi.org/10.1186/1471-2164-7-265 Text en Copyright © 2006 Gustafson et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Towards the identification of essential genes using targeted genome sequencing and comparative analysis
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1624830/ https://www.ncbi.nlm.nih.gov/pubmed/17052348 http://dx.doi.org/10.1186/1471-2164-7-265
