G-NEST: a gene neighborhood scoring tool to identify co-conserved, co-expressed genes

BACKGROUND: In previous studies, gene neighborhoods—spatial clusters of co-expressed genes in the genome—have been defined using arbitrary rules such as requiring adjacency, a minimum number of genes, a fixed window size, or a minimum expression level. In the current study, we developed a Gene Neigh...

Bibliographic Details
Main authors: Lemay, Danielle G; Martin, William F; Hinrichs, Angie S; Rijnkels, Monique; German, J Bruce; Korf, Ian; Pollard, Katherine S
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3575404/
https://www.ncbi.nlm.nih.gov/pubmed/23020263
http://dx.doi.org/10.1186/1471-2105-13-253
author Lemay, Danielle G
Martin, William F
Hinrichs, Angie S
Rijnkels, Monique
German, J Bruce
Korf, Ian
Pollard, Katherine S
collection PubMed
description BACKGROUND: In previous studies, gene neighborhoods—spatial clusters of co-expressed genes in the genome—have been defined using arbitrary rules such as requiring adjacency, a minimum number of genes, a fixed window size, or a minimum expression level. In the current study, we developed a Gene Neighborhood Scoring Tool (G-NEST) which combines genomic location, gene expression, and evolutionary sequence conservation data to score putative gene neighborhoods across all possible window sizes simultaneously. RESULTS: Using G-NEST on atlases of mouse and human tissue expression data, we found that large neighborhoods of ten or more genes are extremely rare in mammalian genomes. When they do occur, neighborhoods are typically composed of families of related genes. Both the highest scoring and the largest neighborhoods in mammalian genomes are formed by tandem gene duplication. Mammalian gene neighborhoods contain highly and variably expressed genes. Co-localized noisy gene pairs exhibit lower evolutionary conservation of their adjacent genome locations, suggesting that their shared transcriptional background may be disadvantageous. Genes that are essential to mammalian survival and reproduction are less likely to occur in neighborhoods, although neighborhoods are enriched with genes that function in mitosis. We also found that gene orientation and protein-protein interactions are partially responsible for maintenance of gene neighborhoods. CONCLUSIONS: Our experiments using G-NEST confirm that tandem gene duplication is the primary driver of non-random gene order in mammalian genomes. Non-essentiality, co-functionality, gene orientation, and protein-protein interactions are additional forces that maintain gene neighborhoods, especially those formed by tandem duplicates. We expect G-NEST to be useful for other applications such as the identification of core regulatory modules, common transcriptional backgrounds, and chromatin domains. 
The software is available at http://docpollard.org/software.html
format Online
Article
Text
id pubmed-3575404
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3575404 2013-02-19 G-NEST: a gene neighborhood scoring tool to identify co-conserved, co-expressed genes Lemay, Danielle G Martin, William F Hinrichs, Angie S Rijnkels, Monique German, J Bruce Korf, Ian Pollard, Katherine S BMC Bioinformatics Research Article BACKGROUND: In previous studies, gene neighborhoods (spatial clusters of co-expressed genes in the genome) have been defined using arbitrary rules such as requiring adjacency, a minimum number of genes, a fixed window size, or a minimum expression level. In the current study, we developed a Gene Neighborhood Scoring Tool (G-NEST) which combines genomic location, gene expression, and evolutionary sequence conservation data to score putative gene neighborhoods across all possible window sizes simultaneously. RESULTS: Using G-NEST on atlases of mouse and human tissue expression data, we found that large neighborhoods of ten or more genes are extremely rare in mammalian genomes. When they do occur, neighborhoods are typically composed of families of related genes. Both the highest scoring and the largest neighborhoods in mammalian genomes are formed by tandem gene duplication. Mammalian gene neighborhoods contain highly and variably expressed genes. Co-localized noisy gene pairs exhibit lower evolutionary conservation of their adjacent genome locations, suggesting that their shared transcriptional background may be disadvantageous. Genes that are essential to mammalian survival and reproduction are less likely to occur in neighborhoods, although neighborhoods are enriched with genes that function in mitosis. We also found that gene orientation and protein-protein interactions are partially responsible for maintenance of gene neighborhoods. CONCLUSIONS: Our experiments using G-NEST confirm that tandem gene duplication is the primary driver of non-random gene order in mammalian genomes. Non-essentiality, co-functionality, gene orientation, and protein-protein interactions are additional forces that maintain gene neighborhoods, especially those formed by tandem duplicates. We expect G-NEST to be useful for other applications such as the identification of core regulatory modules, common transcriptional backgrounds, and chromatin domains. The software is available at http://docpollard.org/software.html BioMed Central 2012-09-28 /pmc/articles/PMC3575404/ /pubmed/23020263 http://dx.doi.org/10.1186/1471-2105-13-253 Text en Copyright ©2012 Lemay et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title G-NEST: a gene neighborhood scoring tool to identify co-conserved, co-expressed genes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3575404/
https://www.ncbi.nlm.nih.gov/pubmed/23020263
http://dx.doi.org/10.1186/1471-2105-13-253