Statistics for approximate gene clusters

BACKGROUND: Genes occurring co-localized in multiple genomes can be strong indicators for either functional constraints on the genome organization or remnant ancestral gene order. The computational detection of these patterns, which are usually referred to as gene clusters, has become increasingly s...

Bibliographic Details
Main authors: Jahn, Katharina; Winter, Sascha; Stoye, Jens; Böcker, Sebastian
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3908651/
https://www.ncbi.nlm.nih.gov/pubmed/24564620
http://dx.doi.org/10.1186/1471-2105-14-S15-S14
_version_ 1782301730391719936
author Jahn, Katharina
Winter, Sascha
Stoye, Jens
Böcker, Sebastian
author_sort Jahn, Katharina
collection PubMed
description BACKGROUND: Genes that are co-localized in multiple genomes can be strong indicators for either functional constraints on the genome organization or remnant ancestral gene order. The computational detection of these patterns, which are usually referred to as gene clusters, has become increasingly sensitive over the past decade. The most powerful approaches allow for various types of imperfect cluster conservation: cluster locations may be internally rearranged; individual cluster locations may contain only a subset of the cluster genes and may be disrupted by uninvolved genes; and cluster locations may not occur at all in some or even most of the studied genomes. The detection of such low-quality clusters increases the risk of mistaking faint patterns that occur merely by chance for genuine findings. Therefore, it is crucial to estimate the significance of computational gene cluster predictions and to discriminate between true conservation and coincidental clustering. RESULTS: In this paper, we present an efficient and accurate approach to estimate the significance of gene cluster predictions under the approximate common intervals model. Given a single gene cluster prediction, we calculate the probability of observing it with the same or a higher degree of conservation under the null hypothesis of random gene order, and add a correction factor to account for multiple testing. Our approach considers all parameters that define the quality of gene cluster conservation: the number of genomes in which the cluster occurs, the number of involved genes, the degree of conservation in the different genomes, and the frequency of the clustered genes within each genome. We apply our approach to evaluate gene cluster predictions in a large set of well-annotated genomes.
format Online
Article
Text
id pubmed-3908651
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3908651 2014-02-13 Statistics for approximate gene clusters Jahn, Katharina Winter, Sascha Stoye, Jens Böcker, Sebastian BMC Bioinformatics Proceedings
BioMed Central 2013-12-13 /pmc/articles/PMC3908651/ /pubmed/24564620 http://dx.doi.org/10.1186/1471-2105-14-S15-S14 Text en Copyright © 2013 Jahn et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Statistics for approximate gene clusters
topic Proceedings