Identifying gene clusters by discovering common intervals in indeterminate strings

BACKGROUND: Comparative analyses of chromosomal gene orders are successfully used to predict gene clusters in bacterial and fungal genomes. Present models for detecting sets of co-localized genes in chromosomal sequences require prior knowledge of gene family assignments of genes in the dataset of i...

Bibliographic Details
Main authors: Doerr, Daniel; Stoye, Jens; Böcker, Sebastian; Jahn, Katharina
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4274641/
https://www.ncbi.nlm.nih.gov/pubmed/25571793
http://dx.doi.org/10.1186/1471-2164-15-S6-S2
author Doerr, Daniel
Stoye, Jens
Böcker, Sebastian
Jahn, Katharina
collection PubMed
description BACKGROUND: Comparative analyses of chromosomal gene orders are successfully used to predict gene clusters in bacterial and fungal genomes. Present models for detecting sets of co-localized genes in chromosomal sequences require prior knowledge of the gene family assignments of the genes in the dataset of interest. These families are often computationally predicted on the basis of sequence similarity or higher-order features of gene products. Errors introduced in this process are amplified in subsequent gene order analyses and may thus degrade gene cluster prediction.
RESULTS: In this work, we present a new dynamic model and efficient computational approaches for gene cluster prediction. They are suitable for scenarios ranging from traditional gene family-based gene cluster prediction, via multiple conflicting gene family annotations, to gene family-free analysis, in which gene clusters are predicted solely on the basis of a pairwise similarity measure between the genes of different genomes. We evaluate our gene family-free model against a gene family-based model on a dataset of 93 bacterial genomes.
CONCLUSIONS: Our model is able to detect gene clusters that would also be detected with well-established gene family-based approaches. Moreover, we show that it is able to detect conserved regions that are missed by gene family-based methods due to wrong or deficient gene family assignments.
format Online
Article
Text
id pubmed-4274641
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4274641 2015-01-02 Identifying gene clusters by discovering common intervals in indeterminate strings. Doerr, Daniel; Stoye, Jens; Böcker, Sebastian; Jahn, Katharina. BMC Genomics. Research. BioMed Central 2014-10-17. /pmc/articles/PMC4274641/ /pubmed/25571793 http://dx.doi.org/10.1186/1471-2164-15-S6-S2 Text en
Copyright © 2014 Doerr et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Identifying gene clusters by discovering common intervals in indeterminate strings
topic Research