
Identifying gene clusters by discovering common intervals in indeterminate strings

Bibliographic Details
Main authors:	Doerr, Daniel; Stoye, Jens; Böcker, Sebastian; Jahn, Katharina
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4274641/ https://www.ncbi.nlm.nih.gov/pubmed/25571793 http://dx.doi.org/10.1186/1471-2164-15-S6-S2

collection	PubMed
description	BACKGROUND: Comparative analyses of chromosomal gene orders are successfully used to predict gene clusters in bacterial and fungal genomes. Present models for detecting sets of co-localized genes in chromosomal sequences require prior knowledge of gene family assignments of genes in the dataset of interest. These families are often computationally predicted on the basis of sequence similarity or higher-order features of gene products. Errors introduced in this process amplify in subsequent gene order analyses and thus may deteriorate gene cluster prediction. RESULTS: In this work, we present a new dynamic model and efficient computational approaches for gene cluster prediction suitable in scenarios ranging from traditional gene family-based gene cluster prediction, via multiple conflicting gene family annotations, to gene family-free analysis, in which gene clusters are predicted solely on the basis of a pairwise similarity measure of the genes of different genomes. We evaluate our gene family-free model against a gene family-based model on a dataset of 93 bacterial genomes. CONCLUSIONS: Our model is able to detect gene clusters that would also be detected with well-established gene family-based approaches. Moreover, we show that it is able to detect conserved regions which are missed by gene family-based methods due to wrong or deficient gene family assignments.
id	pubmed-4274641
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
Journal:	BMC Genomics
Publication date:	2014-10-17
Copyright:	Copyright © 2014 Doerr et al.; licensee BioMed Central Ltd.
License:	This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

