
A gene pattern mining algorithm using interchangeable gene sets for prokaryotes

BACKGROUND: Mining gene patterns that are common to multiple genomes is an important biological problem, which can lead us to novel biological insights. When family classification of genes is available, this problem is similar to the pattern mining problem in the data mining community. However, when...


Bibliographic Details
Main Authors: Hu, Meng, Choi, Kwangmin, Su, Wei, Kim, Sun, Yang, Jiong
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2279103/
https://www.ncbi.nlm.nih.gov/pubmed/18302784
http://dx.doi.org/10.1186/1471-2105-9-124
_version_ 1782152053030649856
author Hu, Meng
Choi, Kwangmin
Su, Wei
Kim, Sun
Yang, Jiong
collection PubMed
description BACKGROUND: Mining gene patterns that are common to multiple genomes is an important biological problem that can lead to novel biological insights. When family classification of genes is available, this problem is similar to the pattern mining problem in the data mining community. However, when family classification information is not available, mining gene patterns is a challenging problem. There are several well-developed algorithms for predicting gene patterns in a pair of genomes, such as FISH and DAGchainer. These algorithms formulate the task as an optimization problem, which is solved using dynamic programming. Unfortunately, extending these algorithms to multiple genomes is not trivial because of the rapid increase in time and space complexity. RESULTS: In this paper, we propose a novel algorithm for mining gene patterns in more than two prokaryote genomes using interchangeable sets. The basic idea is to extend the pattern mining technique from the data mining community, using interchangeable sets to handle the situation where family classification information is not available. In an experiment with four newly sequenced genomes (where the gene annotation is unavailable), we show that the gene patterns can capture important biological information. To examine the effectiveness of gene patterns further, we propose an ortholog prediction method based on our gene pattern mining algorithm and compare our method to the bi-directional best hit (BBH) technique in terms of COG orthologous gene classification information. The experiments show that our algorithm achieves a 3% increase in recall compared to BBH without sacrificing the precision of ortholog detection. CONCLUSION: The discovered gene patterns can be used to detect orthologs and genes that collaborate for a common biological function.
format Text
id pubmed-2279103
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2279103 2008-04-03 A gene pattern mining algorithm using interchangeable gene sets for prokaryotes Hu, Meng Choi, Kwangmin Su, Wei Kim, Sun Yang, Jiong BMC Bioinformatics Research Article BioMed Central 2008-02-26 /pmc/articles/PMC2279103/ /pubmed/18302784 http://dx.doi.org/10.1186/1471-2105-9-124 Text en Copyright © 2008 Hu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A gene pattern mining algorithm using interchangeable gene sets for prokaryotes
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2279103/
https://www.ncbi.nlm.nih.gov/pubmed/18302784
http://dx.doi.org/10.1186/1471-2105-9-124
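The abstract in this record compares the proposed method against the bi-directional best hit (BBH) technique in terms of precision and recall. As a rough illustration of that baseline only, the sketch below computes BBH pairs between two genomes and scores them against a reference set. It is not the authors' pattern mining algorithm. The gene names, similarity scores, and reference pairs are invented toy data, and the function names (`best_hits`, `bidirectional_best_hits`, `precision_recall`) are hypothetical.

```python
from typing import Dict, Set, Tuple

# Assumption: scores[(gene_in_A, gene_in_B)] is a pairwise similarity score
# (for example an alignment bit score). The values below are toy data.
Scores = Dict[Tuple[str, str], float]
Pairs = Set[Tuple[str, str]]


def best_hits(scores: Scores, forward: bool) -> Dict[str, str]:
    """Map each query gene to its highest-scoring partner in the other genome.

    forward=True uses genome A genes as queries; forward=False uses genome B.
    Ties are broken by keeping the first pair encountered.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (a, b), score in scores.items():
        query, target = (a, b) if forward else (b, a)
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(scores: Scores) -> Pairs:
    """Return pairs (a, b) where b is a's best hit and a is b's best hit."""
    a_to_b = best_hits(scores, forward=True)
    b_to_a = best_hits(scores, forward=False)
    return {(a, b) for a, b in a_to_b.items() if b_to_a.get(b) == a}


def precision_recall(predicted: Pairs, reference: Pairs) -> Tuple[float, float]:
    """Precision and recall of predicted ortholog pairs against a reference."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Toy example (hypothetical genes and scores).
scores: Scores = {
    ("a1", "b1"): 90.0,
    ("a1", "b2"): 40.0,
    ("a2", "b1"): 60.0,
    ("a2", "b2"): 55.0,
    ("a3", "b2"): 70.0,
}
# Hypothetical reference classification, standing in for COG-style labels.
reference: Pairs = {("a1", "b1"), ("a2", "b2"), ("a3", "b2")}

bbh_pairs = bidirectional_best_hits(scores)
precision, recall = precision_recall(bbh_pairs, reference)
print(sorted(bbh_pairs), precision, recall)
```

In this toy data, gene `a2` has no reciprocal best hit, so BBH misses the reference pair (`a2`, `b2`). A missed pair of this kind lowers recall while leaving precision unchanged, and recall is the measure on which the abstract reports an improvement.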