WordSeeker: concurrent bioinformatics software for discovering genome-wide patterns and word-based genomic signatures

BACKGROUND: An important focus of genomic science is the discovery and characterization of all functional elements within genomes. In silico methods are used in genome studies to discover putative regulatory genomic elements (called words or motifs). Although a number of methods have been developed...

Bibliographic Details
Main authors: Lichtenberg, Jens; Kurz, Kyle; Liang, Xiaoyu; Al-ouran, Rami; Neiman, Lev; Nau, Lee J; Welch, Joshua D; Jacox, Edwin; Bitterman, Thomas; Ecker, Klaus; Elnitski, Laura; Drews, Frank; Lee, Stephen Sauchi; Welch, Lonnie R
Format: Text
Language: English
Published: BioMed Central, 2010
Subjects: Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3040532/
https://www.ncbi.nlm.nih.gov/pubmed/21210985
http://dx.doi.org/10.1186/1471-2105-11-S12-S6
author Lichtenberg, Jens
Kurz, Kyle
Liang, Xiaoyu
Al-ouran, Rami
Neiman, Lev
Nau, Lee J
Welch, Joshua D
Jacox, Edwin
Bitterman, Thomas
Ecker, Klaus
Elnitski, Laura
Drews, Frank
Lee, Stephen Sauchi
Welch, Lonnie R
author_sort Lichtenberg, Jens
collection PubMed
description BACKGROUND: An important focus of genomic science is the discovery and characterization of all functional elements within genomes. In silico methods are used in genome studies to discover putative regulatory genomic elements (called words or motifs). Although a number of methods have been developed for motif discovery, most of them lack the scalability needed to analyze large genomic data sets. METHODS: This manuscript presents WordSeeker, an enumerative motif discovery toolkit that utilizes multi-core and distributed computational platforms to enable scalable analysis of genomic data. A controller task coordinates activities of worker nodes, each of which (1) enumerates a subset of the DNA word space and (2) scores words with a distributed Markov chain model. RESULTS: A comprehensive suite of performance tests was conducted to demonstrate the performance, speedup and efficiency of WordSeeker. The scalability of the toolkit enabled the analysis of the entire genome of Arabidopsis thaliana; the results of the analysis were integrated into The Arabidopsis Gene Regulatory Information Server (AGRIS). A public version of WordSeeker was deployed on the Glenn cluster at the Ohio Supercomputer Center. CONCLUSION: WordSeeker effectively utilizes concurrent computing platforms to enable the identification of putative functional elements in genomic data sets. This capability facilitates the analysis of the large quantity of sequenced genomic data.
format Text
id pubmed-3040532
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3040532 2011-02-18 BMC Bioinformatics Proceedings
BioMed Central 2010-12-21 /pmc/articles/PMC3040532/ /pubmed/21210985 http://dx.doi.org/10.1186/1471-2105-11-S12-S6 Text en Copyright ©2010 Lichtenberg et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title WordSeeker: concurrent bioinformatics software for discovering genome-wide patterns and word-based genomic signatures
topic Proceedings