An efficient pseudomedian filter for tiling microrrays

Bibliographic Details
Main authors: Royce, Thomas E, Carriero, Nicholas J, Gerstein, Mark B
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1913926/
https://www.ncbi.nlm.nih.gov/pubmed/17555595
http://dx.doi.org/10.1186/1471-2105-8-186
_version_ 1782134091247779840
author Royce, Thomas E
Carriero, Nicholas J
Gerstein, Mark B
author_facet Royce, Thomas E
Carriero, Nicholas J
Gerstein, Mark B
author_sort Royce, Thomas E
collection PubMed
description BACKGROUND: Tiling microarrays are becoming an essential technology in the functional genomics toolbox. They have been applied to the tasks of novel transcript identification, elucidation of transcription factor binding sites, detection of methylated DNA and several other applications in several model organisms. These experiments are being conducted at increasingly fine resolutions as the microarray technology enjoys ever greater feature densities. The increased densities naturally lead to increased data analysis requirements. Specifically, the most widely employed algorithm for tiling array analysis involves smoothing observed signals by computing pseudomedians within sliding windows, an O(n^2 log n) calculation in each window. This poor time complexity is an issue for tiling array analysis and could prove to be a real bottleneck as tiling microarray experiments become grander in scope and finer in resolution. RESULTS: We therefore implemented Monahan's HLQEST algorithm, which reduces the runtime complexity for computing the pseudomedian of n numbers from O(n^2 log n) to O(n log n). For a representative tiling microarray dataset, this modification reduced the smoothing procedure's runtime by nearly 90%. We then leveraged the fact that elements within sliding windows remain largely unchanged in overlapping windows (as one slides across genomic space) to further reduce computation by an additional 43%. This was achieved by using skip lists to maintain a sorted list of values from window to window. This sorted list could be maintained with simple O(log n) inserts and deletes. We illustrate the favorable scaling properties of our algorithms with both time complexity analysis and benchmarking on synthetic datasets.
CONCLUSION: Tiling microarray analyses that rely upon a sliding window pseudomedian calculation can require many hours of computation. We have eased this requirement significantly by implementing efficient algorithms that scale well with genomic feature density. This result not only speeds the current standard analyses, but also makes possible ones where many iterations of the filter may be required, such as in a bootstrap or parameter estimation setting. Source code and executables are available at .
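The baseline that the abstract describes can be illustrated with a short Python sketch. It is not the authors' code and does not implement Monahan's HLQEST or a skip list. It assumes the pseudomedian is the Hodges-Lehmann estimate (the median of all pairwise averages, including each value paired with itself), assumes windows are defined by a fixed number of probes rather than by genomic distance, and uses a plain sorted Python list with `bisect` as a stand-in for the skip list. The function names `pseudomedian` and `sliding_pseudomedian` are hypothetical.

```python
from bisect import bisect_left, insort
from statistics import median


def pseudomedian(values):
    """Naive Hodges-Lehmann estimate: median of all pairwise averages (i <= j).

    Building and sorting the n(n+1)/2 averages costs O(n^2 log n), the
    per-window cost that the abstract identifies as the bottleneck.
    """
    n = len(values)
    averages = [(values[i] + values[j]) / 2
                for i in range(n) for j in range(i, n)]
    return median(averages)


def sliding_pseudomedian(signal, width):
    """Pseudomedian of every full window of `width` consecutive values.

    The window is kept sorted from one position to the next by deleting the
    value that leaves and inserting the value that enters. A sorted list is
    an assumption made for brevity: locating the position is O(log n), but
    the list shift is O(n), whereas a skip list keeps both steps O(log n).
    """
    window = sorted(signal[:width])
    smoothed = [pseudomedian(window)]
    for k in range(width, len(signal)):
        del window[bisect_left(window, signal[k - width])]  # value leaving
        insort(window, signal[k])                           # value entering
        smoothed.append(pseudomedian(window))
    return smoothed
```

In this sketch the sorted window is not actually exploited by the naive estimator; it only shows the incremental bookkeeping. In the approach summarized above, the sorted values are what allow the O(n log n) estimator to avoid re-sorting each window from scratch.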
format Text
id pubmed-1913926
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-19139262007-07-11 An efficient pseudomedian filter for tiling microrrays Royce, Thomas E Carriero, Nicholas J Gerstein, Mark B BMC Bioinformatics Methodology Article BACKGROUND: Tiling microarrays are becoming an essential technology in the functional genomics toolbox. They have been applied to the tasks of novel transcript identification, elucidation of transcription factor binding sites, detection of methylated DNA and several other applications in several model organisms. These experiments are being conducted at increasingly finer resolutions as the microarray technology enjoys increasingly greater feature densities. The increased densities naturally lead to increased data analysis requirements. Specifically, the most widely employed algorithm for tiling array analysis involves smoothing observed signals by computing pseudomedians within sliding windows, an O(n^2 log n) calculation in each window. This poor time complexity is an issue for tiling array analysis and could prove to be a real bottleneck as tiling microarray experiments become grander in scope and finer in resolution. RESULTS: We therefore implemented Monahan's HLQEST algorithm that reduces the runtime complexity for computing the pseudomedian of n numbers to O(n log n) from O(n^2 log n). For a representative tiling microarray dataset, this modification reduced the smoothing procedure's runtime by nearly 90%. We then leveraged the fact that elements within sliding windows remain largely unchanged in overlapping windows (as one slides across genomic space) to further reduce computation by an additional 43%. This was achieved by the application of skip lists to maintaining a sorted list of values from window to window. This sorted list could be maintained with simple O(log n) inserts and deletes. We illustrate the favorable scaling properties of our algorithms with both time complexity analysis and benchmarking on synthetic datasets.
CONCLUSION: Tiling microarray analyses that rely upon a sliding window pseudomedian calculation can require many hours of computation. We have eased this requirement significantly by implementing efficient algorithms that scale well with genomic feature density. This result not only speeds the current standard analyses, but also makes possible ones where many iterations of the filter may be required, such as might be required in a bootstrap or parameter estimation setting. Source code and executables are available at . BioMed Central 2007-06-07 /pmc/articles/PMC1913926/ /pubmed/17555595 http://dx.doi.org/10.1186/1471-2105-8-186 Text en Copyright © 2007 Royce et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology Article
Royce, Thomas E
Carriero, Nicholas J
Gerstein, Mark B
An efficient pseudomedian filter for tiling microrrays
title An efficient pseudomedian filter for tiling microrrays
title_full An efficient pseudomedian filter for tiling microrrays
title_fullStr An efficient pseudomedian filter for tiling microrrays
title_full_unstemmed An efficient pseudomedian filter for tiling microrrays
title_short An efficient pseudomedian filter for tiling microrrays
title_sort efficient pseudomedian filter for tiling microrrays
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1913926/
https://www.ncbi.nlm.nih.gov/pubmed/17555595
http://dx.doi.org/10.1186/1471-2105-8-186
work_keys_str_mv AT roycethomase anefficientpseudomedianfilterfortilingmicrorrays
AT carrieronicholasj anefficientpseudomedianfilterfortilingmicrorrays
AT gersteinmarkb anefficientpseudomedianfilterfortilingmicrorrays
AT roycethomase efficientpseudomedianfilterfortilingmicrorrays
AT carrieronicholasj efficientpseudomedianfilterfortilingmicrorrays
AT gersteinmarkb efficientpseudomedianfilterfortilingmicrorrays