
Adaptive bandwidth kernel density estimation for next-generation sequencing data

BACKGROUND: High-throughput sequencing experiments can be viewed as measuring some sort of a "genomic signal" that may represent a biological event such as the binding of a transcription factor to the genome, locations of chromatin modifications, or even a background or control condition....


Bibliographic Details
Main authors: Ramachandran, Parameswaran; Perkins, Theodore J
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4043421/
https://www.ncbi.nlm.nih.gov/pubmed/24564977
http://dx.doi.org/10.1186/1753-6561-7-S7-S7
_version_ 1782318901701378048
author Ramachandran, Parameswaran
Perkins, Theodore J
collection PubMed
description BACKGROUND: High-throughput sequencing experiments can be viewed as measuring some sort of a "genomic signal" that may represent a biological event such as the binding of a transcription factor to the genome, locations of chromatin modifications, or even a background or control condition. Numerous algorithms have been developed to extract different kinds of information from such data. However, there has been very little focus on the reconstruction of the genomic signal itself. Such reconstructions may be useful for a variety of purposes ranging from simple visualization of the signals to sophisticated comparison of different datasets. METHODS: Here, we propose that adaptive-bandwidth kernel density estimators are well-suited for genomic signal reconstructions. This class of estimators is a natural extension of the fixed-bandwidth estimators that have been employed in several existing ChIP-Seq analysis programs. RESULTS: Using a set of ChIP-Seq datasets from the ENCODE project, we show that adaptive-bandwidth estimators have greater accuracy at signal reconstruction compared to fixed-bandwidth estimators, and that they have significant advantages in terms of visualization as well. For both fixed and adaptive-bandwidth schemes, we demonstrate that smoothing parameters can be set automatically using a held-out set of tuning data. We also carry out a computational complexity analysis of the different schemes and confirm through experimentation that the necessary computations can be readily carried out on a modern workstation without any significant issues.
format Online
Article
Text
id pubmed-4043421
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4043421 2014-06-17 Adaptive bandwidth kernel density estimation for next-generation sequencing data Ramachandran, Parameswaran Perkins, Theodore J BMC Proc Proceedings BioMed Central 2013-12-20 /pmc/articles/PMC4043421/ /pubmed/24564977 http://dx.doi.org/10.1186/1753-6561-7-S7-S7 Text en Copyright © 2013 Ramachandran and Perkins; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Adaptive bandwidth kernel density estimation for next-generation sequencing data
topic Proceedings
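Illustrative note: the abstract in this record contrasts fixed-bandwidth and adaptive-bandwidth kernel density estimators and says that smoothing parameters can be set from a held-out set of tuning data. The record does not give the authors' actual estimator, so the following is only a minimal sketch under stated assumptions: a Gaussian kernel, a k-nearest-neighbour rule for the per-read bandwidth, and held-out log-likelihood as the tuning score. All function names and the toy read positions are hypothetical.

```python
import math

def gaussian(u):
    """Standard normal kernel (an assumed kernel choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def fixed_kde(x, reads, h):
    """Fixed-bandwidth estimate: every read position uses the same bandwidth h."""
    return sum(gaussian((x - r) / h) for r in reads) / (len(reads) * h)

def knn_bandwidths(reads, k, floor=1.0):
    """Assumed adaptive rule: each read's bandwidth is the distance to its
    k-th nearest other read, so dense regions get narrow kernels."""
    bandwidths = []
    for i, r in enumerate(reads):
        dists = sorted(abs(r - s) for j, s in enumerate(reads) if j != i)
        bandwidths.append(max(dists[k - 1], floor))  # floor avoids zero width
    return bandwidths

def adaptive_kde(x, reads, bandwidths):
    """Adaptive-bandwidth estimate: one bandwidth per read position."""
    total = sum(gaussian((x - r) / h) / h for r, h in zip(reads, bandwidths))
    return total / len(reads)

def heldout_loglik(density, tuning_reads):
    """Score a density by the log-likelihood of held-out tuning reads."""
    return sum(math.log(max(density(t), 1e-300)) for t in tuning_reads)

def pick_fixed_bandwidth(train, tune, candidates):
    """Choose the fixed bandwidth that best explains the tuning reads."""
    def score(h):
        return heldout_loglik(lambda x: fixed_kde(x, train, h), tune)
    return max(candidates, key=score)

def pick_k(train, tune, candidates):
    """Choose the neighbour count k that best explains the tuning reads."""
    def score(k):
        bws = knn_bandwidths(train, k)
        return heldout_loglik(lambda x: adaptive_kde(x, train, bws), tune)
    return max(candidates, key=score)

# Toy read start positions (made up): a dense cluster and a sparse region.
train = [100, 102, 105, 107, 110, 400, 450, 520, 600]
tune = [101, 104, 108, 430, 560]

best_h = pick_fixed_bandwidth(train, tune, [5, 20, 80])
best_k = pick_k(train, tune, [1, 2, 4])
print("fixed bandwidth:", best_h, "neighbours:", best_k)
```

In this sketch a single fixed bandwidth has to compromise between the dense cluster and the sparse region, while the per-read bandwidths narrow where reads are dense and widen where they are sparse. Each candidate value requires evaluating the density at every tuning read, so the cost here grows with the product of the training and tuning set sizes.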