Defining window-boundaries for genomic analyses using smoothing spline techniques

Bibliographic details
Main authors:	Beissinger, Timothy M, Rosa, Guilherme JM, Kaeppler, Shawn M, Gianola, Daniel, de Leon, Natalia
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4404117/ https://www.ncbi.nlm.nih.gov/pubmed/25928167 http://dx.doi.org/10.1186/s12711-015-0105-9

Collection:	PubMed
Description:	BACKGROUND: High-density genomic data is often analyzed by combining information over windows of adjacent markers. Interpretation of data grouped in windows versus at individual locations may increase statistical power, simplify computation, reduce sampling noise, and reduce the total number of tests performed. However, use of adjacent marker information can result in over- or under-smoothing, undesirable window boundary specifications, or highly correlated test statistics. We introduce a method for defining windows based on statistically guided breakpoints in the data, as a foundation for the analysis of multiple adjacent data points. This method involves first fitting a cubic smoothing spline to the data and then identifying the inflection points of the fitted spline, which serve as the boundaries of adjacent windows. This technique does not require prior knowledge of linkage disequilibrium, and therefore can be applied to data collected from individual or pooled sequencing experiments. Moreover, in contrast to existing methods, an arbitrary choice of window size is not necessary, since these are determined empirically and allowed to vary along the genome.
RESULTS: Simulations applying this method were performed to identify selection signatures from pooled sequencing F(ST) data, for which allele frequencies were estimated from a pool of individuals. The relative ratio of true to false positives was twice that generated by existing techniques. A comparison of the approach to a previous study that involved pooled sequencing F(ST) data from maize suggested that outlying windows were more clearly separated from their neighbors than when using a standard sliding window approach.
CONCLUSIONS: We have developed a novel technique to identify window boundaries for subsequent analysis protocols. When applied to selection studies based on F(ST) data, this method provides a high discovery rate and minimizes false positives. The method is implemented in the R package GenWin, which is publicly available from CRAN.
Institution:	National Center for Biotechnology Information
Journal:	Genet Sel Evol
Publication date:	2015-04-17
License:	© Beissinger et al.; licensee BioMed Central. 2015. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
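The windowing procedure summarized in the abstract (fit a cubic smoothing spline, then split the data at the fitted spline's inflection points) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the GenWin implementation: it uses the Reinsch algorithm for a natural cubic smoothing spline with unit weights, takes a fixed, user-chosen smoothing parameter `lam` rather than selecting the amount of smoothing from the data, and all function names (`smoothing_spline`, `inflection_points`, `define_windows`, `window_means`) are hypothetical. Because the second derivative of a natural cubic spline is linear between knots, each sign change can be located exactly by linear interpolation.

```python
# Minimal sketch, not GenWin's code: function names and the fixed `lam` are assumptions.
import bisect
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ z = b by Gaussian elimination with partial pivoting (dense, O(k^3))."""
    k = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, k):
            f = m[r][col] / m[col][col]
            for c in range(col, k + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * k
    for r in range(k - 1, -1, -1):
        z[r] = (m[r][k] - sum(m[r][c] * z[c] for c in range(r + 1, k))) / m[r][r]
    return z


def smoothing_spline(x: Sequence[float], y: Sequence[float], lam: float):
    """Fit a natural cubic smoothing spline (Reinsch algorithm, unit weights).

    Minimises sum (y_i - g(x_i))^2 + lam * integral g''(t)^2 dt and returns the
    fitted values g and the second derivatives gamma at every knot.
    """
    n = len(x)
    if n < 3 or len(y) != n:
        raise ValueError("need at least 3 points and len(x) == len(y)")
    h = [x[i + 1] - x[i] for i in range(n - 1)]
    if any(hi <= 0 for hi in h):
        raise ValueError("x must be strictly increasing")
    k = n - 2  # number of interior knots
    # Band matrices Q (n x k) and R (k x k) from the natural-spline relation Q^T g = R gamma.
    q = [[0.0] * k for _ in range(n)]
    r = [[0.0] * k for _ in range(k)]
    for c in range(k):
        q[c][c] = 1.0 / h[c]
        q[c + 1][c] = -1.0 / h[c] - 1.0 / h[c + 1]
        q[c + 2][c] = 1.0 / h[c + 1]
        r[c][c] = (h[c] + h[c + 1]) / 3.0
        if c + 1 < k:
            r[c][c + 1] = r[c + 1][c] = h[c + 1] / 6.0
    # Solve (R + lam * Q^T Q) gamma = Q^T y, then g = y - lam * Q gamma.
    a = [[r[i][j] + lam * sum(q[t][i] * q[t][j] for t in range(n)) for j in range(k)]
         for i in range(k)]
    rhs = [sum(q[t][i] * y[t] for t in range(n)) for i in range(k)]
    inner = _solve(a, rhs)
    g = [y[t] - lam * sum(q[t][i] * inner[i] for i in range(k)) for t in range(n)]
    return g, [0.0] + inner + [0.0]  # natural spline: g'' = 0 at both end knots


def inflection_points(x: Sequence[float], gamma: Sequence[float]) -> List[float]:
    """Positions where the spline's second derivative changes sign between knots."""
    pts = []
    for i in range(len(x) - 1):
        a, b = gamma[i], gamma[i + 1]
        if a * b < 0:  # g'' is linear on [x_i, x_{i+1}], so interpolate its zero
            pts.append(x[i] + (x[i + 1] - x[i]) * a / (a - b))
    return pts


def define_windows(x, y, lam) -> List[Tuple[float, float]]:
    """Hypothetical helper: (start, end) windows bounded by the spline's inflection points."""
    _, gamma = smoothing_spline(x, y, lam)
    bounds = [x[0]] + inflection_points(x, gamma) + [x[-1]]
    return list(zip(bounds[:-1], bounds[1:]))


def window_means(x, values, windows) -> List[float]:
    """Hypothetical helper: average a per-marker statistic within each window."""
    starts = [w[0] for w in windows[1:]]  # interior boundaries
    sums = [0.0] * len(windows)
    counts = [0] * len(windows)
    for xi, vi in zip(x, values):
        j = bisect.bisect_right(starts, xi)  # index of the window containing xi
        sums[j] += vi
        counts[j] += 1
    return [s / c if c else float("nan") for s, c in zip(sums, counts)]
```

For marker positions `pos` and per-marker values such as F(ST) estimates `fst` (hypothetical inputs), `define_windows(pos, fst, lam)` returns `(start, end)` pairs and `window_means(pos, fst, windows)` returns one averaged value per window, so one test can be performed per window instead of per marker. The dense solver costs O(n^3); for chromosome-scale data a banded (pentadiagonal) solver would be the natural replacement. This sketch also does not report an interior knot where the second derivative is exactly zero as an inflection point.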
