Estimation of CpG coverage in whole methylome next-generation sequencing studies

BACKGROUND: Methylation studies are a promising complement to genetic studies of DNA sequence. However, detailed prior biological knowledge is typically lacking, so methylome-wide association studies (MWAS) will be critical to detect disease-relevant sites. A cost-effective approach involves the nex...

Bibliographic Details
Main authors: van den Oord, Edwin JCG; Bukszar, Jozsef; Rudolf, Gábor; Nerella, Srilaxmi; McClay, Joseph L; Xie, Lin Y; Aberg, Karolina A
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3599116/
https://www.ncbi.nlm.nih.gov/pubmed/23398781
http://dx.doi.org/10.1186/1471-2105-14-50
author van den Oord, Edwin JCG
Bukszar, Jozsef
Rudolf, Gábor
Nerella, Srilaxmi
McClay, Joseph L
Xie, Lin Y
Aberg, Karolina A
collection PubMed
description BACKGROUND: Methylation studies are a promising complement to genetic studies of DNA sequence. However, detailed prior biological knowledge is typically lacking, so methylome-wide association studies (MWAS) will be critical to detect disease-relevant sites. A cost-effective approach involves the next-generation sequencing (NGS) of single-end libraries created from samples that are enriched for methylated DNA fragments. A limitation of single-end libraries is that the fragment size distribution is not observed. This hampers several aspects of the data analysis, such as the calculation of enrichment measures that are based on the number of fragments covering the CpGs. RESULTS: We developed a non-parametric method that uses isolated CpGs to estimate sample-specific fragment size distributions from the empirical sequencing data. Through simulations we show that our method is highly accurate. While the traditional (extended) read count methods resulted in severely biased coverage estimates and introduced artificial inter-individual differences, through the use of the estimated fragment size distributions we could remove these biases almost entirely. Furthermore, we found correlations of 0.999 between coverage estimates obtained using fragment size distributions that were estimated with our method and those that were “observed” in paired-end sequencing data. CONCLUSIONS: We propose a non-parametric method for estimating fragment size distributions that is highly precise and can improve the analysis of cost-effective MWAS that sequence single-end libraries created from samples that are enriched for methylated DNA fragments.
format Online
Article
Text
id pubmed-3599116
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3599116 2013-03-29 Estimation of CpG coverage in whole methylome next-generation sequencing studies van den Oord, Edwin JCG; Bukszar, Jozsef; Rudolf, Gábor; Nerella, Srilaxmi; McClay, Joseph L; Xie, Lin Y; Aberg, Karolina A BMC Bioinformatics Methodology Article BACKGROUND: Methylation studies are a promising complement to genetic studies of DNA sequence. However, detailed prior biological knowledge is typically lacking, so methylome-wide association studies (MWAS) will be critical to detect disease-relevant sites. A cost-effective approach involves the next-generation sequencing (NGS) of single-end libraries created from samples that are enriched for methylated DNA fragments. A limitation of single-end libraries is that the fragment size distribution is not observed. This hampers several aspects of the data analysis, such as the calculation of enrichment measures that are based on the number of fragments covering the CpGs. RESULTS: We developed a non-parametric method that uses isolated CpGs to estimate sample-specific fragment size distributions from the empirical sequencing data. Through simulations we show that our method is highly accurate. While the traditional (extended) read count methods resulted in severely biased coverage estimates and introduced artificial inter-individual differences, through the use of the estimated fragment size distributions we could remove these biases almost entirely. Furthermore, we found correlations of 0.999 between coverage estimates obtained using fragment size distributions that were estimated with our method and those that were “observed” in paired-end sequencing data. CONCLUSIONS: We propose a non-parametric method for estimating fragment size distributions that is highly precise and can improve the analysis of cost-effective MWAS that sequence single-end libraries created from samples that are enriched for methylated DNA fragments.
BioMed Central 2013-02-12 /pmc/articles/PMC3599116/ /pubmed/23398781 http://dx.doi.org/10.1186/1471-2105-14-50 Text en Copyright ©2013 van den Oord et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Estimation of CpG coverage in whole methylome next-generation sequencing studies
topic Methodology Article