Benchmarking of 4C-seq pipelines based on real and simulated data
MOTIVATION: With its capacity for high-resolution data output in one region of interest, chromosome conformation capture combined with high-throughput sequencing (4C-seq) is a state-of-the-art next-generation sequencing technique that provides epigenetic insights, and regularly advances current medi...
Main authors: | Walter, Carolin; Schuetzmann, Daniel; Rosenbauer, Frank; Dugas, Martin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6901067/ https://www.ncbi.nlm.nih.gov/pubmed/31134276 http://dx.doi.org/10.1093/bioinformatics/btz426 |
_version_ | 1783477447981793280 |
---|---|
author | Walter, Carolin Schuetzmann, Daniel Rosenbauer, Frank Dugas, Martin |
author_facet | Walter, Carolin Schuetzmann, Daniel Rosenbauer, Frank Dugas, Martin |
author_sort | Walter, Carolin |
collection | PubMed |
description | MOTIVATION: With its capacity for high-resolution data output in one region of interest, chromosome conformation capture combined with high-throughput sequencing (4C-seq) is a state-of-the-art next-generation sequencing technique that provides epigenetic insights, and regularly advances current medical research. However, 4C-seq data are complex and prone to biases, and while specialized programs exist, an unbiased, extensive benchmarking is still lacking. Furthermore, neither substantial datasets with fully characterized ground truth, nor simulation programs for realistic 4C-seq data have been published. RESULTS: We conducted a benchmarking study on 66 4C-seq samples from 20 datasets, and developed a novel 4C-seq simulation software, Basic4CSim, to allow for detailed comparisons of 4C-seq algorithms on 50 simulated datasets with 10–120 samples each. Simulations and benchmarking were adapted to address different characteristics of 4C-seq data. Simulated data were compared with published samples to validate simulation settings. We identified differences between 4C-seq algorithms in terms of precision, recall, interaction structure, and run time, and observed general trends. Novel differential pipeline versions of single-sample based 4C-seq algorithms were included in the benchmarking. While no single tool was optimally suited for both near-cis and far-cis, and both single-sample and differential analyses, choosing a high-performing algorithm variant did improve results considerably. For near-cis scenarios, r3Cseq, peakC and FourCSeq offered high precision, while fourSig demonstrated high overall F1 scores in far-cis analyses. Finally, 4C-seq simulations may aid in the development of improved analysis algorithms. AVAILABILITY AND IMPLEMENTATION: Basic4CSim is available at https://github.com/walter-ca/Basic4CSim. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6901067 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6901067 2019-12-16 Benchmarking of 4C-seq pipelines based on real and simulated data Walter, Carolin Schuetzmann, Daniel Rosenbauer, Frank Dugas, Martin Bioinformatics Original Papers MOTIVATION: With its capacity for high-resolution data output in one region of interest, chromosome conformation capture combined with high-throughput sequencing (4C-seq) is a state-of-the-art next-generation sequencing technique that provides epigenetic insights, and regularly advances current medical research. However, 4C-seq data are complex and prone to biases, and while specialized programs exist, an unbiased, extensive benchmarking is still lacking. Furthermore, neither substantial datasets with fully characterized ground truth, nor simulation programs for realistic 4C-seq data have been published. RESULTS: We conducted a benchmarking study on 66 4C-seq samples from 20 datasets, and developed a novel 4C-seq simulation software, Basic4CSim, to allow for detailed comparisons of 4C-seq algorithms on 50 simulated datasets with 10–120 samples each. Simulations and benchmarking were adapted to address different characteristics of 4C-seq data. Simulated data were compared with published samples to validate simulation settings. We identified differences between 4C-seq algorithms in terms of precision, recall, interaction structure, and run time, and observed general trends. Novel differential pipeline versions of single-sample based 4C-seq algorithms were included in the benchmarking. While no single tool was optimally suited for both near-cis and far-cis, and both single-sample and differential analyses, choosing a high-performing algorithm variant did improve results considerably. For near-cis scenarios, r3Cseq, peakC and FourCSeq offered high precision, while fourSig demonstrated high overall F1 scores in far-cis analyses. Finally, 4C-seq simulations may aid in the development of improved analysis algorithms. AVAILABILITY AND IMPLEMENTATION: Basic4CSim is available at https://github.com/walter-ca/Basic4CSim. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2019-12-01 2019-05-27 /pmc/articles/PMC6901067/ /pubmed/31134276 http://dx.doi.org/10.1093/bioinformatics/btz426 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Original Papers Walter, Carolin Schuetzmann, Daniel Rosenbauer, Frank Dugas, Martin Benchmarking of 4C-seq pipelines based on real and simulated data |
title | Benchmarking of 4C-seq pipelines based on real and simulated data |
title_full | Benchmarking of 4C-seq pipelines based on real and simulated data |
title_fullStr | Benchmarking of 4C-seq pipelines based on real and simulated data |
title_full_unstemmed | Benchmarking of 4C-seq pipelines based on real and simulated data |
title_short | Benchmarking of 4C-seq pipelines based on real and simulated data |
title_sort | benchmarking of 4c-seq pipelines based on real and simulated data |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6901067/ https://www.ncbi.nlm.nih.gov/pubmed/31134276 http://dx.doi.org/10.1093/bioinformatics/btz426 |
work_keys_str_mv | AT waltercarolin benchmarkingof4cseqpipelinesbasedonrealandsimulateddata AT schuetzmanndaniel benchmarkingof4cseqpipelinesbasedonrealandsimulateddata AT rosenbauerfrank benchmarkingof4cseqpipelinesbasedonrealandsimulateddata AT dugasmartin benchmarkingof4cseqpipelinesbasedonrealandsimulateddata |