
Assessing quality standards for ChIP-seq and related massive parallel sequencing-generated datasets: When rating goes beyond avoiding the crisis


Bibliographic Details
Main authors: Mendoza-Parra, Marco Antonio; Gronemeyer, Hinrich
Format: Online Article Text
Language: English
Published: Elsevier 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4536145/
https://www.ncbi.nlm.nih.gov/pubmed/26484107
http://dx.doi.org/10.1016/j.gdata.2014.08.002
author Mendoza-Parra, Marco Antonio
Gronemeyer, Hinrich
collection PubMed
description Massive parallel DNA sequencing combined with chromatin immunoprecipitation and a large variety of DNA/RNA-enrichment methodologies is at the origin of data resources of major importance. Indeed, these resources, available for multiple genomes, represent the most comprehensive catalogue of (i) cell, development and signal transduction-specified patterns of binding sites for transcription factors (‘cistromes’) and for transcription and chromatin modifying machineries and (ii) the patterns of specific local post-translational modifications of histones and DNA (‘epigenome’) or of regulatory chromatin binding factors. In addition, (iii) resources specifying chromatin structure alterations are emerging. Importantly, these types of “omics” datasets increasingly populate public repositories and provide highly valuable resources for the exploration of general principles of cell function in a multi-dimensional genome–transcriptome–epigenome–chromatin structure context. However, data mining is critically dependent on data quality, an issue that, surprisingly, is still largely ignored by scientists and well-financed consortia, data repositories and scientific journals. So what determines the quality of ChIP-seq experiments and the datasets generated from them, and what keeps scientists from associating quality criteria with their data? In this ‘opinion’ we trace the various parameters that influence the quality of this type of dataset, as well as the computational efforts made so far to qualify them. Moreover, we describe a universal quality control (QC) certification approach that provides a quality rating for ChIP-seq and enrichment-related assays. The corresponding QC tool and a regularly updated database, from which at present the quality parameters of more than 8000 datasets can be retrieved, are freely accessible at www.ngs-qc.org.
format Online
Article
Text
id pubmed-4536145
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-4536145 2015-10-19 Genom Data Special Section “Democratizing Genomics Data” Elsevier 2014-08-14 Text en © 2014 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/3.0/).
title Assessing quality standards for ChIP-seq and related massive parallel sequencing-generated datasets: When rating goes beyond avoiding the crisis
topic Special Section “Democratizing Genomics Data”