Uncovering hidden duplicated content in public transcriptomics data

As part of the development of the database Bgee (a dataBase for Gene Expression Evolution), we annotate and analyse expression data from different types and different sources, notably Affymetrix data from GEO and ArrayExpress, and RNA-Seq data from SRA. During our quality control procedure, we have...

Bibliographic Details
Main Authors:	Rosikiewicz, Marta; Comte, Aurélie; Niknejad, Anne; Robinson-Rechavi, Marc; Bastian, Frederic B.
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2013
Subjects:	Original Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3595988/ https://www.ncbi.nlm.nih.gov/pubmed/23487185 http://dx.doi.org/10.1093/database/bat010

author	Rosikiewicz, Marta; Comte, Aurélie; Niknejad, Anne; Robinson-Rechavi, Marc; Bastian, Frederic B.
collection	PubMed
description	As part of the development of the database Bgee (a dataBase for Gene Expression Evolution), we annotate and analyse expression data from different types and different sources, notably Affymetrix data from GEO and ArrayExpress, and RNA-Seq data from SRA. During our quality control procedure, we have identified duplicated content in GEO and ArrayExpress, affecting ∼14% of our data: fully or partially duplicated experiments from independent data submissions, Affymetrix chips reused in several experiments, or reused within an experiment. We present here the procedure that we have established to filter such duplicates from Affymetrix data, and our procedure to identify future potential duplicates in RNA-Seq data. Database URL: http://bgee.unil.ch/
format	Online Article Text
id	pubmed-3595988
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-35959882013-03-13 Uncovering hidden duplicated content in public transcriptomics data Rosikiewicz, Marta Comte, Aurélie Niknejad, Anne Robinson-Rechavi, Marc Bastian, Frederic B. Database (Oxford) Original Article As part of the development of the database Bgee (a dataBase for Gene Expression Evolution), we annotate and analyse expression data from different types and different sources, notably Affymetrix data from GEO and ArrayExpress, and RNA-Seq data from SRA. During our quality control procedure, we have identified duplicated content in GEO and ArrayExpress, affecting ∼14% of our data: fully or partially duplicated experiments from independent data submissions, Affymetrix chips reused in several experiments, or reused within an experiment. We present here the procedure that we have established to filter such duplicates from Affymetrix data, and our procedure to identify future potential duplicates in RNA-Seq data. Database URL: http://bgee.unil.ch/ Oxford University Press 2013-03-13 /pmc/articles/PMC3595988/ /pubmed/23487185 http://dx.doi.org/10.1093/database/bat010 Text en © The Author(s) 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Uncovering hidden duplicated content in public transcriptomics data
topic	Original Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3595988/ https://www.ncbi.nlm.nih.gov/pubmed/23487185 http://dx.doi.org/10.1093/database/bat010
