Uncovering hidden duplicated content in public transcriptomics data

As part of the development of the database Bgee (a dataBase for Gene Expression Evolution), we annotate and analyse expression data from different types and different sources, notably Affymetrix data from GEO and ArrayExpress, and RNA-Seq data from SRA. During our quality control procedure, we have...

Bibliographic Details
Main authors: Rosikiewicz, Marta; Comte, Aurélie; Niknejad, Anne; Robinson-Rechavi, Marc; Bastian, Frederic B.
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3595988/
https://www.ncbi.nlm.nih.gov/pubmed/23487185
http://dx.doi.org/10.1093/database/bat010
author Rosikiewicz, Marta
Comte, Aurélie
Niknejad, Anne
Robinson-Rechavi, Marc
Bastian, Frederic B.
collection PubMed
description As part of the development of the database Bgee (a dataBase for Gene Expression Evolution), we annotate and analyse expression data from different types and different sources, notably Affymetrix data from GEO and ArrayExpress, and RNA-Seq data from SRA. During our quality control procedure, we have identified duplicated content in GEO and ArrayExpress, affecting ∼14% of our data: fully or partially duplicated experiments from independent data submissions, Affymetrix chips reused in several experiments, or reused within an experiment. We present here the procedure that we have established to filter such duplicates from Affymetrix data, and our procedure to identify future potential duplicates in RNA-Seq data. Database URL: http://bgee.unil.ch/
format Online
Article
Text
id pubmed-3595988
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3595988 2013-03-13. Uncovering hidden duplicated content in public transcriptomics data. Rosikiewicz, Marta; Comte, Aurélie; Niknejad, Anne; Robinson-Rechavi, Marc; Bastian, Frederic B. Database (Oxford). Original Article. As part of the development of the database Bgee (a dataBase for Gene Expression Evolution), we annotate and analyse expression data from different types and different sources, notably Affymetrix data from GEO and ArrayExpress, and RNA-Seq data from SRA. During our quality control procedure, we have identified duplicated content in GEO and ArrayExpress, affecting ∼14% of our data: fully or partially duplicated experiments from independent data submissions, Affymetrix chips reused in several experiments, or reused within an experiment. We present here the procedure that we have established to filter such duplicates from Affymetrix data, and our procedure to identify future potential duplicates in RNA-Seq data. Database URL: http://bgee.unil.ch/ Oxford University Press, 2013-03-13. /pmc/articles/PMC3595988/ /pubmed/23487185 http://dx.doi.org/10.1093/database/bat010 Text, en. © The Author(s) 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Uncovering hidden duplicated content in public transcriptomics data
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3595988/
https://www.ncbi.nlm.nih.gov/pubmed/23487185
http://dx.doi.org/10.1093/database/bat010