Uncovering hidden duplicated content in public transcriptomics data
As part of the development of the database Bgee (a dataBase for Gene Expression Evolution), we annotate and analyse expression data from different types and different sources, notably Affymetrix data from GEO and ArrayExpress, and RNA-Seq data from SRA. During our quality control procedure, we have...
Main authors: | Rosikiewicz, Marta; Comte, Aurélie; Niknejad, Anne; Robinson-Rechavi, Marc; Bastian, Frederic B. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2013 |
Subjects: | Original Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3595988/ https://www.ncbi.nlm.nih.gov/pubmed/23487185 http://dx.doi.org/10.1093/database/bat010 |
_version_ | 1782262453784018944 |
---|---|
author | Rosikiewicz, Marta; Comte, Aurélie; Niknejad, Anne; Robinson-Rechavi, Marc; Bastian, Frederic B. |
author_facet | Rosikiewicz, Marta; Comte, Aurélie; Niknejad, Anne; Robinson-Rechavi, Marc; Bastian, Frederic B. |
author_sort | Rosikiewicz, Marta |
collection | PubMed |
description | As part of the development of the database Bgee (a dataBase for Gene Expression Evolution), we annotate and analyse expression data from different types and different sources, notably Affymetrix data from GEO and ArrayExpress, and RNA-Seq data from SRA. During our quality control procedure, we have identified duplicated content in GEO and ArrayExpress, affecting ∼14% of our data: fully or partially duplicated experiments from independent data submissions, Affymetrix chips reused in several experiments, or reused within an experiment. We present here the procedure that we have established to filter such duplicates from Affymetrix data, and our procedure to identify future potential duplicates in RNA-Seq data. Database URL: http://bgee.unil.ch/ |
format | Online Article Text |
id | pubmed-3595988 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-35959882013-03-13 Uncovering hidden duplicated content in public transcriptomics data Rosikiewicz, Marta Comte, Aurélie Niknejad, Anne Robinson-Rechavi, Marc Bastian, Frederic B. Database (Oxford) Original Article As part of the development of the database Bgee (a dataBase for Gene Expression Evolution), we annotate and analyse expression data from different types and different sources, notably Affymetrix data from GEO and ArrayExpress, and RNA-Seq data from SRA. During our quality control procedure, we have identified duplicated content in GEO and ArrayExpress, affecting ∼14% of our data: fully or partially duplicated experiments from independent data submissions, Affymetrix chips reused in several experiments, or reused within an experiment. We present here the procedure that we have established to filter such duplicates from Affymetrix data, and our procedure to identify future potential duplicates in RNA-Seq data. Database URL: http://bgee.unil.ch/ Oxford University Press 2013-03-13 /pmc/articles/PMC3595988/ /pubmed/23487185 http://dx.doi.org/10.1093/database/bat010 Text en © The Author(s) 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Original Article Rosikiewicz, Marta Comte, Aurélie Niknejad, Anne Robinson-Rechavi, Marc Bastian, Frederic B. Uncovering hidden duplicated content in public transcriptomics data |
title | Uncovering hidden duplicated content in public transcriptomics data |
title_full | Uncovering hidden duplicated content in public transcriptomics data |
title_fullStr | Uncovering hidden duplicated content in public transcriptomics data |
title_full_unstemmed | Uncovering hidden duplicated content in public transcriptomics data |
title_short | Uncovering hidden duplicated content in public transcriptomics data |
title_sort | uncovering hidden duplicated content in public transcriptomics data |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3595988/ https://www.ncbi.nlm.nih.gov/pubmed/23487185 http://dx.doi.org/10.1093/database/bat010 |
work_keys_str_mv | AT rosikiewiczmarta uncoveringhiddenduplicatedcontentinpublictranscriptomicsdata AT comteaurelie uncoveringhiddenduplicatedcontentinpublictranscriptomicsdata AT niknejadanne uncoveringhiddenduplicatedcontentinpublictranscriptomicsdata AT robinsonrechavimarc uncoveringhiddenduplicatedcontentinpublictranscriptomicsdata AT bastianfredericb uncoveringhiddenduplicatedcontentinpublictranscriptomicsdata |