Errors in CGAP xProfiler and cDNA DGED: the importance of library parsing and gene selection algorithms

BACKGROUND: The Cancer Genome Anatomy Project (CGAP) xProfiler and cDNA Digital Gene Expression Displayer (DGED) were made available to the scientific community over a decade ago and have since been widely used to find genes that are differentially expressed between cancer and normal tissues....

Bibliographic Details
Main authors: Milnthorpe, Andrew T, Soloviev, Mikhail
Format: Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3094240/
https://www.ncbi.nlm.nih.gov/pubmed/21496233
http://dx.doi.org/10.1186/1471-2105-12-97
_version_ 1782203524444061696
author Milnthorpe, Andrew T
Soloviev, Mikhail
author_facet Milnthorpe, Andrew T
Soloviev, Mikhail
author_sort Milnthorpe, Andrew T
collection PubMed
description BACKGROUND: The Cancer Genome Anatomy Project (CGAP) xProfiler and cDNA Digital Gene Expression Displayer (DGED) were made available to the scientific community over a decade ago and have since been widely used to find genes that are differentially expressed between cancer and normal tissues. The tissue types are usually chosen according to the ontology hierarchy developed by NCBI. The xProfiler uses an internally available flat file database to determine the presence or absence of genes in the chosen libraries, while cDNA DGED uses the publicly available UniGene Expression and Gene relational databases to count the sequences found for each gene in the presented libraries. RESULTS: We discovered that the CGAP approach often includes libraries from dependent or irrelevant tissues (on average, one third of libraries were incorrect, and for some tissue searches no correct libraries were selected at all). We also discovered that the CGAP approach reports genes from outside the selected libraries and may omit genes found within the libraries. Other errors include incorrect estimation of the significance values and inaccurate settings for the library size cut-off values. We advocate a revised approach to finding libraries associated with tissues, in which libraries from dependent or irrelevant tissues are not included in the final library pool. We also revised the method for determining the presence or absence of a gene by searching the UniGene relational database, revised the calculation of statistical significance and corrected the library size cut-off filter. CONCLUSION: Our results justify re-evaluation of all previously reported results where NCBI CGAP expression data and tools were used.
format Text
id pubmed-3094240
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3094240 2011-05-14 Errors in CGAP xProfiler and cDNA DGED: the importance of library parsing and gene selection algorithms Milnthorpe, Andrew T Soloviev, Mikhail BMC Bioinformatics Correspondence
BioMed Central 2011-04-15 /pmc/articles/PMC3094240/ /pubmed/21496233 http://dx.doi.org/10.1186/1471-2105-12-97 Text en Copyright ©2011 Milnthorpe and Soloviev; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Correspondence
Milnthorpe, Andrew T
Soloviev, Mikhail
Errors in CGAP xProfiler and cDNA DGED: the importance of library parsing and gene selection algorithms
title Errors in CGAP xProfiler and cDNA DGED: the importance of library parsing and gene selection algorithms
title_full Errors in CGAP xProfiler and cDNA DGED: the importance of library parsing and gene selection algorithms
title_fullStr Errors in CGAP xProfiler and cDNA DGED: the importance of library parsing and gene selection algorithms
title_full_unstemmed Errors in CGAP xProfiler and cDNA DGED: the importance of library parsing and gene selection algorithms
title_short Errors in CGAP xProfiler and cDNA DGED: the importance of library parsing and gene selection algorithms
title_sort errors in cgap xprofiler and cdna dged: the importance of library parsing and gene selection algorithms
topic Correspondence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3094240/
https://www.ncbi.nlm.nih.gov/pubmed/21496233
http://dx.doi.org/10.1186/1471-2105-12-97
work_keys_str_mv AT milnthorpeandrewt errorsincgapxprofilerandcdnadgedtheimportanceoflibraryparsingandgeneselectionalgorithms
AT solovievmikhail errorsincgapxprofilerandcdnadgedtheimportanceoflibraryparsingandgeneselectionalgorithms