Improving Cancer Gene Expression Data Quality through a TCGA Data-Driven Evaluation of Identifier Filtering

Data quality is a recognized problem for high-throughput genomics platforms, as evinced by the proliferation of methods attempting to filter out lower quality data points. Different filtering methods lead to discordant results, raising the question, which methods are best? Astonishingly, little comp...

Bibliographic Details
Main Authors:	McDade, Kevin K., Chandran, Uma, Day, Roger S.
Format:	Online Article Text
Language:	English
Published:	Libertas Academica 2015
Subjects:	Original Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4686346/ https://www.ncbi.nlm.nih.gov/pubmed/26715829 http://dx.doi.org/10.4137/CIN.S33076

collection	PubMed
description	Data quality is a recognized problem for high-throughput genomics platforms, as evinced by the proliferation of methods attempting to filter out lower quality data points. Different filtering methods lead to discordant results, raising the question, which methods are best? Astonishingly, little computational support is offered to analysts to decide which filtering methods are optimal for the research question at hand. To evaluate them, we begin with a pair of expression data sets, transcriptomic and proteomic, on the same samples. The pair of data sets form a test-bed for the evaluation. Identifier mapping between the data sets creates a collection of feature pairs, with correlations calculated for each pair. To evaluate a filtering strategy, we estimate posterior probabilities for the correctness of probesets accepted by the method. An analyst can set expected utilities that represent the trade-off between the quality and quantity of accepted features. We tested nine published probeset filtering methods and combination strategies. We used two test-beds from cancer studies providing transcriptomic and proteomic data. For reasonable utility settings, the Jetset filtering method was optimal for probeset filtering on both test-beds, even though both assay platforms were different. Further intersection with a second filtering method was indicated on one test-bed but not the other.
id	pubmed-4686346
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-4686346 2015-12-29 Cancer Inform Original Research Libertas Academica 2015-12-16 /pmc/articles/PMC4686346/ /pubmed/26715829 http://dx.doi.org/10.4137/CIN.S33076 Text en © 2015 the author(s), publisher and licensee Libertas Academica Ltd. This is an open access article published under the Creative Commons CC-BY-NC 3.0 license.
