A method for detecting and correcting feature misidentification on expression microarrays

BACKGROUND: Much of the microarray data published at Stanford is based on mouse and human arrays produced under controlled and monitored conditions at the Brown and Botstein laboratories and at the Stanford Functional Genomics Facility (SFGF). Nevertheless, as large datasets based on the Stanford Hu...

Bibliographic Details
Main authors:	Tu, I-Ping, Schaner, Marci, Diehn, Maximilian, Sikic, Branimir I, Brown, Patrick O, Botstein, David, Fero, Michael J
Format:	Text
Language:	English
Published:	BioMed Central 2004
Subjects:	Software
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC521069/ https://www.ncbi.nlm.nih.gov/pubmed/15357875 http://dx.doi.org/10.1186/1471-2164-5-64

author	Tu, I-Ping; Schaner, Marci; Diehn, Maximilian; Sikic, Branimir I; Brown, Patrick O; Botstein, David; Fero, Michael J
author_sort	Tu, I-Ping
collection	PubMed
description	BACKGROUND: Much of the microarray data published at Stanford is based on mouse and human arrays produced under controlled and monitored conditions at the Brown and Botstein laboratories and at the Stanford Functional Genomics Facility (SFGF). Nevertheless, as large datasets based on the Stanford Human array began to accumulate, a small but significant number of discrepancies were detected that required a serious attempt to track down the original source of error. Because the process environment was controlled, sufficient data was available to accurately track the entire process leading up to the final expression data. In this paper, we describe our statistical methods to detect the inconsistencies in microarray data that arise from process errors, and discuss our technique to locate and fix these errors. RESULTS: To date, the Brown and Botstein laboratories and the Stanford Functional Genomics Facility have together produced 40,000 large-scale (10,000–50,000 feature) cDNA microarrays. By applying the heuristic described here, we have been able to check most of these arrays for misidentified features, and to confidently apply fixes to the data where needed. Out of the 265 million features checked in our database, problems were detected and corrected on 1.3 million of them. CONCLUSION: Process errors in any genome-scale, high-throughput production regime can lead to subsequent errors in data analysis. We show the value of tracking multi-step, high-throughput operations by using this knowledge to detect and correct misidentified data on gene expression microarrays.
format	Text
id	pubmed-521069
institution	National Center for Biotechnology Information
language	English
publishDate	2004
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-521069 2004-10-03 A method for detecting and correcting feature misidentification on expression microarrays Tu, I-Ping Schaner, Marci Diehn, Maximilian Sikic, Branimir I Brown, Patrick O Botstein, David Fero, Michael J BMC Genomics Software
BioMed Central 2004-09-09 /pmc/articles/PMC521069/ /pubmed/15357875 http://dx.doi.org/10.1186/1471-2164-5-64 Text en Copyright © 2004 Tu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open-access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A method for detecting and correcting feature misidentification on expression microarrays
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC521069/ https://www.ncbi.nlm.nih.gov/pubmed/15357875 http://dx.doi.org/10.1186/1471-2164-5-64