SWISS MADE: Standardized WithIn Class Sum of Squares to Evaluate Methodologies and Dataset Elements

Contemporary high dimensional biological assays, such as mRNA expression microarrays, regularly involve multiple data processing steps, such as experimental processing, computational processing, sample selection, or feature selection (i.e. gene selection), prior to deriving any biological conclusion...

Bibliographic Details
Main Authors: Cabanski, Christopher R., Qi, Yuan, Yin, Xiaoying, Bair, Eric, Hayward, Michele C., Fan, Cheng, Li, Jianying, Wilkerson, Matthew D., Marron, J. S., Perou, Charles M., Hayes, D. Neil
Format: Text
Language: English
Published: Public Library of Science 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2845619/
https://www.ncbi.nlm.nih.gov/pubmed/20360852
http://dx.doi.org/10.1371/journal.pone.0009905
author Cabanski, Christopher R.
Qi, Yuan
Yin, Xiaoying
Bair, Eric
Hayward, Michele C.
Fan, Cheng
Li, Jianying
Wilkerson, Matthew D.
Marron, J. S.
Perou, Charles M.
Hayes, D. Neil
collection PubMed
description Contemporary high-dimensional biological assays, such as mRNA expression microarrays, regularly involve multiple data processing steps, such as experimental processing, computational processing, sample selection, or feature selection (i.e., gene selection), prior to deriving any biological conclusions. These steps can dramatically change the interpretation of an experiment. Evaluation of processing steps has received limited attention in the literature. It is not straightforward to evaluate different processing methods, and investigators are often unsure of the best method. We present a simple statistical tool, Standardized WithIn class Sum of Squares (SWISS), that allows investigators to compare alternate data processing methods, such as different experimental methods, normalizations, or technologies, on a dataset in terms of how well they cluster a priori biological classes. SWISS uses Euclidean distance to determine which method does a better job of clustering the data elements based on a priori classifications. We apply SWISS to three different gene expression applications. The first application uses four different datasets to compare different experimental methods, normalizations, and gene sets. The second application, using data from the MicroArray Quality Control (MAQC) project, compares different microarray platforms. The third application compares different technologies: a single Agilent two-color microarray versus one lane of RNA-Seq. These applications give an indication of the variety of problems that SWISS can be helpful in solving. The SWISS analysis of one-color versus two-color microarrays provides investigators who use two-color arrays the opportunity to review their results in light of a single-channel analysis, with all of the associated benefits offered by this design. Analysis of the MAQC data shows differential intersite reproducibility by array platform. SWISS also shows that one lane of RNA-Seq clusters data by biological phenotypes as well as a single Agilent two-color microarray.
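Illustrative note (not part of the catalog record): the description says SWISS scores a processing method by how tightly samples cluster within their a priori classes under Euclidean distance, but it does not state the formula. The sketch below assumes the definition the name suggests: the within-class sum of squares divided by the total sum of squares, so that a lower score means tighter class clusters. The function name `swiss_score` and the toy data are hypothetical and are not taken from the authors' software.

```python
from collections import defaultdict


def _mean(rows):
    """Column-wise mean of a non-empty list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def _sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def swiss_score(samples, labels):
    """Within-class sum of squares divided by total sum of squares.

    ASSUMPTION: this ratio is inferred from the name "Standardized WithIn
    class Sum of Squares"; the abstract itself gives no formula.
    samples: list of feature vectors (one per sample, e.g. gene expression).
    labels:  a priori class label for each sample.
    """
    if not samples or len(samples) != len(labels):
        raise ValueError("samples and labels must be non-empty and aligned")

    # Total sum of squares: squared distances to the overall mean.
    overall = _mean(samples)
    total_ss = sum(_sq_dist(s, overall) for s in samples)
    if total_ss == 0:
        raise ValueError("all samples are identical; score is undefined")

    # Within-class sum of squares: squared distances to each class mean.
    by_class = defaultdict(list)
    for s, lab in zip(samples, labels):
        by_class[lab].append(s)
    within_ss = 0.0
    for rows in by_class.values():
        centre = _mean(rows)
        within_ss += sum(_sq_dist(r, centre) for r in rows)

    return within_ss / total_ss


# Toy comparison of two hypothetical processing methods on the same samples.
labels = ["A", "A", "B", "B"]
method_1 = [[0, 0], [0, 2], [10, 0], [10, 2]]   # classes well separated
method_2 = [[0, 0], [10, 2], [0, 2], [10, 0]]   # classes fully mixed
score_1 = swiss_score(method_1, labels)
score_2 = swiss_score(method_2, labels)
```

Under this assumed definition, the method with the lower score (here `method_1`) clusters the a priori classes better. Any significance testing of the difference between two scores is outside the scope of this sketch.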
format Text
id pubmed-2845619
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2845619 2010-04-02 SWISS MADE: Standardized WithIn Class Sum of Squares to Evaluate Methodologies and Dataset Elements PLoS One Research Article Public Library of Science 2010-03-26 /pmc/articles/PMC2845619/ /pubmed/20360852 http://dx.doi.org/10.1371/journal.pone.0009905 Text en This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration, which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. https://creativecommons.org/publicdomain/zero/1.0/
title SWISS MADE: Standardized WithIn Class Sum of Squares to Evaluate Methodologies and Dataset Elements
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2845619/
https://www.ncbi.nlm.nih.gov/pubmed/20360852
http://dx.doi.org/10.1371/journal.pone.0009905