
Preprocessing and Quality Control Strategies for Illumina DASL Assay-Based Brain Gene Expression Studies with Semi-Degraded Samples

Bibliographic Details
Main Authors: Chow, Maggie L., Winn, Mary E., Li, Hai-Ri, April, Craig, Wynshaw-Boris, Anthony, Fan, Jian-Bing, Fu, Xiang-Dong, Courchesne, Eric, Schork, Nicholas J.
Format: Online Article Text
Language: English
Published: Frontiers Research Foundation 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3286152/
https://www.ncbi.nlm.nih.gov/pubmed/22375143
http://dx.doi.org/10.3389/fgene.2012.00011
author Chow, Maggie L.
Winn, Mary E.
Li, Hai-Ri
April, Craig
Wynshaw-Boris, Anthony
Fan, Jian-Bing
Fu, Xiang-Dong
Courchesne, Eric
Schork, Nicholas J.
collection PubMed
description Available statistical preprocessing and quality control tools for gene expression microarray datasets are known to greatly affect downstream data analysis, especially when degraded samples, unique tissue samples, or novel expression assays are used. It is therefore important to assess the validity and impact of the assumptions built into the preprocessing scheme applied to a dataset. We developed and assessed a data preprocessing strategy for use with the Illumina DASL-based gene expression assay with partially degraded postmortem prefrontal cortex samples. The samples were obtained from individuals with autism as part of an investigation of the pathogenic factors contributing to autism. Using statistical analysis methods and metrics such as those associated with multivariate distance matrix regression and mean inter-array correlation, we developed a preprocessing pipeline for DASL-based gene expression data that detects and accommodates problems with microarray-based gene expression values obtained from degraded brain samples. Key steps in the pipeline included outlier exclusion, data transformation and normalization, and batch effect and covariate corrections. Our goal was to produce a clean dataset for subsequent differential expression analysis. We ultimately settled on transformation and normalization algorithms available in the R/Bioconductor package lumi after assessing their use in various combinations. A procedure combining log2 transformation, quantile normalization, and batch and seizure correction was likely the most appropriate for our data. We empirically tested the components of our proposed preprocessing strategy, and our results suggest that a strategy that effectively identifies outliers, normalizes the data, and corrects for batch effects can be applied to all studies, even those pursued with degraded samples.
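The steps named in the description (log2 transformation, quantile normalization, outlier screening by mean inter-array correlation, and batch correction) can be illustrated with a minimal NumPy sketch. This is not the authors' pipeline and does not use the R/Bioconductor package lumi. The function names, the `offset` and `n_sd` parameters, the genes-by-arrays matrix layout, and the use of per-batch mean centering as a stand-in for batch correction are all assumptions made for illustration.

```python
import numpy as np


def log2_transform(x, offset=1.0):
    """Log2-transform raw intensities. The offset is an assumption to avoid log2(0)."""
    return np.log2(np.asarray(x, dtype=float) + offset)


def quantile_normalize(x):
    """Give every array (column) the same empirical distribution.

    x is a genes-by-arrays matrix. Ties are broken arbitrarily by argsort.
    """
    order = np.argsort(x, axis=0)                 # sort order within each column
    ranks = np.argsort(order, axis=0)             # rank of each value in its column
    reference = np.sort(x, axis=0).mean(axis=1)   # mean of the sorted columns
    return reference[ranks]                       # map each rank to the reference value


def mean_inter_array_correlation(x):
    """Mean Pearson correlation of each array with all other arrays."""
    corr = np.corrcoef(x, rowvar=False)           # arrays-by-arrays correlation matrix
    n = corr.shape[0]
    return (corr.sum(axis=1) - 1.0) / (n - 1)     # drop the self-correlation of 1


def flag_outliers(x, n_sd=2.0):
    """Flag arrays whose mean correlation lies n_sd SDs below the average (assumed rule)."""
    iac = mean_inter_array_correlation(x)
    return iac < iac.mean() - n_sd * iac.std()


def center_batches(x, batches):
    """Hypothetical simple batch correction: shift each batch to the per-gene grand mean."""
    x = np.array(x, dtype=float)                  # work on a copy
    batches = np.asarray(batches)
    grand_mean = x.mean(axis=1, keepdims=True)
    for b in np.unique(batches):
        cols = batches == b
        x[:, cols] += grand_mean - x[:, cols].mean(axis=1, keepdims=True)
    return x
```

A typical call order under these assumptions would be `flag_outliers` on the log2-transformed matrix, removal of the flagged columns, then `quantile_normalize` and `center_batches` on what remains. Covariate correction (for example, for seizure status) would need a regression step that this sketch omits.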
format Online
Article
Text
id pubmed-3286152
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Frontiers Research Foundation
record_format MEDLINE/PubMed
spelling pubmed-3286152 2012-02-28 Front Genet Genetics Frontiers Research Foundation 2012-02-24 /pmc/articles/PMC3286152/ /pubmed/22375143 http://dx.doi.org/10.3389/fgene.2012.00011 Text en Copyright © 2012 Chow, Winn, Li, April, Wynshaw-Boris, Fan, Fu, Courchesne and Schork. http://www.frontiersin.org/licenseagreement This is an open-access article distributed under the terms of the Creative Commons Attribution Non Commercial License, which permits non-commercial use, distribution, and reproduction in other forums, provided the original authors and source are credited.
title Preprocessing and Quality Control Strategies for Illumina DASL Assay-Based Brain Gene Expression Studies with Semi-Degraded Samples
topic Genetics