BABAR: an R package to simplify the normalisation of common reference design microarray-based transcriptomic datasets

Bibliographic Details
Main Authors: Alston, Mark J; Seers, John; Hinton, Jay CD; Lucchini, Sacha
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2829013/
https://www.ncbi.nlm.nih.gov/pubmed/20128918
http://dx.doi.org/10.1186/1471-2105-11-73
author Alston, Mark J
Seers, John
Hinton, Jay CD
Lucchini, Sacha
collection PubMed
description BACKGROUND: The development of DNA microarrays has facilitated the generation of hundreds of thousands of transcriptomic datasets. The use of a common reference microarray design allows existing transcriptomic data to be readily compared and re-analysed in the light of new data, and the combination of this design with large datasets is ideal for 'systems'-level analyses. One issue is that these datasets are typically collected over many years and may be heterogeneous in nature, containing different microarray file formats and gene array layouts, dye-swaps, and varying scales of log2 ratios of expression between microarrays. Excellent software exists for the normalisation and analysis of microarray data, but many data have yet to be analysed because existing methods struggle with heterogeneous datasets; options include normalising microarrays on an individual or experimental group basis. Our solution was to develop the Batch Anti-Banana Algorithm in R (BABAR), an algorithm and software package that uses cyclic loess to normalise across the complete dataset. We have already used BABAR to analyse the function of Salmonella genes involved in the process of infection of mammalian cells.
RESULTS: The only input required by BABAR is unprocessed GenePix or BlueFuse microarray data files. BABAR provides a combination of 'within' and 'between' microarray normalisation steps and diagnostic boxplots. When applied to a real heterogeneous dataset, BABAR normalised the dataset to produce a comparable scaling between the microarrays, with the microarray data in excellent agreement with RT-PCR analysis. When applied to a real non-heterogeneous dataset and a simulated dataset, BABAR's performance in identifying differentially expressed genes showed some benefits over standard techniques.
CONCLUSIONS: BABAR is an easy-to-use software tool that simplifies the simultaneous normalisation of heterogeneous two-colour common reference design cDNA microarray-based transcriptomic datasets. We show that BABAR transforms real and simulated datasets to allow for the correct interpretation of these data, and is the ideal tool to facilitate the identification of differentially expressed genes or network inference analysis from transcriptomic datasets.
format Text
id pubmed-2829013
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2829013 2010-02-26 BABAR: an R package to simplify the normalisation of common reference design microarray-based transcriptomic datasets Alston, Mark J Seers, John Hinton, Jay CD Lucchini, Sacha BMC Bioinformatics Software BioMed Central 2010-02-03 /pmc/articles/PMC2829013/ /pubmed/20128918 http://dx.doi.org/10.1186/1471-2105-11-73 Text en Copyright ©2010 Alston et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title BABAR: an R package to simplify the normalisation of common reference design microarray-based transcriptomic datasets
topic Software