ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses

BACKGROUND: A survey of microarray databases reveals that most of the repository contents and data models are heterogeneous (i.e., data obtained from different chip manufacturers), and that the repositories provide only basic biological keywords linking to PubMed. As a result, it is difficult to fin...

Bibliographic Details
Main authors: Stokes, Todd H, Torrance, JT, Li, Henry, Wang, May D
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2423441/
https://www.ncbi.nlm.nih.gov/pubmed/18541053
http://dx.doi.org/10.1186/1471-2105-9-S6-S18
_version_ 1782156100233068544
author Stokes, Todd H
Torrance, JT
Li, Henry
Wang, May D
author_facet Stokes, Todd H
Torrance, JT
Li, Henry
Wang, May D
author_sort Stokes, Todd H
collection PubMed
description BACKGROUND: A survey of microarray databases reveals that most of the repository contents and data models are heterogeneous (i.e., data obtained from different chip manufacturers), and that the repositories provide only basic biological keywords linking to PubMed. As a result, it is difficult to find datasets using research context or analysis parameter information beyond a few keywords. For example, to reduce the "curse-of-dimension" problem in microarray analysis, the number of samples is often increased by merging array data from different datasets. Because the data are heterogeneous, it is essential to know chip data parameters such as pre-processing steps (e.g., normalization, artefact removal, etc.) and any previous biological validation of the dataset. However, most microarray repositories do not have meta-data information in the first place, and do not have a mechanism to add or insert this information. Thus, there is a critical need to create "intelligent" microarray repositories that (1) enable update of meta-data with the raw array data, and (2) provide standardized archiving protocols to minimize bias from the raw data sources. RESULTS: To address the problems discussed, we have developed a community-maintained system called ArrayWiki that unites disparate meta-data of microarray meta-experiments from multiple primary sources with four key features. First, ArrayWiki provides a user-friendly knowledge management interface in addition to a programmable interface using standards developed by Wikipedia. Second, ArrayWiki includes automated quality control processes (caCORRECT) and novel visualization methods (BioPNG, Gel Plots), which provide extra information about data quality unavailable in other microarray repositories. Third, it provides a user-curation capability through the familiar Wiki interface. Fourth, ArrayWiki provides users with simple text-based searches across all experiment meta-data, and exposes data to search engine crawlers (Semantic Agents) such as Google to further enhance data discovery. CONCLUSIONS: Microarray data and meta information in ArrayWiki are distributed and visualized using a novel and compact data storage format, BioPNG. Also, they are open to the research community for curation, modification, and contribution. By making a small investment of time to learn the syntax and structure common to all sites running MediaWiki software, domain scientists and practitioners can all contribute to make better use of microarray technologies in research and medical practices. ArrayWiki is available at .
format Text
id pubmed-2423441
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2423441 2008-06-11 ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses Stokes, Todd H Torrance, JT Li, Henry Wang, May D BMC Bioinformatics Research BioMed Central 2008-05-28 /pmc/articles/PMC2423441/ /pubmed/18541053 http://dx.doi.org/10.1186/1471-2105-9-S6-S18 Text en Copyright © 2008 Stokes et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Stokes, Todd H
Torrance, JT
Li, Henry
Wang, May D
ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses
title ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses
title_full ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses
title_fullStr ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses
title_full_unstemmed ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses
title_short ArrayWiki: an enabling technology for sharing public microarray data repositories and meta-analyses
title_sort arraywiki: an enabling technology for sharing public microarray data repositories and meta-analyses
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2423441/
https://www.ncbi.nlm.nih.gov/pubmed/18541053
http://dx.doi.org/10.1186/1471-2105-9-S6-S18