A novel data mining method to identify assay-specific signatures in functional genomic studies

Bibliographic Details
Main authors: Rollins, Derrick K, Zhai, Dongmei, Joe, Alrica L, Guidarelli, Jack W, Murarka, Abhishek, Gonzalez, Ramon
Format: Text
Language: English
Published: BioMed Central 2006
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1599756/
https://www.ncbi.nlm.nih.gov/pubmed/16907975
http://dx.doi.org/10.1186/1471-2105-7-377
Collection: PubMed
Description: BACKGROUND: The highly dimensional data produced by functional genomic (FG) studies makes it difficult to visualize relationships between gene products and experimental conditions (i.e., assays). Although dimensionality reduction methods such as principal component analysis (PCA) have been very useful, their application to identify assay-specific signatures has been limited by the lack of appropriate methodologies. This article proposes a new and powerful PCA-based method for the identification of assay-specific gene signatures in FG studies. RESULTS: The proposed method (PM) is unique for several reasons. First, it is the only one, to our knowledge, that uses gene contribution, a product of the loading and expression level, to obtain assay signatures. The PM develops and exploits two types of assay-specific contribution plots, which are new to the application of PCA in the FG area. The first type plots the assay-specific gene contribution against the given order of the genes and reveals variations in distribution between assay-specific gene signatures as well as outliers within assay groups indicating the degree of importance of the most dominant genes. The second type plots the contribution of each gene in ascending or descending order against a constantly increasing index. This type of plot reveals assay-specific gene signatures defined by the inflection points in the curve. In addition, sharp regions within the signature define the genes that contribute the most to the signature. We propose and use curvature as an appropriate metric to characterize these sharp regions, thus identifying the subset of genes contributing the most to the signature. Finally, the PM uses the full dataset to determine the final gene signature, thus eliminating the chance of gene exclusion by poor screening in earlier steps.
The strengths of the PM are demonstrated using a simulation study and two studies of real DNA microarray data: a study of classification of human tissue samples and a study of E. coli cultures with different medium formulations. CONCLUSION: We have developed a PCA-based method that effectively identifies assay-specific signatures in ranked groups of genes from the full data set with a more efficient and simpler procedure than current approaches. Although this work demonstrates the ability of the PM to identify assay-specific signatures in DNA microarray experiments, this approach could be useful in areas such as proteomics and metabolomics.
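The two core computations summarized above, assay-specific gene contributions and curvature along a sorted contribution curve, can be illustrated with a minimal NumPy sketch. It assumes a matrix with assays as rows and genes as columns, mean-centers each gene, and takes the contribution of gene j in assay i to be its centered expression multiplied by its loading on one principal component, so that an assay's contributions sum to its PCA score. Curvature is estimated by finite differences as |y''| / (1 + y'^2)^(3/2) after rescaling both axes to [0, 1]. The function names `gene_contributions`, `discrete_curvature`, and `rank_and_cut`, the sign convention, and the rule of cutting the ranked list at the point of maximum curvature are hypothetical illustrations, not the authors' published procedure.

```python
import numpy as np


def gene_contributions(X, component=0):
    """Assay-by-gene contributions to one principal-component score.

    X has shape (n_assays, n_genes): rows are assays, columns are genes.
    C[i, j] = centered expression of gene j in assay i times the loading
    of gene j, so each row of C sums to that assay's PCA score.
    """
    Xc = X - X.mean(axis=0)                    # mean-center every gene
    # The right singular vectors of the centered matrix are the PCA loadings.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = Vt[component]                   # one loading per gene
    return Xc * loadings                       # broadcasts across assays


def discrete_curvature(y, x):
    """kappa = |y''| / (1 + y'^2)^(3/2), estimated with finite differences."""
    dy = np.gradient(y, x)
    d2y = np.gradient(dy, x)
    return np.abs(d2y) / (1.0 + dy ** 2) ** 1.5


def rank_and_cut(contrib_row):
    """Sort one assay's contributions (descending) and find the sharpest bend.

    Hypothetical selection rule: genes ranked up to the point of maximum
    curvature form the candidate signature.
    Returns (gene order, curvature along the sorted curve, cut index).
    """
    # PC signs are arbitrary; this sketch flips them so the assay's score is positive.
    sign = 1.0 if contrib_row.sum() >= 0 else -1.0
    c = sign * contrib_row
    order = np.argsort(c)[::-1]                # gene indices, largest first
    sorted_c = c[order]
    # Curvature depends on axis scale, so map both axes onto [0, 1].
    x = np.linspace(0.0, 1.0, sorted_c.size)
    span = sorted_c.max() - sorted_c.min()
    y = (sorted_c - sorted_c.min()) / span if span > 0 else np.zeros_like(sorted_c)
    kappa = discrete_curvature(y, x)
    cut = int(np.argmax(kappa))
    return order, kappa, cut


if __name__ == "__main__":
    # Hypothetical synthetic data: 6 assays x 50 genes of noise,
    # with genes 0-4 raised in assays 0-2.
    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 0.1, size=(6, 50))
    X[:3, :5] += 10.0
    C = gene_contributions(X)
    order, kappa, cut = rank_and_cut(C[0])
    print(sorted(order[:5].tolist()))          # the planted genes: [0, 1, 2, 3, 4]
    print("candidate signature:", order[: cut + 1].tolist())
```

On a step-like curve such as this synthetic one, finite differences spread the corner over neighboring points, so the maximum-curvature cut can land a position or two on either side of the true drop; treat it as approximate. Plotting `C[i]` against gene index corresponds to the first plot type described above, and plotting the sorted row corresponds to the second.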
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Source: BMC Bioinformatics, Methodology Article. BioMed Central, published 2006-08-14. Copyright © 2006 Rollins et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.