
Principal component analysis-based unsupervised feature extraction applied to in silico drug discovery for posttraumatic stress disorder-mediated heart disease

BACKGROUND: Feature extraction (FE) is difficult, particularly if there are more features than samples, as small sample numbers often result in biased outcomes or overfitting. Furthermore, multiple sample classes often complicate FE because evaluating performance, which is usual in supervised FE, is...


Bibliographic Details
Main authors: Taguchi, Y-h, Iwadate, Mitsuo, Umeyama, Hideaki
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4448281/
https://www.ncbi.nlm.nih.gov/pubmed/25925353
http://dx.doi.org/10.1186/s12859-015-0574-4
author Taguchi, Y-h
Iwadate, Mitsuo
Umeyama, Hideaki
collection PubMed
description BACKGROUND: Feature extraction (FE) is difficult, particularly if there are more features than samples, as small sample numbers often result in biased outcomes or overfitting. Furthermore, multiple sample classes often complicate FE because evaluating performance, which is usual in supervised FE, is generally harder than in the two-class problem. Developing unsupervised methods that are independent of sample classification would solve many of these problems. RESULTS: Two principal component analysis (PCA)-based FE methods were tested as sample classification-independent unsupervised FE methods: variational Bayes PCA (VBPCA), which was extended to perform unsupervised FE, and conventional PCA (CPCA)-based unsupervised FE. Both performed well when applied to simulated data and to a posttraumatic stress disorder (PTSD)-mediated heart disease data set that had multiple categorical class observations in mRNA/microRNA expression of stressed mouse heart. A critical set of PTSD miRNAs/mRNAs was identified that shows aberrant expression between treatment and control samples and significant negative correlation with one another. Moreover, greater stability and biological feasibility than conventional supervised FE were also demonstrated. Based on the results obtained, in silico drug discovery was performed as translational validation of the methods. CONCLUSIONS: Our two proposed unsupervised FE methods (CPCA- and VBPCA-based) worked well on simulated data and outperformed two conventional supervised FE methods on a real data set. Thus, the results suggest that these two methods are equivalent for FE on categorical multiclass data sets, with potential translational utility for in silico drug discovery. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0574-4) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4448281
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4448281 2015-05-30 Principal component analysis-based unsupervised feature extraction applied to in silico drug discovery for posttraumatic stress disorder-mediated heart disease Taguchi, Y-h Iwadate, Mitsuo Umeyama, Hideaki BMC Bioinformatics Research Article BioMed Central 2015-04-30 /pmc/articles/PMC4448281/ /pubmed/25925353 http://dx.doi.org/10.1186/s12859-015-0574-4 Text en © Taguchi et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Principal component analysis-based unsupervised feature extraction applied to in silico drug discovery for posttraumatic stress disorder-mediated heart disease
topic Research Article