Multiple Suboptimal Solutions for Prediction Rules in Gene Expression Data

This paper discusses mathematical and statistical aspects of analysis methods applied to microarray gene expression data. We focus on pattern recognition to extract informative features embedded in the data for the prediction of phenotypes. It has been pointed out that there are severely difficult problems...

Bibliographic Details
Main authors: Komori, Osamu; Pritchard, Mari; Eguchi, Shinto
Format: Online Article Text
Language: English
Published: Hindawi Publishing Corporation 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3639638/
https://www.ncbi.nlm.nih.gov/pubmed/23662163
http://dx.doi.org/10.1155/2013/798189
author Komori, Osamu
Pritchard, Mari
Eguchi, Shinto
collection PubMed
description This paper discusses mathematical and statistical aspects of analysis methods applied to microarray gene expression data. We focus on pattern recognition to extract informative features embedded in the data for the prediction of phenotypes. It has been pointed out that severe difficulties arise from the imbalance between the number of observed genes and the number of observed subjects. We reanalyze published microarray gene expression data and detect many other gene sets with almost the same performance. We conclude that, at the current stage, it is not possible to extract only the informative genes with high performance from among all observed genes. We investigate why this difficulty persists even though analysis methods and learning algorithms are being actively proposed in statistical machine learning. We focus on the mutual coherence, or the absolute value of the Pearson correlation between two genes, and describe the distributions of this correlation for the selected set of genes and for the total set. We show that the problem of finding informative genes in high-dimensional data is ill-posed and that the difficulty is closely related to the mutual coherence.
format Online
Article
Text
id pubmed-3639638
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Hindawi Publishing Corporation
record_format MEDLINE/PubMed
spelling pubmed-3639638 2013-05-09 Multiple Suboptimal Solutions for Prediction Rules in Gene Expression Data Komori, Osamu Pritchard, Mari Eguchi, Shinto Comput Math Methods Med Research Article Hindawi Publishing Corporation 2013 2013-04-16 /pmc/articles/PMC3639638/ /pubmed/23662163 http://dx.doi.org/10.1155/2013/798189 Text en Copyright © 2013 Osamu Komori et al. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Multiple Suboptimal Solutions for Prediction Rules in Gene Expression Data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3639638/
https://www.ncbi.nlm.nih.gov/pubmed/23662163
http://dx.doi.org/10.1155/2013/798189