
Method Designed to Respect Molecular Heterogeneity Can Profoundly Correct Present Data Interpretations for Genome-Wide Expression Analysis

Although genome-wide expression analysis has become a routine tool for gaining insight into molecular mechanisms, extraction of information remains a major challenge. It has been unclear why standard statistical methods, such as the t-test and ANOVA, often lead to low levels of reproducibility, how...

Bibliographic Details
Main Authors: Chen, Chih-Hao, Hsu, Chueh-Lin, Huang, Shih-Hao, Chen, Shih-Yuan, Hung, Yi-Lin, Chen, Hsiao-Rong, Wu, Yu-Chung, Su, Li-Jen, Lee, H.C.
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4368820/
https://www.ncbi.nlm.nih.gov/pubmed/25793610
http://dx.doi.org/10.1371/journal.pone.0121154
_version_ 1782362695010353152
author Chen, Chih-Hao
Hsu, Chueh-Lin
Huang, Shih-Hao
Chen, Shih-Yuan
Hung, Yi-Lin
Chen, Hsiao-Rong
Wu, Yu-Chung
Su, Li-Jen
Lee, H.C.
author_facet Chen, Chih-Hao
Hsu, Chueh-Lin
Huang, Shih-Hao
Chen, Shih-Yuan
Hung, Yi-Lin
Chen, Hsiao-Rong
Wu, Yu-Chung
Su, Li-Jen
Lee, H.C.
author_sort Chen, Chih-Hao
collection PubMed
description Although genome-wide expression analysis has become a routine tool for gaining insight into molecular mechanisms, extraction of information remains a major challenge. It has been unclear why standard statistical methods, such as the t-test and ANOVA, often lead to low levels of reproducibility, how likely applying fold-change cutoffs to enhance reproducibility is to miss key signals, and how adversely using such methods has affected data interpretations. We broadly examined expression data to investigate the reproducibility problem and discovered that molecular heterogeneity, a biological property of genetically different samples, has been improperly handled by the statistical methods. Here we give a mathematical description of the discovery and report the development of a statistical method, named HTA, for better handling molecular heterogeneity. We broadly demonstrate the improved sensitivity and specificity of HTA over the conventional methods and show that using fold-change cutoffs has lost much information. We illustrate the especial usefulness of HTA for heterogeneous diseases, by applying it to existing data sets of schizophrenia, bipolar disorder and Parkinson’s disease, and show it can abundantly and reproducibly uncover disease signatures not previously detectable. Based on 156 biological data sets, we estimate that the methodological issue has affected over 96% of expression studies and that HTA can profoundly correct 86% of the affected data interpretations. The methodological advancement can better facilitate systems understandings of biological processes, render biological inferences that are more reliable than they have hitherto been and engender translational medical applications, such as identifying diagnostic biomarkers and drug prediction, which are more robust.
format Online
Article
Text
id pubmed-4368820
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-43688202015-03-27 Method Designed to Respect Molecular Heterogeneity Can Profoundly Correct Present Data Interpretations for Genome-Wide Expression Analysis Chen, Chih-Hao Hsu, Chueh-Lin Huang, Shih-Hao Chen, Shih-Yuan Hung, Yi-Lin Chen, Hsiao-Rong Wu, Yu-Chung Su, Li-Jen Lee, H.C. PLoS One Research Article Although genome-wide expression analysis has become a routine tool for gaining insight into molecular mechanisms, extraction of information remains a major challenge. It has been unclear why standard statistical methods, such as the t-test and ANOVA, often lead to low levels of reproducibility, how likely applying fold-change cutoffs to enhance reproducibility is to miss key signals, and how adversely using such methods has affected data interpretations. We broadly examined expression data to investigate the reproducibility problem and discovered that molecular heterogeneity, a biological property of genetically different samples, has been improperly handled by the statistical methods. Here we give a mathematical description of the discovery and report the development of a statistical method, named HTA, for better handling molecular heterogeneity. We broadly demonstrate the improved sensitivity and specificity of HTA over the conventional methods and show that using fold-change cutoffs has lost much information. We illustrate the especial usefulness of HTA for heterogeneous diseases, by applying it to existing data sets of schizophrenia, bipolar disorder and Parkinson’s disease, and show it can abundantly and reproducibly uncover disease signatures not previously detectable. Based on 156 biological data sets, we estimate that the methodological issue has affected over 96% of expression studies and that HTA can profoundly correct 86% of the affected data interpretations. 
The methodological advancement can better facilitate systems understandings of biological processes, render biological inferences that are more reliable than they have hitherto been and engender translational medical applications, such as identifying diagnostic biomarkers and drug prediction, which are more robust. Public Library of Science 2015-03-20 /pmc/articles/PMC4368820/ /pubmed/25793610 http://dx.doi.org/10.1371/journal.pone.0121154 Text en © 2015 Chen et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Chen, Chih-Hao
Hsu, Chueh-Lin
Huang, Shih-Hao
Chen, Shih-Yuan
Hung, Yi-Lin
Chen, Hsiao-Rong
Wu, Yu-Chung
Su, Li-Jen
Lee, H.C.
Method Designed to Respect Molecular Heterogeneity Can Profoundly Correct Present Data Interpretations for Genome-Wide Expression Analysis
title Method Designed to Respect Molecular Heterogeneity Can Profoundly Correct Present Data Interpretations for Genome-Wide Expression Analysis
title_sort method designed to respect molecular heterogeneity can profoundly correct present data interpretations for genome-wide expression analysis
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4368820/
https://www.ncbi.nlm.nih.gov/pubmed/25793610
http://dx.doi.org/10.1371/journal.pone.0121154